(19.96 MB)

Maximum Likelihood analysis of the 5284 taxa dataset

posted on 24.11.2019 by Torda Varga, Krisztina Krizsán, Csenge Földi, Bálint Dima, Marisol Sánchez-García, Santiago Sánchez-Ramírez, Gergely J. Szöllősi, János G. Szarkándi, Viktor Papp, László Albert, William Andreopoulos, Claudio Angelini, Vladimír Antonín, Kerrie W. Barry, Neale L. Bougher, Peter Buchanan, Bart Buyck, Viktória Bense, Pam Catcheside, Mansi Chovatia, Jerry Cooper, Wolfgang Dämon, Dennis Desjardin, Péter Finy, József Geml, Sajeet Haridas, Karen Hughes, Alfredo F. Justo, Dariusz Karasiński, Ivona Kautmanova, Brigitta Kiss, Sándor Kocsubé, Heikki Kotiranta, Kurt M. LaButti, Bernardo E. Lechner, Kare Liimatainen, Anna Lipzen, Zoltán Lukács, Sirma Mihaltcheva, Louis Morgado, Tuula Niskanen, Machiel E. Noordeloos, Robin A. Ohm, Beatriz Ortiz-Santana, Clark Ovrebo, Nikolett Rácz, Robert Riley, Anton Savchenko, Anton Shiryaev, Karl Soop, Viacheslav Spirin, Csilla Szebenyi, Michal Tomšovský, Rodham E. Tulloss, Jessie Uehling, Igor V. Grigoriev, Csaba Vágvölgyi, Tamás Papp, Francis M. Martin, Otto Miettinen, David S. Hibbett, László G. Nagy
Multiple sequence alignment was carried out separately for the LSU, ef1-α and rpb2 loci using the Probabilistic Alignment Kit (PRANK release 140603). An iterative alignment refinement strategy, as described in Tóth et al., was employed: ML gene trees computed from preliminary alignments (using RAxML, see below) were used as guide trees for the next round of multiple alignment in PRANK. After three rounds of iterative refinement, the alignments were further corrected manually using a text editor. Manual curation was restricted to correcting homologous regions erroneously juxtaposed by PRANK. The alignments of the individual loci were concatenated into a superalignment.

Maximum likelihood trees for the 5,284-taxon dataset were inferred using the parallel version of RAxML 8.1.2 under the GTR model with gamma-distributed rate heterogeneity (4 categories) and three partitions corresponding to the LSU, ef1-α and rpb2 loci. The phylogenomic tree was used as a backbone monophyly constraint.

We performed 245 ML inferences and tested whether these trees adequately represented the plausible set of topologies given the alignment. This was done to ensure that phylogenetic uncertainty is properly taken into account in subsequent comparative analyses. If the tree set contains all plausible topologies, then the rolling average of pairwise Robinson-Foulds (RF) distances should saturate as the number of trees increases. To this end, we computed the RF distance for each pair of trees for incrementally larger numbers of trees using the R package "phangorn" v.2.0.2. We then plotted the rolling average, maximum and minimum values as a function of the number of trees in R.

The following files are included:

5284taxa_ML_alignment.fas: Input alignment.
5284taxa_ML_alignment_partitions.txt: Coordinates of the three partitions in the input multiple alignment.
5284taxa_ML_excluded_regions.txt: Positions of the alignment regions that were excluded before the ML analysis.
5284taxa_ML_alignment_exclude.txt: Alignment after excluding regions.
5284taxa_ML_alignment_exclude_partitions.txt: Partition file of the final alignment.
5284taxa_ML_constraint_backbone_tree.tre: Phylogenomic tree used to constrain the topology of the backbone of the 5,284-taxon phylogeny.
5284taxa_ML_alignment.fas: 245 phylogenetic trees inferred by maximum likelihood analysis.
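
The RF saturation check described in the methods can be illustrated with a minimal sketch. This is not the authors' phangorn workflow: it is a pure-Python stand-in that assumes trees are given as nested tuples of taxon names, and it reads "rolling average" as the mean over all pairs among the first k trees, which is an assumption. The function names (leaf_set, splits, rf_saturation) and the toy trees are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf names under a nested-tuple tree (assumed format)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions of a tree, treated as unrooted."""
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)  # reference taxon used to pick one side of each split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Always store the side that does not contain the reference taxon.
        side = all_leaves - clade if ref in clade else clade
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def rf_saturation(trees):
    """For k = 2..N, summarise pairwise RF distances among the first k trees.

    Returns rows of (k, mean, minimum, maximum).
    """
    split_sets = [splits(t) for t in trees]
    distances = []
    rows = []
    for k in range(1, len(split_sets)):
        # Add the distances between the newly included tree and all earlier ones.
        distances.extend(len(split_sets[k] ^ split_sets[j]) for j in range(k))
        mean = sum(distances) / len(distances)
        rows.append((k + 1, mean, min(distances), max(distances)))
    return rows


# Toy input: three five-taxon trees, the first and third being identical.
example_trees = [
    (("A", "B"), "C", ("D", "E")),
    (("A", "C"), "B", ("D", "E")),
    (("A", "B"), "C", ("D", "E")),
]

for k, mean, low, high in rf_saturation(example_trees):
    print(k, round(mean, 3), low, high)
```

If the mean, minimum and maximum stop changing as more trees are added, the tree set is plausibly saturated in the sense used above. For a dataset of this size, an established implementation such as the phangorn package cited in the methods would be the practical choice.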








Worldwide


Mesozoic era, Cenozoic era