Input files and the results of the phylogenomic analysis
Dataset posted on 24.11.2019, 11:46 by Torda Varga, Krisztina Krizsán, Csenge Földi, Bálint Dima, Marisol Sánchez-García, Santiago Sánchez-Ramírez, Gergely J. Szöllősi, János G. Szarkándi, Viktor Papp, László Albert, William Andreopoulos, Claudio Angelini, Vladimír Antonín, Kerrie W. Barry, Neale L. Bougher, Peter Buchanan, Bart Buyck, Viktória Bense, Pam Catcheside, Mansi Chovatia, Jerry Cooper, Wolfgang Dämon, Dennis Desjardin, Péter Finy, József Geml, Sajeet Haridas, Karen Hughes, Alfredo F. Justo, Dariusz Karasiński, Ivona Kautmanova, Brigitta Kiss, Sándor Kocsubé, Heikki Kotiranta, Kurt M. LaButti, Bernardo E. Lechner, Kare Liimatainen, Anna Lipzen, Zoltán Lukács, Sirma Mihaltcheva, Louis Morgado, Tuula Niskanen, Machiel E. Noordeloos, Robin A. Ohm, Beatriz Ortiz-Santana, Clark Ovrebo, Nikolett Rácz, Robert Riley, Anton Savchenko, Anton Shiryaev, Karl Soop, Viacheslav Spirin, Csilla Szebenyi, Michal Tomšovský, Rodham E. Tulloss, Jessie Uehling, Igor V. Grigoriev, Csaba Vágvölgyi, Tamás Papp, Francis M. Martin, Otto Miettinen, David S. Hibbett, László G. Nagy
Following an all-versus-all BLAST search with mpiBLAST 1.6.0 using default parameters, we identified gene families using the Markov clustering algorithm MCL 14-137 with an inflation parameter of 2.0. We performed multiple sequence alignment using PRANK 140603. Ambiguously aligned regions were removed from the alignments using Gblocks 091b with the settings -p=yes -b2=26 -b3=10 -b4=5 -b5=h -t=p -e=.gbl.

We screened for gene families that contain a single representative gene of each species, or that contain inparalogs but no deep paralogs. Deep paralogs were identified following Nagy et al. Gene trees were inferred using the PTHREADS version of RAxML 8.1.2 with the PROTGAMMAWAG model and the standard algorithm. For each species, a single inparalog was retained: the one closest to the root based on root-to-tip patristic distances. Gene families in which >=75% of the species were represented were concatenated into a supermatrix.

We used RAxML 8.1.2 to perform ML analysis and bootstrapping on the concatenated dataset, under a WAG model with gamma-distributed rate heterogeneity, partitioned by input gene. We ran 100 bootstrap replicates using the rapid hill climbing algorithm.

The robustness of the dataset was tested by eliminating incrementally higher numbers of fast-evolving sites using six levels of stringency in Gblocks 091b. Using these parameters, we eliminated 8.5% (-b1=78 -b2=78 -b3=10 -b4=10), 24.3% (-b1=78 -b2=78 -b3=8 -b4=15), 36.7% (-b1=88 -b2=88 -b3=8 -b4=15), 46.4% (-b1=95 -b2=95 -b3=8 -b4=15), 50.1% (-b1=100 -b2=100 -b3=8 -b4=15) and 55.2% (-b2=104 -b3=5 -b4=20) of the least reliably aligned regions of the alignments, resulting in trimmed concatenated datasets with 129,886, 107,496, 89,732, 76,153, 70,862 and 63,309 amino acid sites, respectively. We performed ML phylogenetic inference for each of the reduced datasets in RAxML as described above.

The following files are included:
RAxML_Genomic_Backbone_Alignment.phy: input alignment for the ML analysis.
RAxML_Genomic_Backbone_Phylogeny.tre: ML tree with bootstrap values.
MultipleAlignment_GenomeTree_sensitivity_gbl_*_bip_SupplFig2_A-F.phy: the six alignments used in the sensitivity test.
RAxML_GenomeTree_sensitivity_gbl_*_bip_SupplFig2_A-F.tre: the six phylogenies resulting from the sensitivity test. These trees are depicted in Supplementary Figure 2 of the article.
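The inparalog selection described above (keeping, for each species, the gene copy closest to the root by root-to-tip patristic distance) can be illustrated with a minimal sketch. This is not the authors' script. The `Node` class, the `Species|geneID` tip-naming convention and the function names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """A rooted tree node; `length` is the branch leading to this node."""
    name: Optional[str] = None
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def root_to_tip(node: Node, dist: float = 0.0) -> Iterator[Tuple[str, float]]:
    """Yield (tip name, summed branch length from the root) for every tip."""
    total = dist + node.length
    if not node.children:
        yield node.name, total
    for child in node.children:
        yield from root_to_tip(child, total)


def pick_inparalogs(root: Node) -> Dict[str, str]:
    """Keep, per species, the tip with the smallest root-to-tip distance.

    Assumes (hypothetically) that tips are named 'Species|geneID'.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for tip, dist in root_to_tip(root):
        species = tip.split("|")[0]
        if species not in best or dist < best[species][1]:
            best[species] = (tip, dist)
    return {species: tip for species, (tip, _) in best.items()}


# Toy gene tree: species B has two inparalogs, g1 and g2.
tree = Node(children=[
    Node("A|g1", 0.4),
    Node(length=0.2, children=[Node("B|g1", 0.3), Node("B|g2", 0.1)]),
])
selected = pick_inparalogs(tree)
```

In the toy tree, `B|g2` has the shorter path to the root, so it is the copy retained for species B.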
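The occupancy filter and concatenation step (gene families with >=75% of the species represented, joined into a supermatrix partitioned by gene) might be sketched as follows. This is a hedged illustration under stated assumptions, not the pipeline used to build the deposited files: the in-memory data layout, the function name `concatenate`, and the choice to pad absent species with `-` are all assumptions.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # species name -> aligned protein sequence


def concatenate(
    families: Dict[str, Alignment],
    species: List[str],
    min_occupancy: float = 0.75,
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate gene families that pass the occupancy threshold.

    Returns the supermatrix and a list of (family, start, end) partitions
    with 1-based inclusive coordinates, one partition per input gene.
    """
    columns: Dict[str, List[str]] = {sp: [] for sp in species}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for name in sorted(families):
        aln = families[name]
        present = [sp for sp in species if sp in aln]
        if len(present) / len(species) < min_occupancy:
            continue  # too few species represented
        lengths = {len(aln[sp]) for sp in present}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences are not aligned")
        length = lengths.pop()
        for sp in species:
            # Assumption: species missing from a family are padded with gaps.
            columns[sp].append(aln.get(sp, "-" * length))
        partitions.append((name, start, start + length - 1))
        start += length
    matrix = {sp: "".join(parts) for sp, parts in columns.items()}
    return matrix, partitions


families = {
    "fam1": {"A": "MKV", "B": "MKI", "C": "MRV"},          # 3 of 4 species
    "fam2": {"A": "AA", "B": "AA"},                         # 2 of 4 species
    "fam3": {"A": "LL", "B": "LI", "C": "LL", "D": "IL"},  # 4 of 4 species
}
matrix, partitions = concatenate(families, ["A", "B", "C", "D"])
```

Here `fam2` falls below the threshold and is dropped, while species D receives gap padding for `fam1`. The partition list corresponds to the per-gene partitioning mentioned in the description.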
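For readers who wish to reproduce the sensitivity test, the six Gblocks stringency levels listed above can be collected in one table and turned into command lines. The sketch below only assembles argument lists and does not run anything. The executable name `Gblocks`, the input file name, and the reuse of `-t=p` from the earlier trimming step are assumptions; the `-b` flags, percentages and site counts are copied from the description.

```python
from typing import List, Tuple

# (flags as listed in the description, % of regions removed, sites retained)
LEVELS: List[Tuple[List[str], float, int]] = [
    (["-b1=78", "-b2=78", "-b3=10", "-b4=10"], 8.5, 129886),
    (["-b1=78", "-b2=78", "-b3=8", "-b4=15"], 24.3, 107496),
    (["-b1=88", "-b2=88", "-b3=8", "-b4=15"], 36.7, 89732),
    (["-b1=95", "-b2=95", "-b3=8", "-b4=15"], 46.4, 76153),
    (["-b1=100", "-b2=100", "-b3=8", "-b4=15"], 50.1, 70862),
    (["-b2=104", "-b3=5", "-b4=20"], 55.2, 63309),  # no -b1 given in the source
]


def gblocks_command(
    alignment: str,
    flags: List[str],
    executable: str = "Gblocks",  # assumption: name of the binary on PATH
    seq_type: str = "-t=p",       # assumption: protein type, as in the first step
) -> List[str]:
    """Build a Gblocks argument list; the input file comes first."""
    return [executable, alignment, seq_type, *flags]


# "supermatrix.fasta" is a hypothetical file name.
commands = [gblocks_command("supermatrix.fasta", flags) for flags, _, _ in LEVELS]
```

Each entry in `commands` could be passed to `subprocess.run` once the executable path and input file have been checked against the local installation.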
dc.coverage.temporal: Mesozoic era, Cenozoic era
Read the peer-reviewed publication
Varga T, Krizsán K, Földi C, Dima B, Sánchez-García M, Sánchez-Ramírez S, Szöllősi GJ, Szarkándi JG, Papp V, Albert L, Andreopoulos W, Angelini C, Antonín V, Barry KW, Bougher NL, Buchanan P, Buyck B, Bense V, Catcheside P, Chovatia M, Cooper J, Dämon W, Desjardin D, Finy P, Geml J, Haridas S, Hughes K, Justo A, Karasiński D, Kautmanova I, Kiss B, Kocsubé S, Kotiranta H, LaButti KM, Lechner BE, Liimatainen K, Lipzen A, Lukács Z, Mihaltcheva S, Morgado LN, Niskanen T, Noordeloos ME, Ohm RA, Ortiz