(9.65 MB)

Molecular clock analysis of genomic data.

posted on 24.11.2019, 11:45 by Torda Varga, Krisztina Krizsán, Csenge Földi, Bálint Dima, Marisol Sánchez-García, Santiago Sánchez-Ramírez, Gergely J. Szöllősi, János G. Szarkándi, Viktor Papp, László Albert, William Andreopoulos, Claudio Angelini, Vladimír Antonín, Kerrie W. Barry, Neale L. Bougher, Peter Buchanan, Bart Buyck, Viktória Bense, Pam Catcheside, Mansi Chovatia, Jerry Cooper, Wolfgang Dämon, Dennis Desjardin, Péter Finy, József Geml, Sajeet Haridas, Karen Hughes, Alfredo F. Justo, Dariusz Karasiński, Ivona Kautmanova, Brigitta Kiss, Sándor Kocsubé, Heikki Kotiranta, Kurt M. LaButti, Bernardo E. Lechner, Kare Liimatainen, Anna Lipzen, Zoltán Lukács, Sirma Mihaltcheva, Louis Morgado, Tuula Niskanen, Machiel E. Noordeloos, Robin A. Ohm, Beatriz Ortiz-Santana, Clark Ovrebo, Nikolett Rácz, Robert Riley, Anton Savchenko, Anton Shiryaev, Karl Soop, Viacheslav Spirin, Csilla Szebenyi, Michal Tomšovský, Rodham E. Tulloss, Jessie Uehling, Igor V. Grigoriev, Csaba Vágvölgyi, Tamás Papp, Francis M. Martin, Otto Miettinen, David S. Hibbett, László G. Nagy
Phylogenomic dataset

We used a smaller, more conserved subset of the 568-gene, 104-species phylogenomic dataset, because the full dataset was not computationally tractable in these analyses. First, we selected the 70 most conserved genes of the 568-gene dataset by calculating the mean genetic distance for each gene using the dist.alignment function of the seqinR R package v.3.4-5. To enable more accurate placement of fossil calibration points, we added three species (Cyathus striatus, Pycnoporus cinnabarinus, Suillus brevipes) to this dataset (Supplementary Table 1) and excluded two taxa that had ambiguous positions. We searched for homologous sequences in the additional genomes using blastp v.2.7.1, with one randomly selected gene from each of the 70 gene families as the query. We accepted the best hit (smallest E-value) as a 1-to-1 ortholog if the second-best hit had a substantially worse E-value (by 20 orders of magnitude). Protein clusters were aligned with PRANK v.100802 using default settings. Next, conserved blocks of the alignments were selected using Gblocks v.0.91b with default settings, except that the minimum block length was set to 5 and gap positions were allowed in half of the sequences. A phylogenomic tree was constructed with RAxML v.8.2.11 under the WAG+G substitution model, partitioned by gene.

Calibrations

To dissect the sources of differences in molecular age estimates, we ran analyses under three fossil calibration schemes (Supplementary Data 2) on the 105-species phylogenomic tree. First, we used the same fossil calibration scheme as for the 5,284-species phylogenetic dataset (“Default calibration scheme”). Next, we replicated the analyses of Kohler et al. on our tree, using the fossil calibration points from Kohler et al. (“Kohler et al. calibration scheme 1”).
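The 1-to-1 ortholog rule described under "Phylogenomic dataset" (accept the best blastp hit only when the second-best hit is worse by 20 orders of magnitude) can be illustrated with a minimal Python sketch. This is not the authors' script. The function names, the input layout (one `(subject_id, evalue)` pair per subject sequence), the floor used for E-values reported as 0.0, and the handling of a query with a single hit are all assumptions made for illustration.

```python
import math

# Assumption: BLAST can report an E-value of exactly 0.0; a floor keeps
# log10 defined. The floor value is hypothetical, not from the source.
EVALUE_FLOOR = 1e-300


def log10_evalue(evalue):
    """Return log10 of an E-value, clamped at EVALUE_FLOOR."""
    return math.log10(max(evalue, EVALUE_FLOOR))


def pick_ortholog(hits, min_gap_orders=20.0):
    """Return the best-hit subject ID if it passes the E-value gap rule.

    hits: list of (subject_id, evalue) pairs for one query, one pair per
          subject sequence (assumed layout).
    Returns None when there are no hits or the gap rule is not met.
    """
    if not hits:
        return None
    ranked = sorted(hits, key=lambda hit: hit[1])  # smallest E-value first
    best_id, best_evalue = ranked[0]
    if len(ranked) == 1:
        # Assumption: a lone hit has no competitor and is accepted.
        return best_id
    second_evalue = ranked[1][1]
    gap = log10_evalue(second_evalue) - log10_evalue(best_evalue)
    return best_id if gap >= min_gap_orders else None


if __name__ == "__main__":
    example_hits = [("protein_A", 1e-120), ("protein_B", 1e-90)]
    print(pick_ortholog(example_hits))
```

In the example, the two E-values differ by 30 orders of magnitude, so the best hit is accepted. A pair such as 1e-120 and 1e-110 would be rejected as ambiguous.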
For Kohler et al. calibration scheme 1, we placed the suilloid ectomycorrhiza fossil at the split of Suillinae/Paxillinae/Sclerodermatinae and Archaeomarasmius leggettii at the most recent common ancestor (mrca) of Gymnopus luxurians and Schizophyllum commune, with uniformly distributed time priors of 40–60 mya and 70–110 mya, respectively. Finally, we used the calibrations of Kohler et al. (2) but placed the two fossils at the mrcas of the Suillaceae and the marasmioid clade, respectively (“Kohler et al. calibration scheme 2”). In all analyses, we constrained the age of the root to between 300 mya and 600 mya.

Penalized likelihood analysis in r8s

We ran a series of molecular clock analyses in r8s v.1.81. A cross-validation analysis was performed to determine the optimal smoothing parameter (λ) by testing values across 7 orders of magnitude, starting from 10⁻³. The additive penalty function was applied, and the optimization was run 25 times from independent starting points. In each optimization step, after an initial solution was reached, the solution was perturbed and the truncated Newton (TN) optimization was rerun 20 times. We compared the results of previous studies with those of our analyses across seven ancestral nodes in Agaricomycotina (Supplementary Data 2).

Bayesian molecular clock dating

We used the mcmctree method implemented in PAML version 4.8a. The independent-rates clock model, a WAG substitution model and approximate likelihood calculation were used. The birth rate, the death rate and the sampling fraction of the birth-death process were set to 1, 1 and 0.14, respectively. The shape and concentration parameters of the gamma-Dirichlet prior for the drift rate coefficient (σ²) were set to 1, and three different scale parameters (10, 100 and 1,000) were tested to assess their effect on the time estimates. The substitution rate of each gene was estimated with codeml under a global clock model in order to set the parameters of the gamma-Dirichlet prior for the overall rate.
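The cross-validation grid mentioned in the r8s section (smoothing values across 7 orders of magnitude, starting from 10⁻³) can be written out as a short Python sketch. It assumes one candidate value per order of magnitude, which the description does not state explicitly, and it assumes that the preferred λ is the one with the lowest cross-validation error. The function names and the score dictionary are hypothetical. The real scores come from the r8s output.

```python
def smoothing_grid(start_exponent=-3, n_values=7, step=1.0):
    """Return candidate smoothing values 10**(start_exponent + i * step).

    Assumption: one value per order of magnitude (step = 1.0), so the
    defaults give seven values from 1e-3 to 1e3.
    """
    return [10.0 ** (start_exponent + i * step) for i in range(n_values)]


def pick_lambda(cv_errors):
    """Return the smoothing value with the lowest cross-validation error.

    cv_errors: dict mapping each candidate lambda to its cross-validation
               error, as read from the r8s output (hypothetical input).
    """
    return min(cv_errors, key=cv_errors.get)


if __name__ == "__main__":
    for lam in smoothing_grid():
        print(f"lambda = {lam:g}")
```

Keeping the grid as exponents makes it straightforward to refine the search around the best value with a smaller step once a coarse optimum is known.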
By calculating the mean substitution rate of the loci and examining the density plot of the rates, we set up a prior for the overall rate that fitted the data reasonably well: the shape, scale and concentration parameters were set to 5, 90.7441 and 1, respectively, giving an average substitution rate of 0.055 per site per time unit. We set the time unit to 100 myr and applied uniform priors with hard lower and upper bounds on 8 fossil calibrations. The Markov chain Monte Carlo (MCMC) analysis was run for 80,000 iterations, discarding the first 20,000 iterations as burn-in and sampling every 30th iteration from the posterior. After three independent analyses had been run, convergence of the log-likelihood values was inspected visually and the estimated ages were compared between replicates.

The following files are included:

Genome_MolClock_ML_input_alignment.fasta: Maximum likelihood input alignment.
Genome_MolClock_ML_input_alignment_partitions.txt: Maximum likelihood input alignment partitions.
Genome_MolClock_ML_output_tree.tree: Maximum likelihood output tree.
Genome_MolClock_mcmctree_input_alignment.phy: mcmctree input alignment.
Genome_MolClock_mcmctree_input_tree.tree: mcmctree input tree.
Genome_MolClock_mcmctree_Hessian_matrix.inBV: mcmctree input Hessian matrix.
Genome_MolClock_mcmctree_output.out: mcmctree output.
r8s_default_calibration_scheme.tre: r8s output 1 (default calibration scheme).
r8s_kohler_etal_calibration_scheme1.tre: r8s output 2 (Kohler et al. calibration scheme 1).
r8s_kohler_etal_calibration_scheme2.tre: r8s output 3 (Kohler et al. calibration scheme 2).
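As a check on the overall-rate prior described under "Bayesian molecular clock dating", the reported mean of 0.055 substitutions per site per time unit can be reproduced from the stated parameters. The sketch below treats the second parameter (90.7441, called the scale parameter in the description) as a rate, so that the gamma mean is shape divided by rate. That reading is an assumption, but it is the one that reproduces the reported 0.055. The constant and function names are illustrative.

```python
import math

SHAPE = 5.0
# Assumption: the "scale parameter" in the description acts as a rate,
# i.e. mean = shape / rate. This reproduces the reported value of 0.055.
RATE = 90.7441
TIME_UNIT_MYR = 100.0  # one time unit is 100 million years (from the text)


def gamma_mean(shape, rate):
    """Mean of a gamma distribution in the shape/rate parameterization."""
    return shape / rate


def gamma_sd(shape, rate):
    """Standard deviation of a gamma distribution (shape/rate form)."""
    return math.sqrt(shape) / rate


mean_per_time_unit = gamma_mean(SHAPE, RATE)
mean_per_myr = mean_per_time_unit / TIME_UNIT_MYR

if __name__ == "__main__":
    print(f"mean rate per site per 100 myr: {mean_per_time_unit:.4f}")
    print(f"mean rate per site per myr:     {mean_per_myr:.6f}")
    print(f"prior sd per site per 100 myr:  {gamma_sd(SHAPE, RATE):.4f}")
```

Dividing by the 100 myr time unit converts the prior mean to substitutions per site per million years, which makes it easier to compare with rates reported on other time scales.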








World wide


Mesozoic era, Cenozoic era