These exercises explore aspects of tree structure that we can use to make inferences about evolutionary patterns.
Some of these methods assume that we have an ultrametric (clock-like) tree and others don't, but all require trees to have branch lengths (which may represent either time or amounts of evolutionary change).
Comparison of the nearly perfect clock-like evolution of human, chimp, and gorilla mtDNA genome sequences with the non-clocklike evolution of cranium morphology in the same species (from Bayesian molecular clock dating of species divergences in the genomics era).
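One simple way to see whether a tree is clock-like is to add up the branch lengths from the root to each tip: in an ultrametric tree these sums are all equal. The sketch below assumes a tree stored as nested (name, branch_length, children) tuples. That layout and the branch lengths are made up for illustration and are not taken from the figure.

```python
from math import isclose

# Assumed layout for this sketch: a node is (name, branch_length, children),
# and a tip has an empty list of children.

def root_to_tip_distances(node, depth=0.0):
    """Return {tip_name: summed branch length from the root to that tip}."""
    name, length, children = node
    depth += length
    if not children:
        return {name: depth}
    distances = {}
    for child in children:
        distances.update(root_to_tip_distances(child, depth))
    return distances

def is_ultrametric(tree, rel_tol=1e-6):
    """True if every tip is the same distance from the root."""
    dists = list(root_to_tip_distances(tree).values())
    return all(isclose(d, dists[0], rel_tol=rel_tol) for d in dists)

# Hypothetical branch lengths, chosen only to show the two cases.
clock_like = ("root", 0.0, [
    ("gorilla", 2.0, []),
    ("anc", 0.5, [("human", 1.5, []), ("chimp", 1.5, [])]),
])
non_clock = ("root", 0.0, [
    ("gorilla", 2.0, []),
    ("anc", 0.5, [("human", 0.4, []), ("chimp", 1.5, [])]),
])

print(is_ultrametric(clock_like), is_ultrametric(non_clock))
```

Real trees estimated from data are rarely exactly ultrametric, so the tolerance would usually need to be loosened, or the spread of root-to-tip distances inspected directly.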
A key challenge is separating rates and dates. There are sophisticated methods for doing this, as well as some simpler methods (which we will use). The diagram below sketches out the key features of Bayesian clock methods where we have prior estimates of ages (say, from the fossil record) and rates of evolution (from previous studies), and combine those with new data to arrive at an estimate for the age of a node in a tree.
We estimate the posterior distribution of divergence time (t) and rate (r) in a two-species case to illustrate Bayesian molecular clock dating. The data are an alignment of the 12S RNA gene sequences from humans and orang-utans, with 90 differences at 948 nucleotide sites. The joint prior (part a) is composed of two gamma densities (reflecting our prior information on the molecular rate and on the geological divergence time of human–orang-utan), and the likelihood (part b) is calculated under the Jukes–Cantor model. The posterior surface (part c) is the result of multiplying the prior and the likelihood. The data are informative about the molecular distance, d = tr, but not about t and r separately. The posterior is thus very sensitive to the prior. The blue line indicates the maximum likelihood estimate of t and r, and the molecular distance d, with t̂r̂ = d̂ (from Bayesian molecular clock dating of species divergences in the genomics era).
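The caption's point that the data inform d = tr, but not t and r separately, can be checked numerically. The sketch below uses the 90 differences at 948 sites from the caption and the standard Jukes-Cantor formulas. The gamma prior parameters are hypothetical placeholders, not the values used in the original figure, and the time and rate units are left abstract.

```python
import math

N_SITES = 948   # alignment length, from the caption
N_DIFF = 90     # observed differences, from the caption

def jc69_log_likelihood(t, r, x=N_DIFF, n=N_SITES):
    """Jukes-Cantor log-likelihood of x differences at n sites, with d = t * r.

    The binomial coefficient is omitted because it does not depend on t or r.
    """
    d = t * r
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # probability that a site differs
    return x * math.log(p) + (n - x) * math.log(1.0 - p)

def jc69_distance(x=N_DIFF, n=N_SITES):
    """Maximum likelihood estimate of the distance d under Jukes-Cantor."""
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * x / n)

def log_gamma_pdf(value, shape, rate):
    """Log density of a gamma distribution with the given shape and rate."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(value) - rate * value)

def log_posterior(t, r, t_prior=(10.0, 100.0), r_prior=(2.0, 2.0)):
    """Unnormalised log posterior; the (shape, rate) priors are ASSUMED values."""
    return (log_gamma_pdf(t, *t_prior) + log_gamma_pdf(r, *r_prior)
            + jc69_log_likelihood(t, r))

d_hat = jc69_distance()
# Two (t, r) pairs with the same product t * r = d_hat.
pair_a = (0.1, d_hat / 0.1)
pair_b = (0.2, d_hat / 0.2)
print("d_hat:", round(d_hat, 4))
print("log-likelihoods:", jc69_log_likelihood(*pair_a), jc69_log_likelihood(*pair_b))
print("log-posteriors: ", log_posterior(*pair_a), log_posterior(*pair_b))
```

The two log-likelihoods agree because both pairs lie on the ridge t r = d̂, while the log-posteriors differ because only the prior separates points along that ridge.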
Given a molecular tree that we want to convert to a "time tree" (a dated tree) we typically need to do two things:
By taking time slices through a tree and plotting how many lineages are in the tree at each slice, we construct a lineages-through-time plot. The diagram below shows some of the different patterns we might see in such a plot. Note that to make these plots we need an ultrametric tree, that is, a tree where all the tips line up. Typically these trees are the result of a molecular clock analysis.
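The time-slice idea can be sketched in a few lines. The example assumes we already have the ages (time before present) of the internal nodes of an ultrametric, strictly bifurcating tree. The node ages used here are invented for illustration.

```python
def lineages_through_time(node_ages):
    """Return (age, lineage_count) steps for an ultrametric, bifurcating tree.

    node_ages holds the age (time before present) of every internal node,
    root included. Each split adds one lineage, so the root gives 2 lineages.
    """
    ages = sorted(node_ages, reverse=True)  # oldest node (the root) first
    return [(age, k + 2) for k, age in enumerate(ages)]

def lineages_at(node_ages, age):
    """Number of lineages crossed by a time slice at the given age."""
    return 1 + sum(1 for a in node_ages if a > age)

# Hypothetical node ages for a five-tip tree.
example_ages = [10.0, 6.0, 3.0, 1.0]
print(lineages_through_time(example_ages))
print(lineages_at(example_ages, 5.0))
```

Lineage counts are usually plotted on a log scale against time, because a constant speciation rate with no extinction then gives an approximately straight line.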
Phylogenies, lineage through time plots, and gamma values illustrating the three patterns of cladogenesis and accumulation of species numbers. (A) Even rates through time, the null hypothesis for patterns of diversification. (B) Early burst of cladogenesis and species accumulation. (C) Late burst of speciation or early extinction. (from Ecological Opportunity: Trigger of Adaptive Radiation).
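The gamma values mentioned in the caption summarise where the nodes sit in the tree. The source does not give the formula, so the sketch below assumes the gamma statistic of Pybus and Harvey (2000), computed from the internal node ages of an ultrametric, bifurcating tree. Negative values correspond to nodes concentrated near the root (pattern B), positive values to nodes concentrated near the tips (pattern C), and values near zero to even rates (pattern A). All node ages are invented.

```python
import math

def gamma_statistic(node_ages):
    """Gamma statistic (assumed Pybus and Harvey form) from internal node ages."""
    ages = sorted(node_ages, reverse=True)   # root first
    n = len(ages) + 1                        # number of tips in a binary tree
    if n < 3:
        raise ValueError("need at least three tips")
    boundaries = ages + [0.0]                # add the present day
    # weighted[k] = (k + 2) * g, where g is the time spent with k + 2 lineages
    weighted = [(k + 2) * (boundaries[k] - boundaries[k + 1]) for k in range(n - 1)]
    total = sum(weighted)                    # T = sum over k of k * g_k
    running, partial_sums = 0.0, []
    for w in weighted[:-1]:                  # partial sums for i = 2 .. n - 1
        running += w
        partial_sums.append(running)
    mean_partial = sum(partial_sums) / (n - 2)
    return (mean_partial - total / 2.0) / (total * math.sqrt(1.0 / (12.0 * (n - 2))))

# Hypothetical five-tip trees, described by their internal node ages.
early_burst = [5.3, 5.2, 5.1, 5.0]   # all splits close to the root
late_burst = [5.3, 0.3, 0.2, 0.1]    # most splits close to the tips
print(gamma_statistic(early_burst), gamma_statistic(late_burst))
```

Under a constant-rate pure-birth process gamma is expected to follow a standard normal distribution, which is what makes it usable as a test against the null hypothesis of even rates.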
One method for building a tree is to compute pairwise distances between sequences. Typically we then discard the distances and focus on the tree. But there is information in the distances, which we explore in this exercise.
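As a minimal sketch of the distance step, the code below computes the uncorrected p-distance (the proportion of differing sites) for every pair of sequences in an alignment. The sequence names and the toy alignment are hypothetical, and in practice a model-corrected distance is often used instead of the raw proportion.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ, skipping gaps and ambiguous bases."""
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared

def pairwise_distances(alignment):
    """Map each unordered pair of sequence names to its p-distance."""
    return {(a, b): p_distance(alignment[a], alignment[b])
            for a, b in combinations(sorted(alignment), 2)}

# Toy alignment with made-up names and sequences.
toy_alignment = {
    "sp1_a": "ACGTACGTAC",
    "sp1_b": "ACGTACGTAT",
    "sp2_a": "ACGAACCTGC",
}
for pair, dist in pairwise_distances(toy_alignment).items():
    print(pair, dist)
```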
Figure 2. Schematic of the Inferred Barcoding Gap. The distribution of intraspecific variation is shown in red, and interspecific divergence in yellow. (A) Ideal world for barcoding, with discrete distributions and no overlap. (B) An alternative version of the world with significant overlap and no gap. (From doi:10.1371/journal.pbio.0030422).
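The two panels of the figure can be told apart by splitting the pairwise distances into within-species and between-species comparisons and comparing the extremes. The sketch below assumes distances stored in a dictionary keyed by pairs of sequence names, plus a lookup from sequence name to species. The names and distance values are invented for illustration.

```python
def barcoding_gap(distances, species_of):
    """Split pairwise distances by species membership and measure the gap."""
    intra, inter = [], []
    for (a, b), d in distances.items():
        (intra if species_of[a] == species_of[b] else inter).append(d)
    if not intra or not inter:
        raise ValueError("need both within- and between-species comparisons")
    gap = min(inter) - max(intra)   # positive: panel A, zero or negative: panel B
    return {"max_intra": max(intra), "min_inter": min(inter),
            "gap": gap, "overlap": gap <= 0}

# Hypothetical data: two species with two sequences each.
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
distances = {
    ("a1", "a2"): 0.01, ("b1", "b2"): 0.02,
    ("a1", "b1"): 0.08, ("a1", "b2"): 0.09,
    ("a2", "b1"): 0.07, ("a2", "b2"): 0.085,
}
print(barcoding_gap(distances, species_of))
```

Comparing only the largest within-species and smallest between-species distance is a deliberately strict summary; plotting both full distributions, as in the figure, shows how much overlap there actually is.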