Wednesday, May 22, 2013

Are phylogenetic trees useful for domesticated organisms?


When looking at the population genetics literature I have noticed that many papers still present very traditional phylogenetic analyses, particularly in what can broadly be called agricultural studies. For instance, genetic distances might be calculated between the samples and a "tree of genetic relationships" presented based on UPGMA clustering.

The problem with this sort of approach to genotype data analysis is that it forces the data into an ultrametric tree, in which every sample is the same distance from the root. This implies a constant rate of change in all lineages, and such a tree has long been shown to be an inappropriate model for evolutionary relationships. Furthermore, there is no indication of the robustness of the tree, nor even of whether a tree model is appropriate in the first place.
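To make the ultrametric constraint concrete: a set of pairwise distances can be drawn exactly as an ultrametric tree only if, for every triple of samples, the two largest of the three distances are equal (the so-called three-point condition). The Python sketch below counts the triples that violate this condition; the function name and the toy distance matrices are hypothetical, invented for illustration. A clustering method such as UPGMA will still return an ultrametric tree when many triples fail the test, and the distances are simply distorted to fit.

```python
from itertools import combinations


def three_point_violations(d, tol=1e-9):
    """Count triples (i, j, k) that violate the ultrametric three-point condition.

    d is a symmetric distance matrix (list of lists). A triple satisfies
    the condition when the two largest of d[i][j], d[i][k], d[j][k] are
    equal (within tol). Returns (violating triples, total triples).
    """
    bad = 0
    total = 0
    for i, j, k in combinations(range(len(d)), 3):
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        total += 1
        if abs(largest - middle) > tol:
            bad += 1
    return bad, total


# Hypothetical matrices: the first is ultrametric, the second is not.
d_ultra = [[0, 2, 4, 4],
           [2, 0, 4, 4],
           [4, 4, 0, 4],
           [4, 4, 4, 0]]
d_other = [[0, 3, 5, 6],
           [3, 0, 4, 5],
           [5, 4, 0, 3],
           [6, 5, 3, 0]]

print(three_point_violations(d_ultra))  # (0, 4)
print(three_point_violations(d_other))  # (4, 4)
```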

As a specific example, we can look at the microsatellite data presented by Carimi et al. (2010) for various Sicilian grape cultivars. For grape varieties, where hybridization among cultivars has been the historical norm, an ultrametric tree seems singularly inappropriate.

Wine grapes have been grown on Sicily for more than 2,000 years, and at least 120 grape-vine cultivar names are known in the literature. The authors sampled 82 of the cultivars from the Institute of Plant Genetics (Palermo) germplasm collection, with 1-5 clones sampled per cultivar. They assessed six polymorphic microsatellite loci, producing diploid (co-dominant) data. Only 70 distinct genotypes were detected, which were then subjected to data analysis.

The authors used the "Simple Matching coefficient for co-dominant and multiallelic data" to estimate the genetic distances between samples. Unfortunately, this coefficient has been shown to have odd properties for diploid microsatellite data (Kosman and Leonard 2005). In my analysis I have therefore used instead the simple metric of Kosman and Leonard (2005), in which genotype distances are based on the proportion of alleles shared at each locus (averaged across loci). This was calculated using the mmod R package (Winter 2012).
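As a rough illustration of a shared-allele distance of this kind (a sketch in the spirit of Kosman and Leonard's diploid measure, not the mmod code itself), the Python below scores each diploid locus as 1 minus the proportion of the two alleles that can be matched between the two genotypes, then averages across loci. It assumes each individual is stored as a list of (allele, allele) tuples, one per locus, and it does not handle missing data; the function names and allele sizes are hypothetical.

```python
from collections import Counter


def locus_distance(g1, g2):
    """Distance at one diploid locus: 1 - (shared alleles / 2).

    Shared alleles are counted as a multiset intersection, so
    AA vs AB shares one allele (0.5) and AB vs BA shares two (0.0).
    """
    shared = sum((Counter(g1) & Counter(g2)).values())
    return 1 - shared / 2


def genotype_distance(ind1, ind2):
    """Average the per-locus distances across all loci."""
    if len(ind1) != len(ind2):
        raise ValueError("individuals must be scored at the same loci")
    return sum(locus_distance(a, b) for a, b in zip(ind1, ind2)) / len(ind1)


# Two hypothetical cultivars scored at three microsatellite loci
# (allele sizes in base pairs, invented for this example).
cv1 = [(150, 152), (200, 200), (98, 104)]
cv2 = [(150, 156), (200, 204), (104, 98)]

print(genotype_distance(cv1, cv2))  # about 0.333
```

Counting matches as a multiset intersection means that allele order within a genotype is irrelevant and that a homozygote can match at most one copy of a heterozygote's allele.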

The authors then used the "UPGMA (Unweighted Pair-Group Method with Arithmetical Averages)" clustering method to produce their ultrametric tree from the distance data; this is the agglomerative hierarchical clustering method most commonly encountered in the literature. Instead, I used the SplitsTree program to calculate a NeighborNet network, in order to evaluate whether the data are tree-like.
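For readers unfamiliar with UPGMA, the following minimal Python sketch (a naive implementation written for illustration, not the software used by the original authors) shows why its output is always ultrametric: each merge places the new node at half the distance between the two clusters being joined, and distances to the merged cluster are averaged over all the original pairs. The function name and the four-sample distance matrix are hypothetical.

```python
def upgma(d, labels):
    """Naive UPGMA: repeatedly merge the closest pair of clusters.

    d: symmetric distance matrix (list of lists); labels: leaf names.
    Returns (tree, height): tree is a nested tuple and height is the
    root height. Every node sits at half its merging distance, so all
    root-to-leaf paths have the same length (an ultrametric tree).
    """
    n = len(labels)
    # Active clusters: id -> (subtree, number of leaves, node height)
    clusters = {i: (labels[i], 1, 0.0) for i in range(n)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (a, b), dab = min(dist.items(), key=lambda kv: kv[1])
        ta, na, _ = clusters.pop(a)
        tb, nb, _ = clusters.pop(b)
        # Size-weighted average, so every original pair counts equally
        new = {}
        for c in clusters:
            dac = dist[tuple(sorted((a, c)))]
            dbc = dist[tuple(sorted((b, c)))]
            new[c] = (na * dac + nb * dbc) / (na + nb)
        # Drop entries for the two merged clusters, add the new one
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        for c, v in new.items():
            dist[(c, next_id)] = v
        clusters[next_id] = ((ta, tb), na + nb, dab / 2)
        next_id += 1
    tree, _, height = next(iter(clusters.values()))
    return tree, height


d = [[0, 3, 5, 6],
     [3, 0, 4, 5],
     [5, 4, 0, 3],
     [6, 5, 3, 0]]
print(upgma(d, ["A", "B", "C", "D"]))  # ((('A', 'B'), ('C', 'D')), 2.5)
```

Here the input distances are additive but not ultrametric (A to D is 6, whereas B to C is 4), yet UPGMA places all four leaves at the same depth below a root of height 2.5, so both pairs end up 5 units apart on the tree.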

The resulting network is shown in the first graph. Cultivars that are closely connected in the network are similar to each other based on their microsatellite profiles, and those that are further apart are progressively more different from each other.


The network shows that there is very little hierarchical structure to the grape-vine microsatellite data. The data do not clearly distinguish "six main groups", as interpreted by the original authors based on their tree (which is shown below). [Note that one of the authors' groups (cluster E) is more heterogeneous than the others and, to be comparable, should be divided into either two or three groups.]


Note that the network emphasizes two things: (1) there are no clear groupings of the grape cultivars, and (2) the data are rather "noisy", as microsatellite data often are (e.g. Leroy et al. 2009), with many incompatible signals.
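One way to put a number on this kind of noise is the quartet-based delta score: for every set of four samples, the three ways of pairing them give three distance sums, and the score measures how far those sums are from the four-point condition, under which the two largest sums are equal for perfectly tree-like data. The Python sketch below is an illustrative implementation with made-up matrices, not output from SplitsTree or from the grape data; a mean close to 0 indicates tree-like data, while values approaching 1 indicate strongly conflicting signals of the kind a NeighborNet displays as boxes.

```python
from itertools import combinations


def delta_score(d):
    """Mean quartet delta score for a distance matrix (list of lists).

    For each quartet (i, j, k, l) the three sums
    d[i][j]+d[k][l], d[i][k]+d[j][l], d[i][l]+d[j][k]
    are sorted so that m1 >= m2 >= m3; the quartet score is
    (m1 - m2) / (m1 - m3), or 0 when all three sums are equal.
    """
    if len(d) < 4:
        raise ValueError("at least four samples are needed")
    scores = []
    for i, j, k, l in combinations(range(len(d)), 4):
        m3, m2, m1 = sorted((d[i][j] + d[k][l],
                             d[i][k] + d[j][l],
                             d[i][l] + d[j][k]))
        scores.append(0.0 if m1 == m3 else (m1 - m2) / (m1 - m3))
    return sum(scores) / len(scores)


# Hypothetical examples: distances that fit a tree exactly, and a
# "square" of distances in which two groupings conflict equally.
d_tree = [[0, 3, 5, 6],
          [3, 0, 4, 5],
          [5, 4, 0, 3],
          [6, 5, 3, 0]]
d_box = [[0, 1, 2, 1],
         [1, 0, 1, 2],
         [2, 1, 0, 1],
         [1, 2, 1, 0]]

print(delta_score(d_tree))  # 0.0
print(delta_score(d_box))   # 1.0
```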

As far as the phylogenetic history is concerned, there is no evidence of "several origins for Sicilian grape-vine germplasm", as interpreted by the authors. Instead, there seems to have been continuous mixing of the genotypes, probably including cultivars from elsewhere in Italy, and even further afield around the Mediterranean. This type of complex genetic history seems to be quite common in domesticated organisms, and a tree-based analysis is therefore unlikely to be appropriate for studying them; see, for example, Decker et al. (2009) for cows, Leroy et al. (2009) for horses, and Kijas et al. (2012) for sheep.

References

Carimi F, Mercati F, Abbate L, Sunseri F (2010) Microsatellite analyses for evaluation of genetic diversity among Sicilian grapevine cultivars. Genetic Resources and Crop Evolution 57: 703–719.

Decker JE, Pires JC, Conant GC, McKay SD, Heaton MP, Chen K, Cooper A, Vilkki J, Seabury CM, Caetano AR, Johnson GS, Brenneman RA, Hanotte O, Eggert LS, Wiener P, Kim J-J, Kim KS, Sonstegard TS, Van Tassell CP, Neibergs HL, McEwan JC, Brauning R, Coutinho LL, Babar ME, Wilson GA, McClure MC, Rolf MM, Kim J, Schnabel RD, Taylor JF (2009) Resolving the evolution of extant and extinct ruminants with high-throughput phylogenomics. Proceedings of the National Academy of Sciences of the USA 106: 18644–18649.

Kijas JW, Lenstra JA, Hayes B, Boitard S, Porto Neto LR, San Cristobal M, Servin B, McCulloch R, Whan V, Gietzen K, Paiva S, Barendse W, Ciani E, Raadsma H, McEwan J, Dalrymple B, and other members of the International Sheep Genomics Consortium (2012) Genome-wide analysis of the world's sheep breeds reveals high levels of historic mixture and strong recent selection. PLoS Biology 10: e1001258.

Kosman E, Leonard KJ (2005) Similarity coefficients for molecular markers in studies of genetic relationships between individuals for haploid, diploid, and polyploid species. Molecular Ecology 14: 415–424.

Leroy G, Callède L, Verrier E, Mériaux JC, Ricard A, Danchin-Burge C, Rognon X (2009) Genetic diversity of a large set of horse breeds raised in France assessed by microsatellite polymorphism. Genetics Selection Evolution 41: 5.

Winter DJ (2012) mmod: an R library for the calculation of population differentiation statistics. Molecular Ecology Resources 12: 1158–1160.

Monday, May 20, 2013

Destroying the Tree of Life?


In my previous blog post (Resistance to network thinking) I noted that a phylogenetic network is a generalization of a phylogenetic tree because "a network simplifies to a tree if there are no incompatible phylogenetic signals". Given this, it has often seemed somewhat odd to me that so many of the people who are interested in generalizing the Tree of Life into a Network of Life use metaphors suggesting that the tree first needs to be destroyed.

This approach was popularized by Ford Doolittle, who entitled his 2000 Scientific American [282(2): 90–95] article "Uprooting the Tree of Life", although this particular metaphor had previously been used by, for example, Elizabeth Pennisi [Science 284: 1305-1307].

This approach reached its apogee with the ridiculous cover of New Scientist in January 2009. The cover accompanied an article by Graham Lawton that now carries the relatively mild title "Why Darwin was wrong about the Tree of Life" [201(2692): 34-39], although the editor (Roger Highfield) originally called it "Axing Darwin's tree".


As was noted at the time, this cover was "a misdirected and entirely inappropriate piece of sensationalism", which did no one any good (least of all the editor). A subsequent Letter to the Editor [by Dennett, Coyne, Dawkins and Myers] noted: "Nothing in the article showed that the concept of the Tree of Life is unsound; only that it is more complicated than was realised before the advent of molecular genetics."

So, it seems likely that the tree needs to be neither axed nor uprooted, nor "trashed" [Laura Franklin-Hall], nor even "politely buried" [Michael Rose]. In many cases all that is needed is some osculations between the branches. Indeed, most of the scientific discussion is about how many osculations there are, and how we can best detect where they are, rather than about destroying the tree itself. A network is more general than a tree, rather than being a fundamentally different structure. Nevertheless, some people, such as Michael Syvanen, have been quoted as saying: "We've just annihilated the Tree of Life", when referring to their new network.