Wednesday, October 1, 2014

A fundamental limitation of pedigrees and networks but not trees

It would be nice to think that genealogical history can be reconstructed with ease. However, this is known not to be so. In particular, being able to reconstruct an overall history from a collection of sub-histories, which can thought of as the "building blocks", is not necessarily guaranteed.

That is, even given a complete collection of all of the sub-histories it is not necessarily possible to reconstruct a unique overall history. In other words, there can be pairs of graphs that do not represent the same evolutionary histories, but still display exactly the same collection of building blocks. ("Display" means roughly that a building block can be obtained by simply deleting some of the edges and vertices in the graph.) Mathematically, the sub-histories do not determine (or encode) the history.

For example, it is known that pedigrees cannot necessarily be reconstructed from a collection of all of the sub-pedigrees (Thatte 2008). Pedigrees are the traditional "family trees" showing the ancestry of individuals. Pedigrees differ from phylogenies in that all of the individuals have two parents (rather than possibly having a single immediate ancestor) and there are probably multiple roots (unless there is considerable inbreeding).

Phylogenetic trees, on the other hand can be uniquely reconstructed from a collection of all of the possible sub-trees (see Dress et al. 2012). This is one of the things that makes trees valuable as a phylogenetic model — it is theoretically possible to collect enough information to construct a unique phylogenetic tree.

Rooted phylogenetic networks do not, however, share this property. For some time it has been known that networks cannot necessarily be built from their building blocks, whether those blocks are rooted trees (Willson 2011) or triplets (= rooted 3-taxon trees) or clusters (= rooted sub-trees = clades) (Gambette and Huber 2012).

This is illustrated in the next figure (adapted from Huber et al.), which shows two networks at the top and below that the four trees that are displayed by both of them (by deleting one of each pair of incoming edges at the two reticulation nodes). Given these four trees we cannot reconstruct a unique network, and yet they are the only four trees associated with either network.

To make matters worse, Huber et al. (in press) have now revealed that we can't reconstruct rooted phylogenetic networks even from sub-networks. To do this they show that networks cannot necessarily be built from trinets (= rooted 3-taxon networks). Certain types of networks (e.g. level-1, level-2, tree-child) can be reconstructed (van Iersel and Moulton 2014), but Huber et al. show the example in the second figure, which shows two networks at the top and below that the four trinets that are displayed by both of them. Given these four trinets we cannot reconstruct a unique network, and yet they are the only four trinets associated with either network.

This means that "even if all of the building blocks for some reticulate evolutionary history were to be taken as the input for any given network building method, the method might still output an incorrect history." The best analogy here is Humpty Dumpty — even given all of the pieces, we literally might not be able to put him back together again. We could if he is a rooted tree, but we cannot guarantee it if he is a rooted network or pedigree.

This may not matter in practice, given that we don't yet know the circumstances under which it is possible to uniquely reconstruct networks, but it does mean that we acquire a certain degree of uncertainty as we move from "tree thinking" to "network thinking".


Dress A, Huber KT, Koolen J, Moulton V, Spillner A (2012) Basic Phylogenetic Combinatorics. Cambridge Uni Press.

Gambette P, Huber K (2012) On encodings of phylogenetic networks of bounded level. Journal of Mathematical Biology 65: 157-180.

Huber KT, van Iersel L, Moulton V, Wu T (in press) How much information is needed to infer reticulate evolutionary histories? Systematic Biology

van Iersel L, Moulton V (2014) Trinets encode tree-child and level-2 phylogenetic networks. Journal of Mathematical Biology 68: 1707-1729.

Thatte BD (2008) Combinatorics of pedigrees i: counterexamples to a reconstruction problem. SIAM Journal of Discrete Mathematics 22: 961-970.

Willson SJ (2011) Regular networks can be uniquely constructed from their trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics 8: 785-796.

Monday, September 29, 2014

Goofy genealogies

Family pedigrees seem to be confusing things, because there are two distinct interpretations of the expression "family tree".

First, the pedigree tree could be drawn with a particular contemporary person at the root of the tree, so that the tree expands backwards in time to increasing numbers of ancestors at the leaves. In some ways this seems quite illogical as an analogy, given that base of a real tree is the origin of its growth.

Second, the pedigree tree could be drawn with a particular ancestor at the root of the tree, so that the tree expands forwards in time to increasing numbers of descendants at the leaves. This is more logical, although we often draw the root at the top. (The following example is actually a network, rather than strictly a tree; see also Pedigrees and phylogenies are networks not trees.)

Pedigrees are generally somewhat different from phylogenies, but in phylogenetics we do choose the latter option for interpreting trees — we start with a collection of contemporary leaves and try to reconstruct the tree backwards towards the common ancestor. Thus the root is at the "base" of the tree, even when we draw the root at the top of the diagram.

In popular usage these distinctions are often blurred. Consider this "family tree" of the Disney character Goofy. It is taken from Gilles R. Maurice's Calisota web page, where the character names are listed clearly.

This is based on the first usage described above, since Goofy himself is at the base and his ancestors are at the leaves. This is actually closer to a lineage rather than a tree, especially as no females seem to be involved at any stage.

However, roughly the same information can be presented the other way around. This cartoon is taken from a different Calisota page.

Here, Goofy is now at the top of the tree and his ancestry proceeds downwards, with the oldest ancestor at the base (except for his son!). This really is confusing.