Monday, April 27, 2015

A phylogenetic network of late-night US television shows

"Late night" broadcasting on United States network / cable TV starts at about 11:00 or 11:30 pm, and goes for a couple of hours. Many networks broadcast similar shows during this time, which directly compete against each other for the available audience (which is currently estimated to be slightly in excess of 10 million people per night at 11:30 pm). Many of these shows have been on for a long time. Most of them are recorded on several weekday nights in front of a live audience, and they are usually associated with only a very few presenters over time (almost always men!).

For example, since the early 1990s we have had:
NBC Tonight Show: Jay Leno 1992-2009; Conan O'Brien 2009-2010; Jay Leno 2010-2014; Jimmy Fallon 2014-
NBC Late Night: David Letterman 1982-1993; Conan O'Brien 1993-2009; Jimmy Fallon 2009-2014; Seth Meyers 2014-
CBS Late Show: David Letterman 1993-2015
CBS Late Late Show: Tom Snyder 1995-1999; Craig Kilborn 1999-2004; Craig Ferguson 2005-2014; James Corden 2015-
ABC Kimmel Live: Jimmy Kimmel 2003-
ABC Nightline: Ted Koppel 1980-2005; Three-anchor team 2005-
ComedyCentral Daily Show: Craig Kilborn 1996-1998; Jon Stewart 1999-
ComedyCentral Colbert Report: Stephen Colbert 2005-2014
TBS Conan: Conan O'Brien 2010-

Eventually, the presenters retire or move elsewhere, and the other presenters then move around among the shows. This has led to the so-called "Late night wars", in which the NBC studio executives in charge repeatedly show that their personnel management skills are often lacking. For example, David Letterman was expected to replace Johnny Carson when Carson retired as the host of the NBC Tonight Show in 1992, but the job was given to Jay Leno instead. So, Letterman moved to a directly competing show on CBS. When Leno subsequently moved to another show, Conan O'Brien took over. However, Leno then moved back again, and so O'Brien moved to a directly competing show on TBS. The media interest in these shenanigans exceeded their interest in the shows themselves.

Another substantial decision was that by ABC, at the end of 2012, to swap the timeslots of Nightline (which used to run from 11:35 pm to midnight) and Kimmel Live (which ran from midnight to 1:00 am). This had a notable effect on the audience numbers, because Nightline was one of the top two shows in its original timeslot, whereas Kimmel Live currently gets about 1 million fewer viewers per night in that same slot. On the other hand, Nightline in its new timeslot gets about the same audience as Kimmel Live did when it occupied the slot. That seems to be a net loss of audience for ABC.

The Nielsen Media Research viewing data are available online at the TV by the Numbers site. They provide the weekly averages for each show in millions of viewers, based on what is known as "live plus same day" viewing (i.e. the audience at the time of broadcast plus same-day viewing of video recordings). The data I have looked at run from early December 2011 to the end of December 2014 (161 weeks). Unfortunately, these data rely on NBC press releases (rather than direct access to Nielsen), so there are some missing data.

The comparison of these shows can be visualized using a phylogenetic network, as a tool for exploratory data analysis. To create the network, I first calculated the pairwise similarity of the nine shows using the Manhattan distance on their weekly audience figures, and a Neighbor-net analysis was then used to display the between-show similarities as a phylogenetic network. So, shows that are closely connected in the network are similar to each other based on their audience figures across the three years, and those that are further apart are progressively more different from each other.
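As a rough illustration of the distance step only, here is a minimal Python sketch. The show names and audience values are made-up toy numbers, not the Nielsen data, and the treatment of missing weeks (skipping them and rescaling to the full number of weeks) is my assumption here, since the details of how the missing data were handled are not given above. The function names `manhattan` and `distance_matrix` are likewise just illustrative.

```python
from itertools import combinations


def manhattan(a, b):
    """Manhattan distance between two weekly audience series.

    Weeks where either show has a missing value (None) are skipped, and the
    sum is rescaled to the full series length (an assumed convention).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("the two series have no weeks in common")
    total = sum(abs(x - y) for x, y in pairs)
    return total * len(a) / len(pairs)


def distance_matrix(audiences):
    """Pairwise distances for a dict of {show name: weekly series}."""
    names = sorted(audiences)
    return {(p, q): manhattan(audiences[p], audiences[q])
            for p, q in combinations(names, 2)}


# Toy data: weekly averages in millions of viewers (hypothetical values).
weekly = {
    "Show A": [3.5, 3.4, 3.6, None],
    "Show B": [0.8, 0.9, 0.8, 0.7],
    "Show C": [3.0, None, 3.1, 2.9],
}

for (p, q), d in distance_matrix(weekly).items():
    print(f"{p} vs {q}: {d:.2f}")
```

The resulting pairwise distances are what would then be passed to a Neighbor-net program to draw the network; that step is not sketched here.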

The network shows a gradient of increasing audience size, from bottom-left to top-right. So, the Tonight Show consistently got an average nightly audience of c. 3.5 million people, while Conan had c. 0.8 million. The two CBS shows both consistently did somewhat worse than their NBC timeslot competitors.

The two ABC shows apparently did well, but this is confounded by the timeslot swap noted above. Nightline did well for the first year (before it was moved) but not for the following two years, while Kimmel Live did the opposite. This is what creates the big reticulation in the middle of the network, as all of the other shows had fairly consistent audiences throughout the three years.

However, there was a steady decrease in the total audience size across the three years, from c. 12 million per night (at 11:30 pm) at the end of 2011 to c. 10 million at the end of 2014. The only major exception to this was at the time when Jimmy Fallon took over from Jay Leno (early 2014). For several weeks the Tonight Show audience increased to >8 million per night, so that the total audience was c. 15.5 million (a 50% increase). This shows just how many people are available to be added to the late-night viewing, compared to how many watch regularly. So, why are they not watching in the other weeks? It seems that Late Night Television is not reaching its full potential.

Wednesday, April 22, 2015

Do we need more terms for homology?

Homology is a concept that is fundamental to biological studies, and yet it is difficult to define. Generally, characters are considered to be homologous among organisms if they have been inherited from a common ancestral character.

Homology is thus at the heart of phylogenetics, as it expresses the historical relationships among characters, whereas a phylogeny expresses the historical relationships among taxa (including individuals). Since the relationships among the taxa are based on pre-existing information about the relationships among the characters, homology must be established first. It is for this reason that multiple sequence alignments, for example, are so valuable.

However, homology is a relative concept; that is, it is context sensitive. It only applies locally, to any one level of the hierarchy of character generalization. The classic example of this idea is bird wings versus bat wings. These structures are homologous as forelimbs but not as wings – birds and bats independently modified their forelimbs into wings. So, homology exists at the more general level (forelimbs) but not at the less general level (wings). Forelimbs developed first in evolutionary history (the common ancestor of animals with four legs is ancient), and later these forelimbs were modified in different descendants, with some developing wings, some flippers, and some arms. Wings, flippers and arms are more recent, and are thus less general.

So, we can conceptualize characters as existing at many hierarchical levels of generality, depending on when they developed. We might have (going from specific to general) nucleotides, amino acids, protein domains, proteins, biosynthetic pathways, developmental origins, and anatomy, among many possible conceptual levels. Lower levels in the hierarchy "control" the upper levels, so that nucleotides code for amino acids, domains consist of strings of amino acids, proteins function as enzymes in biosynthesis, and development is controlled by biosynthetic pathways.

[Figure: A nucleotide insertion and compensatory deletion result in two amino acid substitutions, so that simultaneously aligning homologous nucleotides and homologous amino acids is no longer possible.]

The issue is that homology among characters can only be determined within any one hierarchical level. As noted by Fitch (2000): "Life would have been simple if phylogenetic homology necessarily implied structural homology or either of them had necessarily implied functional homology. However, they map onto each other imperfectly".

For example, homology of amino acids among a group of organisms does not necessarily imply that all of their coding nucleotides are homologous (see the figure above) — originally the nucleotides would also have been homologous, but insertions and deletions through time can break the original relationship between the amino acids and their coding nucleotides. So, one cannot always simultaneously align homologous amino acids and homologous nucleotides.
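The insertion-plus-deletion scenario can be made concrete with a small Python sketch. The 12-nucleotide sequence is invented purely for illustration (it is not taken from the figure), and `translate` is a hypothetical helper built on the standard genetic code. One nucleotide is inserted after the first codon and one is deleted at the end of the third, so the reading frame is restored from the fourth codon onwards.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides):
    """Translate a coding sequence, reading codons from the first position."""
    usable = len(nucleotides) - len(nucleotides) % 3
    return "".join(CODON_TABLE[nucleotides[i:i + 3]]
                   for i in range(0, usable, 3))


original = "GCTAAAGGTCTG"                        # GCT AAA GGT CTG
inserted = original[:3] + "C" + original[3:]     # insertion after codon 1
mutated = inserted[:9] + inserted[10:]           # compensatory deletion

print(translate(original))   # AKGL
print(translate(mutated))    # AQRL

# Alignment by nucleotide homology needs two gaps ...
print(original[:3] + "-" + original[3:])   # GCT-AAAGGTCTG
print(inserted[:9] + "-" + inserted[10:])  # GCTCAAAGG-CTG
# ... so the codons of the aligned amino acids (K/Q and G/R) no longer
# occupy the same nucleotide columns.
```

The two proteins align position by position with two substitutions, but the nucleotide alignment that respects homology is one column longer and shifts the codon boundaries between the two gaps.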

Similarly, homology of two anatomical features does not necessarily imply that their developmental sequences are homologous. This is an issue that the study of evo-devo has made increasingly obvious. That is, sometimes identity of morphological characters is not the result of identity of the sets of genes that control their development (Meyer 1999; Mindell and Meyer 2001; Wagner 2014) — non-homologous genes and gene networks can produce morphological structures that are usually considered to be homologs, and non-homologous structures can express homologous genes.

Developmental biologists therefore often prefer a process-oriented concept of homology, which they call 'biological homology', where homologous features are those sharing a set of developmental constraints (Wagner 1989). Indeed, the terms 'syngeny' (Butler and Saidel 2000) and 'homocracy' (Nielsen and Martinez 2003) have been coined to describe morphological features that are organized through the expression of homologous gene networks, irrespective of whether those features are evolutionarily homologous or convergent.

Reticulation and homology

This idea can be extended to other evolutionary scenarios. The one I am particularly interested in here is the consequence of reticulation. In the situations discussed above the character modifications (ancestral to derived) come from "within" the lineage (traditional ancestor-descendant gene inheritance), but the modifications can also come from "outside", by gene flow.

For example, Andam and Gogarten (2013) have noted that horizontal gene transfer (HGT) can in fact be used to provide information for the concept of a Tree of Life, because a transferred gene can also be regarded as a shared derived character. That is, HGT of a gene into an ancestor forms a synapomorphy for its descendants. This gene may subsequently diversify among those descendants, even following a simple tree-like pattern of descent.

This creates a terminological issue. If diversification occurs, then these genes are homologous in the traditional sense (they are modified descendants of a common ancestral character). However, how do they compare to genes in the descendants of species that did not receive the HGT, and to the genes from which the transfer occurred? In the first case they are not applicable (just as the concept of wings is not applicable to animals with flippers). In the second case our current concept of homology does not apply in any simple sense.

The hierarchical concept of homology is tied to a tree model of evolution. The hierarchical nature of characters results from the nested hierarchy of taxon relationships. If there is no nested hierarchy of taxon relationships then our current concepts of homology are inadequate. We need terms that describe possible reticulate relationships among the characters, not just hierarchical ones.

Thus, along with modifications to the concept of monophyly (see Monophyletic groups in networks), networks imply that we need modifications to the concept of homology, as well.


It is worth noting that a similar issue applies in other fields that are based on a concept of evolutionary history. For example, in historical linguistics words are considered to descend from ancestral languages and diversify among multiple daughter languages. These words are considered to be cognate (cf. homologous). However, words are also borrowed from unrelated languages, and these are loan words (cf. HGT). Loan words may also diversify among the daughter languages, both in the original language and in the borrowing language.

For example, the Germanic word *rīks (ruler) was borrowed from Celtic *rīxs (king), and it has come down to modern times as German 'Reich', English 'rich' (West Germanic), Swedish 'rike' (North Germanic), and Gothic 'reiks' (East Germanic) (see Wikipedia). This diversification has followed Grimm's Law, a regular phonological change that defines the Germanic family — so, the subsequent development of the loan word allows reconstruction of the evolutionary history, and the descendants are cognate. But are they cognate to the words descended from *rīxs within Celtic?


Andam CP, Gogarten JP (2013) Biased gene transfer contributes to maintaining the Tree of Life. In: Lateral Gene Transfer in Evolution (U Gophna, ed.), pp. 263-274. Springer: New York.

Butler AB, Saidel WM (2000) Defining sameness: historical, biological, and generative homology. Bioessays 22: 846-853.

Fitch WM (2000) Homology: a personal view on some of the problems. Trends in Genetics 16: 227-231.

Meyer A (1999) Homology and homoplasy: the retention of genetic programmes. In: Homology (GR Bock, G Cardew, eds), pp. 141-157. Wiley: Chichester.

Mindell DP, Meyer A (2001) Homology evolving. Trends in Ecology and Evolution 16: 434-440.

Nielsen C, Martinez P (2003) Patterns of gene expression: homology or homocracy? Development Genes and Evolution 213: 149-154.

Wagner GP (1989) The biological homology concept. Annual Review of Ecology and Systematics 20: 51-69.

Wagner GP (2014) Homology, Genes, and Evolutionary Innovation. Princeton University Press: Princeton NJ.