Wednesday, August 1, 2012

Astrocladistics: a network analysis


I noted in an earlier blog post that phylogenetic analysis is used outside of biology, notably to study language evolution and cultural evolution. What is perhaps less well known is that it has also been suggested as applicable in the physical sciences, specifically to the "evolution" of galaxies (Keel 2002; Fraix-Burnet et al. 2003), an approach called "astrocladistics". As noted by Fraix-Burnet (2004): "Assuming branching evolution of galaxies as a 'descent with modification', the concepts and tools of phylogenetic systematics widely used in biology can be heuristically transposed to the case of galaxies."

That is, a galaxy is a collection of stars, gas and dust, and galaxies change through time as a result of changing proportions of these components with different characteristics. This can be seen as analogous to variational evolution in biology, where changing proportions of individuals through time lead to evolutionary change within species. Since galaxy diversity can be expected to organize itself in a hierarchy (Fraix-Burnet et al. 2006b), a hierarchical diagram such as a tree would be appropriate for displaying galaxy "morphology" and history.

I am not sure that this isn't taking the analogy a bit too far, in the sense that in biology, language and culture there is inheritance of derived character states from generation to generation, whereas for galaxies the stars etc. undergo continuous physical change. Therefore, the logic of phylogenetic analysis, which ensures that there is biological meaning to the mathematical summary it produces, does not directly apply in the case of galaxies. One can claim that any change through time is "evolution", but that does not make it the same as "biological evolution".

Nevertheless, one can certainly apply a phylogenetic analysis to data for galaxies, as demonstrated by Fraix-Burnet et al. (2006a, 2006b, 2006c). All one has to do is break up the continuous astrophysical measurements (eg. electromagnetic spectra, such as broadband magnitudes) into discrete character states, which are treated as ordered in the analysis (ie. a change between two adjacent states costs less than a change between distant states), and then feed the resulting matrix into a tree-building program.

Fraix-Burnet et al. (2006a) did this for some data simulated by GALICS (Galaxies In Cosmological Simulations), which they say "is a hybrid model for hierarchical galaxy formation studies, combining the outputs of large cosmological N-body simulations with simple, semi-analytic recipes to describe the fate of the baryons within dark matter haloes". From the simulation they chose 10 galaxies (labelled A–J) at 5 different epochs (ie. steps in the simulation, corresponding to redshifts of 3, 1.9, 1.0, 0.4 and 0), for a total of 50 "taxa". They used 91 "characters", each broken into eight states "by regularly binning the corresponding range of values among all galaxies". These characters included mass, component radius, rotation speed, dynamical time, and star formation rates, but most of them referred to magnitudes for different broadband filters at different wavelengths. This matrix was subjected to a maximum-parsimony analysis (using PAUP*).
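
To make the "regular binning" step concrete, here is a minimal Python sketch of equal-width binning of one continuous character into eight ordered states. It is only an illustration under stated assumptions: the function name bin_character and the magnitude values are hypothetical, the taxon labels (A1 to J5) simply follow the labelling used later in this post, and the paper's exact handling of bin edges is not reproduced here.

```python
def bin_character(values, n_states=8):
    """Equal-width binning of one continuous character into states 0..n_states-1.

    Hypothetical helper: the range of values across all taxa is split into
    n_states bins of equal width, and each value is replaced by its bin index.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant character carries no information; put everything in state 0.
        return [0] * len(values)
    width = (hi - lo) / n_states
    # The maximum value sits on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_states - 1) for v in values]


# 10 galaxies (A-J) at 5 epochs give the 50 "taxa" (labels assumed: A1 ... J5).
taxa = [f"{galaxy}{epoch}" for galaxy in "ABCDEFGHIJ" for epoch in range(1, 6)]

# Made-up magnitudes for five taxa, purely for illustration.
magnitudes = [-22.0, -21.0, -20.5, -18.0, -14.0]
states = bin_character(magnitudes)
print(len(taxa), states)
```

Because the bins are defined by the range across all galaxies, a single extreme value stretches every bin, which is one reason the choice of binning can matter for the resulting matrix.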

Sadly, the authors found it difficult to perform a credible phylogenetic analysis. In order to get a fully resolved tree (Analysis 1), the authors had to exclude 11 galaxies out of the 50. Even then, the interpretation of the cladograms was problematic, as the five epochs sampled did not show consistent patterns within the tree (they should follow the same time direction for each of the 10 galaxies).

The authors then noted: "At this stage of the analysis, two options are possible. The first one is to assume that, because galaxies ADEFI and BCGHJ are born with different burst components, they could have two different ancestors ... The second option is to remove the burst characters".

The authors tried both of these suggested analyses. In the first case (Analysis 2), they created two trees, one from each of the two taxon subsets, but they still had to exclude galaxy B2 to get resolved trees. In the second case (Analysis 3), the reduced set of 60 characters yielded a cladogram that required the exclusion of 20 galaxies in order to get a fully resolved tree. These 20 galaxies were then used to produce a second tree. So, in both cases they ended up with two trees, each based on different galaxies.

The authors' conclusion was:
"Among the different results presented in this paper, those shown in [Analysis 3] are clearly the most satisfactory because they are less affected by a priori subjective choice, and the evolutionary scenario represented on the cladograms is astrophysically plausible. On the contrary, the analysis using all characters [Analysis 1] is plagued by doubt on burst characters as galaxy evolution indicators. The other results [Analysis 2] heavily depend on our a priori knowledge of lineages available thanks to the simulations. They thus seem very artificial and cannot be representative of a real data set."

To me, this all seems overly complex. There are clearly multiple patterns in the data, and the first thing to do is find out what they look like. The issues raised here could easily be dealt with using a network as a tool for exploratory data analysis.

So, I took the dataset as presented in the paper, and performed a NeighborNet analysis. To do this I had to re-code the characters, because the SplitsTree program does not directly deal with ordered character states. So, each character becomes three binary characters, with the states coded as: 0 = 000, 1 = 001, 2 = 010, 3 = 011, 4 = 100, 5 = 101, 6 = 110, 7 = 111. The Hamming distance on the re-coded characters then stands in for the distance between the ordered character states, although the match is only approximate: states 3 (011) and 4 (100), for example, are adjacent but differ in all three bits.
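
As a rough check on this re-coding, the following Python sketch reproduces the three-bit coding and compares the resulting Hamming distance with the number of ordered steps between two states. The function names are hypothetical, and the additive ("thermometer") coding included for comparison is an assumption added for illustration; it is not something used in the analysis described here.

```python
def recode_binary(state):
    """Three-bit coding from the post: 0 -> '000', 1 -> '001', ..., 7 -> '111'."""
    return format(state, "03b")


def recode_additive(state, n_states=8):
    """Additive (thermometer) coding, shown only for comparison.

    State k becomes k ones followed by zeros, using n_states - 1 binary characters.
    """
    return "1" * state + "0" * (n_states - 1 - state)


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Compare ordered steps |i - j| with the Hamming distance under each coding.
for i, j in [(0, 1), (3, 4), (0, 7)]:
    print(
        i, j,
        "ordered:", abs(i - j),
        "three-bit:", hamming(recode_binary(i), recode_binary(j)),
        "additive:", hamming(recode_additive(i), recode_additive(j)),
    )
```

For the pair (3, 4) the three-bit coding gives a Hamming distance of 3 although the states are one step apart, and for (0, 7) it gives 3 although they are seven steps apart. The additive coding matches the ordered step count for every pair, at the cost of seven binary characters per original character instead of three.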

[Figure: NeighborNet network of the 50 galaxy "taxa", with galaxy B2 and galaxies C2, C3, C4, C5 highlighted.]

The resulting network reveals the situation that the authors struggled to deal with. One of the largest splits creates two well-defined partitions of the galaxies, with galaxies A,C,D,E,F,I on the right and galaxies B,G,H,J on the left.

As noted by the authors, galaxy B2 is in the "wrong" partition (it is highlighted in the figure). However, we can also clearly see that galaxies B3,B4,B5 are themselves unusual compared to the other galaxies, as they have a very large split of their own. Galaxy B is thus highlighted as having a very unusual history, which needs to be investigated separately from the other galaxies.

What is more important, the behaviour of galaxy C does not match the authors' a priori subdivision based on different burst components. The authors placed C with the BGHJ group whereas the network places C2,C3,C4,C5 with the ADEFI group (as highlighted in the figure). This explains why the authors found Analysis 2 unsatisfactory: their a priori subdivision of the taxa does not quite match the data.

The authors also considered Analysis 3 to be unsatisfactory because a large amount of the data were deleted, making "the total number of significantly discriminant characters somewhat too low to hope to obtain a very robust cladogram for the 50 galaxies." However, the network shows that this is not the real problem. The problem is that the data are not very tree-like, no matter which characters are considered. This suggests that these galaxies have not organized themselves into a hierarchy at all.

Indeed, to me the data make it clear that constructing a phylogeny for galaxy data is not a very useful exercise, at least if this is the sort of dataset that can be expected. Moreover, as the authors note: "The sample used in this paper is made of galaxies that are too simple as compared to the real world." In that case, as a proof of concept this analysis is not very convincing.

References

Fraix-Burnet D. (2004) First phylogenetic analyses of galaxy evolution. In: Penetrating Bars Through Masks of Cosmic Dust: The Hubble Tuning Fork Strikes a New Note, eds D.L. Block, I. Puerari, K.C. Freeman, R. Groess, E.K. Block. Springer, pp. 301-305.

Fraix-Burnet D., Choler P., Douzery E.J.P. (2003) What can biologists say about galaxy evolution? Astrophysics and Space Science 284: 535-538.

Fraix-Burnet D., Choler P., Douzery E.J.P. (2006c) Towards a phylogenetic analysis of galaxy evolution: a case study with the dwarf galaxies of the local group. Astronomy and Astrophysics 455: 845-851.

Fraix-Burnet D., Choler P., Douzery E.J.P., Verhamme A. (2006a) Astrocladistics: a phylogenetic analysis of galaxy evolution I. Character evolutions and galaxy histories. Journal of Classification 23: 31-56.

Fraix-Burnet D., Douzery E.J.P., Choler P., Verhamme A. (2006b) Astrocladistics: a phylogenetic analysis of galaxy evolution II. Formation and diversification of galaxies. Journal of Classification 23: 57-78.

Keel W.C. (2002) The Road to Galaxy Formation. Springer-Praxis Books in Astrophysics and Astronomy.
