The University of Arizona

Past McGinnies Scholars: Where are they now?

 


Michael J. Sanderson (Dec. 2011 feature)

Career since UA:
Michael Sanderson studied phylogenetic systematics of plants in EEB at the UA under Michael Donoghue, who was then on the faculty there. He then did a postdoc at Cornell with Jeff Doyle, learning molecular systematics techniques, funded by a Sloan Foundation Fellowship. His first faculty position was in the Department of Biology at the University of Nevada, Reno (1992-1995). In 1995 he joined the Evolution and Ecology Department at UC Davis, where he remained until 2006. A few years back an opportunity arose to move back to UA, involving positions for Dr. Sanderson in EEB and his spouse, Shelley McMahon, in Plant Sciences. They were happy to return to the university where they both received degrees, and to the Southwest, the flora of which served as considerable inspiration for their research careers.
 
Dr. Sanderson’s research has turned increasingly to computational biology approaches to understanding the phylogenetic tree of life, especially the relationships of plants. As the volume of molecular sequence data from genomics technologies has continued to skyrocket, computational tools and algorithms to exploit these data have become increasingly important. He collaborates with biologists, computer scientists and mathematicians to develop methods for building large phylogenetic trees from DNA sequence data and for inferring the complex evolutionary history of two key aspects of biodiversity: species richness and trait diversity. The plants of arid regions of the world, such as the Sonoran Desert, often present attractive case studies for both of these problems. At the core of Dr. Sanderson’s research is the belief that constructing large, high-resolution phylogenetic trees will greatly inform our understanding of evolution and ecology.
 

Recent Work:

Despite computational challenges, phylogenetic trees with thousands of species, including trees of plants, have been reconstructed using DNA sequence data and high-performance computing. One example is a tree for part of the huge and diverse legume family of flowering plants, published in 2006 (McMahon and Sanderson, Syst. Biol., 2006). Recent estimates put the diversity of legumes, an important component of the flora of the Sonoran Desert, at around 19,000 species, far more than all birds and mammals combined. The analysis included only about 2,000 of these, but at the time that was a nearly complete sample of all species for which DNA sequence data were available. Since then, phylogenies with tens of thousands of species have been published for other groups in the tree of life.
 
One of the hallmarks of these kinds of analyses is missing data, always a thorn in the side of quantitative methods in science. For example, in the 2006 study, instead of having all genes for all species, the authors had just 5% of these covered. This is a well-known consequence of the biased sampling that biologists have undertaken in gathering sequence data from species, and it persists even in more recent whole genome data sets for more biologically interesting reasons. A few years ago, Dr. Sanderson began a collaboration with Mike Steel, a mathematician at the University of Canterbury in New Zealand, to try to understand the impact of these missing genes on inferring phylogenetic trees. They have now published a series of three papers on the subject, culminating most recently in a paper in Science this year (Sanderson, McMahon, and Steel, Science, 2011). The first two papers were quite technical and were aimed at understanding the ambiguities induced by these missing data in a very general context. The most recent paper put some of these results in a more concrete framework that is quite relevant to the way phylogeneticists make large trees.

Almost all methods for making trees calculate a "score" for a possible tree based on the data and then engineer a computationally challenging search among a large number of possible trees to find the one with the best score. This is akin to a robot wandering around the Santa Catalina Mountains, always going uphill to find higher ground. They discovered that under very general circumstances, missing data change the "landscape" of this phylogenetic "mountain range" so that it looks much more like a series of terraced rice fields carved into a mountainside. There are large flat areas--terraces--where all trees are equally good, leading the poor robot to wander aimlessly looking for the terrace boundaries. For the data set on grasses that was analyzed, each of these terraces had millions of equally good trees. Higher terraces represent trees with better scores with respect to the original data. Fortunately, there are efficient methods to detect these terraces and move off them, which are also described in the paper.
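
The robot-on-a-terrace picture can be illustrated with a toy sketch. This is not the method from the Science paper and it does not operate on real phylogenetic trees: the integer positions, the score function, the terrace width, and all function names below are assumptions made purely for illustration. Positions stand in for candidate trees, and every block of neighboring positions shares one score, like a flat terrace. A climber that moves only to a strictly better neighbor stalls immediately, while a climber that first works out the extent of its terrace can step off the edge onto higher ground.

```python
# Toy illustration only: integer positions stand in for candidate trees.
TERRACE_WIDTH = 5  # hypothetical: number of neighboring "trees" sharing a score


def score(x: int) -> int:
    """Every block of TERRACE_WIDTH positions has the same score (a terrace)."""
    return x // TERRACE_WIDTH


def strict_climb(start: int, limit: int) -> tuple[int, int]:
    """Move only to a strictly better neighbor; return (final position, steps)."""
    x, steps = start, 0
    while True:
        better = [n for n in (x - 1, x + 1)
                  if 0 <= n <= limit and score(n) > score(x)]
        if not better:
            return x, steps  # no uphill neighbor: stuck (possibly mid-terrace)
        x = max(better, key=score)
        steps += 1


def terrace_bounds(x: int, limit: int) -> tuple[int, int]:
    """Find the contiguous range of positions that share x's score."""
    lo = hi = x
    while lo - 1 >= 0 and score(lo - 1) == score(x):
        lo -= 1
    while hi + 1 <= limit and score(hi + 1) == score(x):
        hi += 1
    return lo, hi


def terrace_aware_climb(start: int, limit: int) -> int:
    """Detect the current terrace, then step past its edge if that is uphill."""
    x = start
    while True:
        lo, hi = terrace_bounds(x, limit)
        exits = [n for n in (lo - 1, hi + 1)
                 if 0 <= n <= limit and score(n) > score(x)]
        if not exits:
            return x  # no higher terrace is adjacent to this one
        x = max(exits, key=score)


if __name__ == "__main__":
    print(strict_climb(2, 19))         # stalls where it started
    print(terrace_aware_climb(2, 19))  # reaches the highest terrace
```

In this toy setting the strict climber never leaves its starting terrace, whereas the terrace-aware climber crosses each flat region in a single move. Real tree space is vastly larger and is not one-dimensional, so the sketch conveys only the intuition.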