Phylomatic Tutorial

Continuing on with my phylogenetics theme, I wanted to write a tutorial for how to create a supertree using Phylomatic. I recently used phylomatic to generate a figure to highlight the diversity of plant taxa included in my study. I had trouble finding a tutorial on how to use it and fiddled around for an hour or so before I got the hang of it, so here I decided to write up what I learned hoping it will say someone time in the future.

The original paper describing Phylomatic by Webb & Donoghue, 2004 can be found here and the online platform can be found here. Phylomatic was originally developed as a tool to easily make phylogenies using community data. These phylogenies could then be used to generate community phylogenetic measures (see Webb et al, 2002). Originally Phylomatic was restricted to seed plants or else required a reference phylogeny provided by the user, but now Phylomatic has expanded to include an animal reference phylogeny as well as multiple reference phylogenies for seed plants.

Essentially, Phylomatic generates a phylogenetic tree of a list of taxa that is provided by the user, using either one of several ‘built in’ phylogenies that are well cited in the literature (e.g. Zanne et al., 2014) or a phylogeny that is provided by the user. It’s pretty much that easy. However after viewing the tree you may notice that some tips are out of place. For example, you may notice that taxa of the same genus do not grouped together, but rather form paraphyletic groups. It is possible to correct this in R.

In this tutorial I show you how to generate a phylogeny using Phylomatic in R. Then I show you how to add and re-arrange tips in the phylogeny in R. I use R for the focus of this tutorial so I don’t have to change between environments; however it is also possible to use Phylomatic from the terminal. To do so you will need to install the Phylocom software from Github (here) which is used for the analysis of community ecology data. A full script version of this tutorial, with lots of detailed annotation, can be found on my Github, here.

So let’s get started!

You will need the following packages loaded in your R environment, ‘brranching’, ‘ape’, ‘phytools’, and potentially the ‘phylocomr’ package. The ‘brranching’ package includes the phylomatic function. It also includes the bladj function which is used for calibrating the branch lengths of trees; however I don’t cover the use of bladj in this tutorial. The packages ‘ape’ and ‘phytools’ enable the visualization and manipulation of the class phylo object in R. The ‘phylocomr’ package allows you to use phylomatic with you own input reference tree.

To use the phylomatic function in R you will need a taxa dataset that is in the correct syntax, which is given by ‘family/genus/Binomial_name’. Below is an example and will be the list of taxa I use in this tutorial. I have given it the name ‘taxa_subset’. You can store this list as a csv that you can then read into your R environment. (In my script I show you how to generate this syntax from a data frame with two columns: Family and Species, which is likely how your species data will be formatted.) Note that there is a way for you to submit just a species list and for phylomatic to get the family names for you using the function phylomatic_names() in the brranching package (you can see the documentation for the phylomatic_names() function here)

molluginacae/mollugo/Mollugo_crockeri              
cistaceae/helianthemum/Helianthemum_patens          
berberidaceae/berberis/Berberis_corymbosa           
phrymaceae/diplacus/Diplacus_calycinus              
asteraceae/bidens/Bidens_cervicata                  
asclepiadaceae/sarcostemma/Sarcostemma_angustissimum
acanthaceae/justicia/Justicia_glaziovii             
verbenaceae/verbena/Verbena_sedula                 
rubiaceae/bertiera/Bertiera_bracteosa               
rubiaceae/galium/Galium_ovalleanum                
ericaceae/arctostaphylos/Arctostaphylos_cruzensis   
asteraceae/hazardia/Hazardia_rosarica             
berberidaceae/berberis/Berberis_microphylla         
boraginaceae/phacelia/Phacelia_ramosissima          
asteraceae/hazardia/Hazardia_squarrosa             
fabaceae/lotus/Lotus_nevadensis                   
polygonaceae/eriogonum/Eriogonum_fasciculatum       
asteraceae/bidens/Bidens_hillebrandiana             
asteraceae/erigeron/Erigeron_rosulatus              
caryophyllaceae/silene/Silene_antirrhina

taxa_subset <- read.csv('taxa_list.csv', header=T, sep=',', stringsAsFactors=F)

Next you can execute the function phylomatic(). It takes two main arguments, taxa and stored tree. You set taxa equal to your taxalist (in this case ‘taxa_subset’). For storedtree there are several options. First you can choose from among a set of built-in phylogenies. These include, R20120829 (Phylomatic tree R20120829 for plants), smith2011 (Smith 2011, plants), binindaemonds2007 (Bininda-Emonds 2007, mammals), or zanne2014 (Zanne et al. 2014, plants). If nothing is set to stroredtree the default is R20120829. Alternatively you use the function ph_phylomatic() from the phylocomr package to use a reference phylogeny that you provide. To do this, read your reference tree into your R environment using the read.tree or read.nexus function (depending on your file type) and give it a name. Then set your reference tree to the argument ‘phylo’ in the ph_phylomatic() function.

# using a built-in tree
tree <- phylomatic(taxa=taxa_subset, storedtree='zanne2014')

# using a tree provided by the user
ref_tree <- read.tree('reference_tree_file.tr')
ref_tree <- read.nexus('reference_tree_file.nex')
tree <- ph_phylomatic(taxa=taxa_subset, phylo=ref_tree)

The function phylomatic generates a tree of class phylo, which can be used as an other phylo object in R. Also notice that after you execute the function, a message will appear letting you know which taxa from your list were not able to be included in the phylogeny. If the placement of these taxa is know you can can add these missing taxa to the tree using functions provided in the ape and phytools packages. If you used the GUI Phylomatic platform available online you should have had a new broswer window pop open and display a tree in newick format. You can copy and paste the newick formatted tree into R using the read.tree function and setting the text argument to the newick formatted tree. See below.

tree <- read.tree(text='((((eriogonum_fasciculatum:89.4609,Silene_antirrhina:89.4609)Caryophyllales:27.2363,((((bidens_hillebrandiana:22.8719,bidens_cervicata:22.8719)bidens:22.8719,erigeron_rosulatus:45.7437,hazardia_rosarica:45.7437,Hazardia_squarrosa:45.7437)Asteraceae:57.1278,(((bertiera_bracteosa:56.8699,galium_ovalleanum:56.8699)Rubiaceae:16.4995,((diplacus_calycinus:38.1329,verbena_sedula:38.1329):3.88331,justicia_glaziovii:42.0162):31.3533):1.62956,Phacelia_ramosissima:74.999):27.8725):4.73798,Arctostaphylos_cruzensis:107.609)Asteridae:9.08781)Pentapetalae:2.46823,(helianthemum_patens:117.535,lotus_nevadensis:117.535)Rosidae:1.63077)Gunneridae:17.735,(berberis_corymbosa:117.268,Berberis_microphylla:117.268)Berberidaceae:19.6321);') 

You can quickly inspect your tree to see if it makes sense.

plot(tree) # plots a typical vertical cladogram
plot(tree, type='f'cex=0.3) # for larger phylogenies it may be helpful to plot a circular tree (type='f') and descrease the font size (cex=0.3)

It should look something like this:

phylogeny generated using Phylomatic

phylogeny generated using Phylomatic

If everything looks good then you can go ahead and use this tree and make it look pretty! However if you notice that some tips are out of place or not all of you taxa were included in the tree you will need to do a little bit more work.

First you will probably want to add the taxa that were not included in the phylogeny. Although the phylomatic function lets you know which taxa were not included in the phylogeny, you can also get the list of taxa that are not included in the phyloeny using a bit of code. You will then need to identify the taxa that you want to make the new sister taxa. To do this you can run the code below which I modified from a tutorial by Liam reveal which can be found here.

# first we need to fix the fact taht for some reason the first letter of the binomial name of some of the taxa is not captialized. If we don't fix this then we will not beable to match taxa in the tree with taxa from our original list

tree.tips <- tree$tip.label # the taxa that are included in the tree
corrected.tips <- c() # set empty vector to fill
for (i in 1:length(tree.tips)) {
    string <- strsplit(tree.tips[i], split='')
    first_letter <- toupper(string[[1]][1])
    corrected.i <- sub('^|[[:alpha:]]', first_letter, tree.tips[i])
    corrected.tips <- c(corrected.tips, corrected.i)
}
tree$tip.label <- corrected.tips  # assigns corrected tip labels to the phylogeny
tree.tips <- tree$tip.label # corrected tip labels

index <- which(!(taxa$Species %in% tree.tips)) # identifies which taxa are not included in the phylomatic tree
missing_taxa <- taxa$Species[index] # here are the missing taxa

# next idenify which taxa you want the new sister species to be for each of the missing taxa and assign them to missing taxa_i and sister, respecitively
missing_taxa_i <- 'some missing taxa'
sister <- 'some sister species'

sister_location <- which(tree$tip.label==sister) 
position <- tree$edge.length[which(tree$edge[,2])]==which(tree$tip.label == sister) # here I set the edge.length (branch length) to equal the branch length of the sister taxa creating a polytomy. You can change this by multipying the position by a fraction. For example if you mulitple position by 0.5 then the taxa will be place halway along the branch of the sister taxa. 
edge.length <- tree$edge.length[which(tree$edge[,2])]==which(tree$tip.label==sister) 
tree <- bind.tip(tree, to_drop, where= where, position=position, edge.length=edge.length) # adds tip to tree; the final argument edge.length specifies the terminal branch lenght and must be included when the phylogeny is not ultrametric

Next you might need to rearrange tips. The process if very similar to adding missing taxa. You will need to visually inspect the phylogeny and determine which taxa are misplaced. You will drop those taxa from the tree and then read them as sister to the species you identify as the new sister species. The following code will execute this process.

to_drop <- 'some missplaced taxa'
sister <- 'some sister speices'

tree <- drop.tip(tree, to_drop) # remove taxa form phylogeny
sister_location <- which(tree$tip.label==sister) 
position <- tree$edge.length[which(tree$edge[,2])]==which(tree$tip.label == sister) # here I set the edge.length (branch length) to equal the branch length of the sister taxa. You can change this by multipying position by a fraction. For example if you mulitple position by 0.5 then the taxa will be place halway along the branch of the sister taxa. 
edge.length <- tree$edge.length[which(tree$edge[,2])]==which(tree$tip.label==sister) 

tree <- bind.tip(tree, to_drop, where= where, position=position, edge.length=edge.length) # the final argument edge.length specifies the terminal branch lenght and must be included when the phylogeny is not ultrametric

Now you can replete the phylogeny and see if everything is good to go! Definitely check out the R script for this tutorial which can be accessed through my Github.