Phylogenetics - A Beginner's Reference

This page is a general reference for beginners in the field of phylogenetics. There is far more to know than I can possibly fit on one page, but this collection of information should get you started. The page is organized as follows. The first section gives a general history and overview of taxonomy and the role phylogenetics plays in it. The middle of the page is a collection of links to pages written by people who know a lot more about phylogenetics than I do. Finally, the last two sections list sites where you can download phylogenetic software to use yourself, and a series of online worked examples that take you step by step through using the programs.


What is phylogenetics?

The formal definition of phylogenetics from the Oxford University Press reads: "Phylogenetics is the taxonomical classification of organisms on the basis of their degree of evolutionary relatedness". So what does this mean?

Taxonomy Background

One of the oldest and most frustrating problems in biology has been devising a system to keep track of all the organisms on Earth. From the simplest bacterium to the tallest tree, we need a comprehensive system to classify organisms. This is the field of taxonomy.

Taxonomy (from Greek taxis, meaning arrangement or division, and nomos, meaning law) is the science of classification according to a pre-determined system, with the resulting catalog used to provide a conceptual framework for discussion, analysis, or information retrieval. In theory, the development of a good taxonomy takes into account the importance of separating elements of a group (taxon) into subgroups (taxa) that are mutually exclusive, unambiguous, and, taken together, include all possibilities. In practice, a good taxonomy should be simple, easy to remember, and easy to use.
(See reference)

The Linnaeus Classification System
One of the best-known taxonomies is the one devised by the Swedish scientist Carl Linnaeus, whose classification for biology is still widely used (with modifications). The Linnaeus Classification System is probably the one you remember from high school biology. It goes from general to specific: kingdom, phylum, class, order, family, genus, species.

(See a general diagram, example with dog, example with human)

Numerical Taxonomy
The major problem with the Linnaeus system is that it is very subjective. Different people can interpret the groups differently because each level is arbitrarily defined. So, beginning in the 1950s, scientists started looking for alternative methods of classifying organisms. This gave rise to numerical taxonomy.

Numerical taxonomy: The classification of organisms by purely mathematical means. It is based on quantifying observable characteristics of organisms and may be operated at various taxonomic levels to deal with species or higher taxa. It involves the grouping and computation of the similarity of characters; the results are usually displayed graphically, as a phenogram or dendrogram.
(See reference)
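
As a rough illustration of how the similarity of characters can be computed, the sketch below scores a few organisms on presence/absence characters and calculates the simple matching coefficient for each pair. The organisms, the characters, and the choice of coefficient are all assumptions made for this example; the definition above does not prescribe any of them, and many other coefficients have been used.

```python
from itertools import combinations

# Hypothetical character matrix: 1 = trait present, 0 = trait absent.
# The organisms and the traits scored are invented for illustration only.
CHARACTERS = ["has_fur", "lays_eggs", "has_wings", "warm_blooded"]
MATRIX = {
    "dog":     [1, 0, 0, 1],
    "bat":     [1, 0, 1, 1],
    "sparrow": [0, 1, 1, 1],
    "lizard":  [0, 1, 0, 0],
}


def simple_matching(a, b):
    """Fraction of characters on which two organisms have the same score."""
    if len(a) != len(b):
        raise ValueError("character vectors must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def similarity_table(matrix):
    """Return {(name1, name2): similarity} for every unordered pair of names."""
    return {
        (p, q): simple_matching(matrix[p], matrix[q])
        for p, q in combinations(sorted(matrix), 2)
    }


if __name__ == "__main__":
    table = similarity_table(MATRIX)
    # List the most similar pairs first.
    for (p, q), score in sorted(table.items(), key=lambda item: -item[1]):
        print(f"{p:8s} {q:8s} {score:.2f}")
```

With these four characters the dog and the bat come out as the most similar pair. Scoring a different set of characters, or coding them differently, could easily regroup the same organisms.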

The goal of numerical taxonomy was to be objective. This was to be achieved by converting all observations into numbers and then using a predefined calculation to divide organisms into taxa. However, it was quickly realized that there was a lot of subjectivity in which observations were used and how these observations were converted into numbers. So, the problem of subjectivity still existed. This is when phylogenetics emerged.

Phylogenetics
The basic idea of phylogenetics is to classify organisms according to how they evolved. This first means that you have to accept Darwin's Theory of Evolution (which is not without question). The premise is that life began as a single cell and that every organism on Earth has descended from that one cell through mutation and natural selection. So, naturally, all organisms can be related through an evolutionary "family tree". A project called The Tree of Life has begun to organize the information we have now.

Today, phylogenetics is most commonly done at a molecular level. A gene (DNA) or protein sequence is chosen based on a number of criteria. This same sequence is then determined for a number of different organisms, and all the sequences are aligned to each other using a multiple sequence alignment program. From this alignment, a tree-building algorithm creates a phylogenetic tree that shows the sequences graphically and (hopefully) how they are related. There are many ways of determining evolutionary relatedness from the multiple sequence alignment, including maximum likelihood, maximum parsimony, pairwise distance methods, and more (see the Tree-Building Methods section).
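
To make the pairwise distance route a little more concrete, here is a minimal sketch in Python. It assumes the sequences are already aligned, uses made-up sequences and names (seqA to seqD), measures the proportion of differing sites (the p-distance), and clusters with UPGMA. UPGMA is only one simple distance method, picked here for brevity, and it assumes all lineages change at the same rate. The Jukes-Cantor function is included as one common correction for multiple substitutions but is not used in the clustering. Real analyses should rely on dedicated phylogenetics software.

```python
import math
from itertools import combinations

# Hypothetical, already-aligned sequences (equal length, '-' would mark a gap).
ALIGNMENT = {
    "seqA": "ACGTACGTAC",
    "seqB": "ACGTACGTTC",
    "seqC": "ACGAACGTTG",
    "seqD": "TCGAACCTTG",
}


def p_distance(s1, s2):
    """Proportion of differing sites, skipping columns where either has a gap."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a p-distance below 0.75."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(alignment):
    """Map each unordered pair of names to its p-distance."""
    return {
        frozenset((a, b)): p_distance(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }


def upgma(alignment):
    """Cluster by average distance; return the topology as a Newick string."""
    dist = distance_matrix(alignment)
    sizes = {name: 1 for name in alignment}  # cluster label -> number of sequences
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sorted(sizes), 2),
                   key=lambda pair: dist[frozenset(pair)])
        merged = f"({a},{b})"
        # Distance from the new cluster to each other cluster is the
        # size-weighted average of the distances from its two parts.
        for other in sizes:
            if other in (a, b):
                continue
            dist[frozenset((merged, other))] = (
                sizes[a] * dist[frozenset((a, other))]
                + sizes[b] * dist[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


if __name__ == "__main__":
    print(upgma(ALIGNMENT))
```

For this toy alignment the sketch groups seqA with seqB and seqC with seqD. Branch lengths are left out to keep the example short.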

Phylogenetics has emerged as a leading taxonomic method. However, there is still controversy as to its validity and reliability. Because evolution happened in the past and cannot be observed directly, each step in the phylogenetics process requires certain assumptions to be made. For a great reference on the assumptions, see Baxevanis and Ouellette's "Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins", Chapter 14 - Phylogenetic Analysis. (Sorry, it's not online, but it is a great reference.) In addition, different tree-building methods perform different types of analysis, and their results mean different things. To make an analysis as valid as possible, the appropriate method must be used with the appropriate data.

As the popularity of phylogenetics has increased, so has the need to manage the data generated from these analyses. The database TreeBASE has been created for this purpose. "TreeBASE is a relational database of phylogenetic information hosted by the University at Buffalo. TreeBASE stores phylogenetic trees and the data matrices used to generate them from published research papers." Biologists may submit their data to this database as a way to make it available to the general community, especially if their publication didn't or couldn't give complete details as to what was done.

Phylogenetics can be a powerful method of taxonomy when properly understood. And, hopefully, this page will help you on your way to that understanding.


Glossary of Terminology

Online Phylogenetics Class Page

Comprehensive Glossary


Online Tutorials

Online Phylogenetics Class Page

Berkeley's Phylogenetics

Phylogenetics Lab from the Virtual Paleobotany Lab


Tree-Building Methods

Introduction to Tree Building

Tree-Building Methods

Distance Based Phylogenies

How to Make a Phylogenetic Tree


Short Summaries

Introduction to Phylogenetic Systematics

Molecular Phylogenetics

Principles of Phylogeny

Determining the Evolutionary Relationships of Species


Getting Started on your Own

Actually Doing Phylogenetics

Software that can be accessed and used online (no download needed):

To download software for your own use (all of it is free):


Examples to Work Through

Introduction to Bioinformatics

Neighbour-Joining Trees

Any suggestions for additions or page comments can be sent to blc257@mail.usask.ca