Teaching Tips / Activity Overview:
This lesson will be the first in a unit on constructing Phylogenetic Trees from DNA or Protein Sequences. The unit overview / engagement activity is provided by Tara Flick's Bear Evolution Lesson, in which students:
- Collect 9 DNA sequences from various species of bears and other closely related species from the NCBI database;
- Generate an edit-distance matrix consisting of evolutionary distances between all pairwise combinations of the 9 animals, using UCSD's Biology Workbench CLUSTAL tools;
- Generate a phylogenetic tree from the edit-distance matrix.
The edit-distance matrix and the phylogenetic tree for the Bear Evolution lesson (i.e., the results) are contained in the Bear Evolution Lesson Results.doc document. The algorithms behind this process, however, are hidden inside Biology Workbench's "black box". This new unit will attempt to make these algorithms transparent to students.
- Lesson 1 (this lesson): Students construct an Excel spreadsheet to model the simplest implementation of the LCS (Longest Common Subsequence) algorithm in order to calculate the degree of homology between 2 sequences.
- Lesson 2: Students construct the same program in Java or another higher-level language/IDE. Students will also add scoring matrices for gaps, insertions, deletions, and point mutations, including the BLOSUM and PAM matrices for protein sequences.
- Lesson 3: Students will convert the homology data to edit-distances and construct a matrix.
- Lesson 4: Students will build a program to use the edit-distance matrix data to construct a phylogenetic tree.
- Lesson 1 can be taught in an introductory CS class where students have learned the basics of Excel spreadsheet construction, including a familiarity with functions, formulas, and absolute/relative cell referencing. In more advanced classes, Lesson 1 serves as a scaffold / proof-of-concept model that helps students better understand the Java (or other higher-level language) program that they will be designing and coding.
- The two PowerPoint presentations contain step-by-step instructions for building a small version of the program, one that handles sequences of up to 10 characters in length and shows how the algorithm works.
- Two XLSM documents are attached. The first is the completed model built from the two PowerPoint presentations. The second is an expanded version that can handle sequences of up to 256 characters in length; it compares the β-chains of human and mouse hemoglobin.
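For teachers who want a preview of how the Lesson 1 spreadsheet grid might carry over to the Lesson 2 program, the following is a minimal Java sketch of the basic LCS table, not the code from the lesson materials. The class name LcsSketch, the method names, and the sample sequences are hypothetical. The indelDistance method shows one possible way to turn an LCS length into an edit distance for Lesson 3, under the assumption that only insertions and deletions are allowed and each costs 1; the lessons may use a different conversion.

```java
// Hypothetical class and method names; not taken from the lesson materials.
final class LcsSketch {

    // Fills the same kind of grid the spreadsheet models: table[i][j] holds
    // the LCS length of the first i characters of a and the first j of b.
    static int lcsLength(String a, String b) {
        // Row 0 and column 0 stay 0 (an empty prefix shares nothing).
        int[][] table = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Match: extend the LCS of the two shorter prefixes.
                    table[i][j] = table[i - 1][j - 1] + 1;
                } else {
                    // Mismatch: carry forward the better neighbor (up or left).
                    table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
                }
            }
        }
        return table[a.length()][b.length()];
    }

    // Assumption: insertions and deletions only, each with cost 1.
    // Every character outside the LCS must be deleted from a or inserted from b.
    static int indelDistance(String a, String b) {
        return a.length() + b.length() - 2 * lcsLength(a, b);
    }

    public static void main(String[] args) {
        String a = "ACCGGT"; // made-up sample sequences
        String b = "ACGT";
        System.out.println("LCS length: " + lcsLength(a, b));         // 4
        System.out.println("Indel distance: " + indelDistance(a, b)); // 2
    }
}
```

Each cell of the table depends only on its upper, left, and upper-left neighbors, which is why the same logic can be written as a single spreadsheet formula copied across a grid using relative cell references.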