Phylogenetic Trees, Part 1: Pairwise Alignment of Related DNA / Protein Sequences using the LCS Algorithm

Overview & Concepts

This lesson is the 1st in a unit on constructing Phylogenetic Trees from DNA or Protein Sequences. Students build an Excel spreadsheet to model the simplest implementation of the LCS (Longest Common Subsequence) algorithm in order to calculate the degree of homology between 2 sequences.
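The grid that students build in the spreadsheet can also be sketched in a few lines of code. The example below is a minimal illustration of the standard dynamic-programming form of LCS, not the contents of the lesson's workbook: each table entry plays the role of one spreadsheet cell, and the function names (`lcs_table`, `lcs_length`, `homology`) are hypothetical. Expressing homology as LCS length divided by the length of the longer sequence is an assumption made for this sketch; the lesson materials may define it differently.

```python
def lcs_table(a, b):
    """Fill a (len(a)+1) x (len(b)+1) grid of LCS lengths.

    Row 0 and column 0 stay at zero, like a border of zeros
    around the working area of a spreadsheet grid.
    """
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                # Match: extend the subsequence found diagonally up-left.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Mismatch: carry forward the better of "above" and "left".
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def lcs_length(a, b):
    """Length of the longest common subsequence (bottom-right cell)."""
    return lcs_table(a, b)[-1][-1]


def homology(a, b):
    """ASSUMPTION: homology = LCS length / length of the longer sequence."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


if __name__ == "__main__":
    # Toy strings chosen for illustration only; not real sequence data.
    print(lcs_length("GATTACA", "GCATGCU"))  # 4 (e.g. "GATC")
```

Each cell depends only on its upper, left, and upper-left neighbours, which is why the same logic can be written as a single spreadsheet formula and filled across the grid using relative cell references.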

Concepts Covered: 

  • General Programming Constructs/Concepts (e.g. Methods, Conditionals)
  • String Ops
  • 2-D Arrays
  • For-Each Loop
  • Evolutionary Relationships
  • Algorithms

Prior Knowledge Required: 

  • Proficiency in Basic Excel Ops
  • Excel formulas and functions
  • Absolute & Relative Cell References

Activity Notes

Days to Teach: 

2 Days

Teaching Tips / Activity Overview: 

This lesson is the first in a unit on constructing Phylogenetic Trees from DNA or Protein Sequences. The unit overview / engagement activity is provided by Tara Flick's Bear Evolution Lesson, in which students:

  1. Collect 9 DNA sequences from various species of bears and other closely related species from the NCBI database;
  2. Generate an edit-distance matrix consisting of evolutionary distances between all pairwise combinations of the 9 animals, using UCSD's Biology Workbench CLUSTAL tools;
  3. Generate a phylogenetic tree from the edit-distance matrix.

The edit-distance matrix and the phylogenetic tree for the Bear Evolution lesson (i.e. the results) are contained in the Bear Evolution Lesson Results.doc document. The algorithms behind this process, however, are hidden inside Biology Workbench's opaque "black box". This new unit attempts to make these algorithms transparent to students.

  1. Lesson 1 (this lesson): Students construct an Excel spreadsheet to model the simplest implementation of the LCS (Longest Common Subsequence) algorithm in order to calculate the degree of homology between 2 sequences.
  2. Lesson 2: Students construct the same program in Java or another higher-level language/IDE. Students will also add scoring matrices for gaps, insertions, deletions, and point mutations, including implementing the BLOSUM and PAM matrices for protein sequences.
  3. Lesson 3: Students will convert the homology data to edit-distances and construct a matrix.
  4. Lesson 4: Students will build a program to use the edit-distance matrix data to construct a phylogenetic tree.

  • Lesson 1 can be taught in an Introductory CS class where students have learned the basics of Excel spreadsheet construction, including a familiarity with functions, formulas and absolute/relative cell referencing. In more advanced classes, Lesson 1 serves as a scaffold / proof-of-concept model for students to better understand the Java (or other higher-level language) program that they will be designing and coding.
  • The 2 PowerPoint presentations contain step-by-step instructions for building a small version of the program: one that can handle sequences of up to 10 characters in length and show how the algorithm works on them.
  • 2 XLSM documents are attached. The first is the completed model built from the 2 PowerPoint presentations. The second is an expanded version that can handle sequences up to 256 characters in length; it compares the β-chains of hemoglobin from Human and Mouse.
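
The step planned for Lesson 3, converting pairwise homology values into a distance matrix, could look roughly like the sketch below. Everything specific here is an assumption made for illustration: the conversion formula (distance = 1 − LCS length / length of the longer sequence), the function names, and the three short placeholder strings, which are not real bear or hemoglobin sequences. The unit's own materials may use a different conversion.

```python
from itertools import combinations


def lcs_length(a, b):
    """LCS length, keeping only the previous row of the grid in memory."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def distance(a, b):
    """ASSUMPTION: distance = 1 - (LCS length / longer sequence length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - lcs_length(a, b) / longest


def distance_matrix(sequences):
    """Build a symmetric matrix (dict of dicts) over all named sequences."""
    names = list(sequences)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for x, y in combinations(names, 2):
        d = distance(sequences[x], sequences[y])
        matrix[x][y] = d
        matrix[y][x] = d  # distance is the same in both directions
    return matrix


if __name__ == "__main__":
    # Hypothetical placeholder sequences, for illustration only.
    toy = {"seq_a": "GATTACA", "seq_b": "GCATGCU", "seq_c": "GATTACA"}
    for name, row in distance_matrix(toy).items():
        print(name, [round(v, 2) for v in row.values()])
```

Only the pairs above the diagonal need to be computed, since the matrix is symmetric and every sequence is at distance zero from itself.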

Assessment: 

See Assessment Ideas.docx

Resources: 

Acknowledgements: 

These teacher notes and resources were produced by Scott Portnoff, Downtown Magnets High School, Los Angeles, CA. The idea for this lesson and subsequent unit lessons was inspired by the Bear Evolution lesson (Bioinformatics Activity Bank; Author: Tara Flick).
