### Teaching Tips / Activity Overview:

This lesson is the first in a unit on constructing **Phylogenetic Trees from DNA or Protein Sequences.** The unit **overview** / **engagement** activity is provided by Tara Flick's Bear Evolution Lesson, in which students:

- Collect 9 DNA sequences from various species of bears and other closely related species from the NCBI database;
- Generate an edit-distance matrix consisting of evolutionary distances between all pairwise combinations of the 9 animals, using UCSD's Biology Workbench CLUSTAL tools;
- Generate a phylogenetic tree from the edit-distance matrix.
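To give a feel for that last step, here is a minimal Java sketch of one simple distance-based tree method, UPGMA (average-linkage clustering). This is an assumption made for illustration: the lesson materials do not say which tree-building method Biology Workbench applies, and the class name `UpgmaSketch`, the method `upgma`, and the four-taxon distance matrix are all hypothetical toy values, not data from the Bear Evolution lesson. The sketch reports only the branching order (in Newick-style notation) and omits branch lengths.

```java
/** Hypothetical illustration class; not part of the lesson materials. */
final class UpgmaSketch {

    /**
     * Builds a tree topology from a symmetric distance matrix using UPGMA.
     * Returns a Newick-style string such as "((A,B),C);" without branch lengths.
     */
    static String upgma(String[] names, double[][] dist) {
        int n = names.length;
        String[] label = names.clone();      // current label of each cluster
        int[] size = new int[n];             // number of leaves in each cluster
        boolean[] active = new boolean[n];   // false once a cluster has been merged away
        double[][] d = new double[n][n];     // working copy of the distances
        for (int i = 0; i < n; i++) {
            size[i] = 1;
            active[i] = true;
            d[i] = dist[i].clone();
        }

        // Each pass joins the two closest active clusters; n - 1 passes leave one root.
        for (int merges = 0; merges < n - 1; merges++) {
            int bi = -1, bj = -1;
            for (int i = 0; i < n; i++) {
                if (!active[i]) continue;
                for (int j = i + 1; j < n; j++) {
                    if (!active[j]) continue;
                    if (bi < 0 || d[i][j] < d[bi][bj]) { bi = i; bj = j; }
                }
            }
            // Distance from the merged cluster to every other cluster is the
            // size-weighted average of the two old distances.
            for (int k = 0; k < n; k++) {
                if (!active[k] || k == bi || k == bj) continue;
                double avg = (size[bi] * d[bi][k] + size[bj] * d[bj][k])
                        / (size[bi] + size[bj]);
                d[bi][k] = avg;
                d[k][bi] = avg;
            }
            label[bi] = "(" + label[bi] + "," + label[bj] + ")";
            size[bi] += size[bj];
            active[bj] = false;              // cluster bj now lives inside bi
        }

        for (int i = 0; i < n; i++) {
            if (active[i]) return label[i] + ";";
        }
        throw new IllegalArgumentException("need at least one taxon");
    }

    public static void main(String[] args) {
        // Toy distances invented for this example only.
        String[] names = {"A", "B", "C", "D"};
        double[][] dist = {
            {0, 2, 6, 10},
            {2, 0, 6, 10},
            {6, 6, 0, 10},
            {10, 10, 10, 0}
        };
        System.out.println(upgma(names, dist)); // prints (((A,B),C),D);
    }
}
```

In the toy matrix, A and B are closest and are joined first, then C joins that pair, and D attaches last. A real tree-building tool may use a different method (and will report branch lengths), so its output need not match what this sketch produces on the same matrix.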

The edit-distance matrix and the phylogenetic tree for the Bear Evolution lesson (i.e. the **results**) are contained in the **Bear Evolution Lesson Results.doc** document. The algorithms behind this process, however, are hidden inside **Biology Workbench**'s opaque "black box". This new unit will attempt to make these algorithms transparent to students.

**Lesson 1** (this lesson): Students construct an Excel spreadsheet to model the simplest implementation of the LCS (Longest Common Subsequence) algorithm in order to calculate the degree of homology between 2 sequences.

**Lesson 2**: Students construct the same program in Java or another higher-level language/IDE. Students will also add scoring matrices for gaps, insertions, deletions, and point mutations, including implementing the BLOSUM and PAM matrices for protein sequences.

**Lesson 3**: Students will convert the homology data to edit-distances and construct a matrix.

**Lesson 4**: Students will build a program to use the edit-distance matrix data to construct a phylogenetic tree.
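As a preview of where Lessons 1 to 3 are heading, the following is a minimal Java sketch of the basic LCS dynamic-programming table, the same kind of grid a spreadsheet model can lay out cell by cell. The class and method names (`LcsSketch`, `lcsTable`, `lcsLength`, `indelDistance`) are hypothetical, and the conversion to a distance is one assumed choice: when only insertions and deletions are allowed, each costing 1, the edit distance between two sequences of lengths m and n is m + n − 2 × LCS. The unit itself may use a different scoring scheme, particularly once the scoring matrices from Lesson 2 are added.

```java
import java.util.Arrays;

/** Hypothetical illustration class; not part of the lesson materials. */
final class LcsSketch {

    /**
     * Fills the LCS table. table[i][j] holds the LCS length of the
     * first i characters of a and the first j characters of b.
     */
    static int[][] lcsTable(String a, String b) {
        // Row 0 and column 0 stay at 0: an empty prefix shares nothing.
        int[][] table = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Match: extend the LCS of the two shorter prefixes.
                    table[i][j] = table[i - 1][j - 1] + 1;
                } else {
                    // Mismatch: carry over the better of "up" and "left".
                    table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
                }
            }
        }
        return table;
    }

    /** Length of the longest common subsequence (bottom-right cell). */
    static int lcsLength(String a, String b) {
        return lcsTable(a, b)[a.length()][b.length()];
    }

    /** Assumed conversion: insertions and deletions only, each costing 1. */
    static int indelDistance(String a, String b) {
        return a.length() + b.length() - 2 * lcsLength(a, b);
    }

    public static void main(String[] args) {
        // Toy sequences invented for this example only.
        String a = "ACCGGT";
        String b = "ACGGAT";
        for (int[] row : lcsTable(a, b)) {
            System.out.println(Arrays.toString(row)); // one table row per line
        }
        System.out.println("LCS length: " + lcsLength(a, b));          // 5
        System.out.println("Indel distance: " + indelDistance(a, b));  // 2
    }
}
```

Each cell depends only on its upper, left, and upper-left neighbours, which is why the same logic translates directly into a spreadsheet formula filled across a grid.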

- Lesson 1 can be taught in an Introductory CS class where students have learned the basics of Excel spreadsheet construction, including a familiarity with functions, formulas and absolute/relative cell referencing. In more advanced classes, Lesson 1 serves as a **scaffold / proof-of-concept model** for students to better understand the Java (or other higher-level language) program that they will be designing and coding.

- The **2 POWERPOINT Presentations** contain step-by-step instructions for building a small version of the program: one that can handle sequences of up to 10 characters in length and show how the algorithm works on them.

- 2 **XLSM** documents are attached. The first is the completed model built from the **2 POWERPOINT Presentations**. The second is an expanded version that can handle sequences up to 256 characters in length; it compares the β-chains of hemoglobin from human and mouse.