Bioinformatics Using Basic Genomics Sequences

Overview & Concepts

This activity familiarizes high school students with the importance of computational biology in today's world. By writing computer programs that perform genomic sequence analysis, students gain a better understanding of molecular biology and see how the vast amount of data available in multiple online databases can be processed with a few lines of code.

Concepts Covered: 

Introduce high school students to the role of computer science in addressing fundamental questions posed in modern-day molecular biology:

  • Design computer algorithms to automate basic operations in biological sequence analysis (e.g., on DNA, RNA, amino acids, codons, proteins)
  • Implement algorithms in Java/Python using basic data structures (e.g., arrays, matrices, hash tables)
  • Use standard I/O as well as read from and write to text files
  • Use computational thinking to relate biological phenomena to computer science concepts

Prior Knowledge Required: 

  • Biology concepts: DNA structure (nucleotide bases), RNA structure, amino acids, codons, FASTA format
  • Computer science concepts: standard input/output, file input/output, while/for loops, string class/variables, string character access, arrays/matrices, regular expressions, pattern matching, hash tables

Activity Notes

Days to Teach: 

5 days. The day-to-day process is described under Teaching Tips / Activity Overview below.

Materials: 

  • Programming Tools: Java and a Java IDE (Eclipse, JCreator, DrJava); Python and a Python IDE (IDLE, Eclipse)
  • Attached Student Handout Describing Activity
  • Codon Protein Chart

Teaching Tips / Activity Overview: 

Activity 1 is very basic. After reviewing DNA sequencing with the class and teaching the students about the FASTA format (about 30 minutes), give students an input file containing a short DNA sequence in FASTA format for their program to read. The purpose of this program is to get the students used to the data and the FASTA header information. Initially, the output should go to the console; the student then writes it to a file. It should take the students only about 15 minutes to complete this programming lab.

  1. In this exercise, the student will learn how to read a DNA sequence into a Java/Python program and make the program print it out. The input will be supplied in two ways: through standard input and as a text file in FASTA format.

             - Programming concepts:    I/O, while loop, string variables
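As a rough illustration of what a student solution for Activity 1 might look like, here is a minimal Python sketch. The function names `parse_fasta` and `write_fasta` and the sample record are hypothetical and do not come from the student handout. The sketch assumes a single-record file whose header line starts with ">", and it uses a for loop where the lab may expect a while loop.

```python
def parse_fasta(lines):
    """Return (header, sequence) for the first record in an iterable of FASTA lines."""
    header = None
    sequence_parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop after the first one
            header = line[1:]  # drop the leading ">"
        elif header is not None:
            sequence_parts.append(line.upper())
    return header, "".join(sequence_parts)


def write_fasta(path, header, sequence):
    """Write one record to a text file in FASTA format."""
    with open(path, "w") as handle:
        handle.write(">" + header + "\n")
        handle.write(sequence + "\n")


if __name__ == "__main__":
    # Hypothetical sample data standing in for the class input file.
    sample = [">sample_1 short practice sequence", "ATGGCCATTG", "TAATGGGCC"]
    header, sequence = parse_fasta(sample)
    print("Header:  ", header)
    print("Sequence:", sequence)
```

Because `parse_fasta` accepts any iterable of lines, the same function can be handed `sys.stdin` for the standard-input version or an open file object for the file version.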

Activity 2 follows a review of regular expressions and pattern matching in Java or Python. The same data file will be used. Attached is a worksheet of regular expressions that students have completed beforehand as a class exercise (with corresponding answers on the last page). The review of regular expressions and pattern matching should take no more than 20 minutes, and the programming lab should take about 20 minutes to complete.

  1. In this exercise, the student will learn how to convert a given strand of a DNA sequence into its reverse-complement strand. For example, the reverse complement of ATGC is GCAT.

             - Programming concepts:    regular expressions, string character access & pattern matching 
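The reverse-complement step can be sketched in Python as follows. This is one possible approach and not necessarily the one on the handout: the name `reverse_complement` and the choice to reject any character other than A, C, G, or T are assumptions made for the example. A regular expression checks the input, and a dictionary lookup complements each base while the string is read backwards.

```python
import re

# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string made of A, C, G, T."""
    dna = dna.upper()
    # Assumption for this sketch: anything other than A, C, G, T is an error.
    if re.fullmatch(r"[ACGT]*", dna) is None:
        raise ValueError("sequence contains characters other than A, C, G, T")
    # Walk the strand from the last base to the first, complementing each base.
    return "".join(COMPLEMENT[base] for base in reversed(dna))


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```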

Activity 3 follows a review of string translation in Java or Python. This is a very basic program in which students replace each "T" base with a "U" base. The same data file will be used as before. Attached is a worksheet of regular expressions that students have completed beforehand as a class exercise (with corresponding answers on the last page). The review of regular expressions and pattern matching should take no more than 20 minutes, and the programming lab should take about 20 minutes to complete.

  1. In this exercise, the user will learn how to convert a DNA sequence into its corresponding RNA sequence. The main operation needed is to substitute every "T" with a "U".

             - Programming concept: string translation
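A minimal Python sketch of the T-to-U substitution is shown here. The function name `transcribe` is hypothetical, and the sketch assumes the input is the coding strand written with the letters A, C, G, and T.

```python
def transcribe(dna):
    """Return the RNA version of a DNA coding strand by replacing T with U."""
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGTTTGGC"))
```

Python's `str.replace` substitutes every occurrence, so no explicit loop is needed. Students working in Java can use the `String.replace` method in the same way.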

Activity 4 follows a lesson on the hash map data structure. "Hashing" is a way to map a string to a number. For example, each codon can be mapped to a unique number from 0 to 63: 0 (AAA), 1 (AAC), 2 (AAG), ... 63 (TTT). A hash map can then store the amino acid that corresponds to each codon. After teaching hash tables for about 30 minutes, have students start the lab and complete it the next day. The "Codon Protein Chart" is on the student lab handout. It gives the students a reference showing which three-base codons correspond to which amino acid.

  1. In this exercise, the student will "translate" a DNA coding sequence into its corresponding amino acid sequence. A hash table will be used to store the genetic code table.
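The codon numbering in the example above can be sketched in Python by treating each codon as a three-digit number in base 4, with the bases ranked alphabetically (A = 0, C = 1, G = 2, T = 3). The name `codon_index` is hypothetical. This only illustrates the idea of hashing; Python's built-in `dict` and Java's `HashMap` compute their own hash values internally.

```python
# Alphabetical rank of each base, used as a base-4 digit.
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}


def codon_index(codon):
    """Map a three-letter DNA codon to a unique number from 0 (AAA) to 63 (TTT)."""
    if len(codon) != 3:
        raise ValueError("a codon must have exactly three bases")
    first, second, third = (BASE_VALUE[base] for base in codon.upper())
    return 16 * first + 4 * second + third


if __name__ == "__main__":
    for codon in ("AAA", "AAC", "AAG", "TTT"):
        print(codon, codon_index(codon))
```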

             - Programming concept: hash table
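A minimal Python sketch of the translation exercise follows, using a dictionary as the hash table. To keep the listing short, the table is generated from a compact string encoding of the standard genetic code, whereas students would more likely type in the entries from the Codon Protein Chart by hand. The names `CODON_TABLE` and `translate`, the use of one-letter amino acid codes, and the decision to stop at the first stop codon are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G.
# "*" marks a stop codon; the other letters are one-letter amino acid codes.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)

# Hash table from codon (e.g. "ATG") to amino acid (e.g. "M").
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence, reading from its first base, until a stop codon."""
    dna = dna.upper()
    protein = []
    # Step through the sequence three bases at a time; ignore a trailing partial codon.
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break  # stop codon reached
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```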

Assessment: 

Assessment will be in class. After a student has completed the programming assignment, the teacher will review the program and output to make sure that it works correctly. Another test file will be given to the student to make sure the student did not code "to the file."

Extensions: 

  • In this extension, the student will compare two DNA sequences (of the same length) and identify sites of point mutations, or Single Nucleotide Polymorphisms (SNPs).
  • Students will be handed a document explaining the different types of SNPs with examples. After finding a mutation in the second sequence, the student must print out the type of mutation discovered as well as the mutated sequence that corresponds to it.
  • Programming concepts: loop and character comparison
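The comparison loop for this extension can be sketched in Python as below. The function name `find_point_mutations` and the 1-based position numbering are assumptions made for the example. Classifying each mutation by type is left out, because the categories depend on the handout given to students.

```python
def find_point_mutations(reference, sample):
    """Return a list of (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based, so the first base of the sequence is position 1.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    mutations = []
    # Compare the two sequences character by character.
    for position, (ref_base, sample_base) in enumerate(
        zip(reference.upper(), sample.upper()), start=1
    ):
        if ref_base != sample_base:
            mutations.append((position, ref_base, sample_base))
    return mutations


if __name__ == "__main__":
    for position, ref_base, sample_base in find_point_mutations("ACGTAC", "ACCTAT"):
        print("Position", position, ":", ref_base, "->", sample_base)
```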

Resources: 

  • The 'Central Dogma' of molecular biology is that 'DNA makes RNA makes protein'. This YouTube animation shows how molecular machines transcribe the genes in the DNA of every cell into portable RNA messages, how those messenger RNAs are modified and exported from the nucleus, and finally how the RNA code is read to build proteins. It is an excellent animation to show to the class when reviewing this process: http://www.youtube.com/watch?v=J3HVVi2k2No
  • Excellent textbook with great graphics: Campbell, Neil; Dickey, Jean; Reece, Jane; Taylor, Martha; Simon, Eric. Biology: Concepts and Connections, 6th edition (Chapters 9-12). Pearson/Benjamin Cummings, 2009. ISBN: 978-0-321-48984-5

Acknowledgements: 

These teacher notes and resources were produced by A.G. Walter.

