Lab 6: DNA Analysis
# Initialize Otter
import otter
grader = otter.Notebook("lab6-DNA.ipynb")
This lab conducts DNA analysis with strings and dictionaries.
# Imports
Introduction
In simple terms, your genetic code is a long string of characters drawn from the four-character alphabet ACGT, where each letter stands for a particular nucleotide base. Your Python string-manipulation skills can be used to analyze genetic code. In this lab, you will write three functions that each process a sample string of base symbols in a different way.
Task 1: Base Counts
It can be important to count the number of times each base occurs in a string of DNA. You will write a function that takes in a string of DNA and counts the number of times each base (A, C, G, or T) occurs. The function returns a dictionary where each key is a letter and its value is the number of times that letter occurs in the sequence.
Write Python code to do the following:
Complete a function called base_counts which takes in a DNA string s of unknown length.
First, initialize a dictionary with keys "A", "C", "G", and "T", each having a value of 0.
Loop through the characters in string s and count each base by updating its value in the dictionary.
Finally, return the resulting dictionary of base counts.
Note: you may assume the string contains only the characters (A, C, G, and T).
Note: If any base does not occur in the string, that base should still be included in the dictionary, with a value of 0.
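The steps above can be sketched as follows. This is one possible reading of the task, not necessarily the reference solution; the name base_counts_sketch is hypothetical, chosen so it does not collide with the function you write yourself.

```python
def base_counts_sketch(s):
    """Count occurrences of each base in the DNA string s (illustrative sketch)."""
    # Start every base at 0 so bases absent from s still appear in the result.
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for base in s:
        # Assumes s contains only A, C, G, and T, as the first note allows.
        counts[base] += 1
    return counts


print(base_counts_sketch("AATTC"))  # {'A': 2, 'C': 1, 'G': 0, 'T': 2}
```

Note that "G" is reported with a count of 0 even though it never appears in the input.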
Your code replaces the prompt: ...
def base_counts(s):
    ...
# use this to check your results
base_counts("AATTC")
grader.check("task1-base-counts")
Task 2: Pair Counts
In this next task you will count how many times pairs of bases occur in a string. Since there are 4 unique letters in the genetic code, there are 16 possible unique two-letter pairs. You will count pairs using a dictionary where the keys are two-character pairs ("AA", "AC", "AG", etc.) and the values are the number of times each pair is observed. Note that every letter (except the very first and very last) is in two pairs: one where it is the first letter and the other where it is the second letter.
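As a quick check of the count above, the 16 possible pairs can be enumerated by combining each base with each base. This is a small illustrative sketch; the variable names bases and all_pairs are hypothetical and not part of the lab's required code.

```python
bases = "ACGT"

# Combine every possible first base with every possible second base: 4 * 4 = 16.
all_pairs = []
for first in bases:
    for second in bases:
        all_pairs.append(first + second)

print(len(all_pairs))  # 16
print(all_pairs[:4])   # ['AA', 'AC', 'AG', 'AT']
```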
Write Python code to do the following:
Complete a function called pair_counts which again takes in a DNA string s of unknown length.
First, initialize a dictionary with keys ("AA", "AC", "AG", etc.), each having a value of 0.
Loop through each pair in string s and count each by updating its value in the dictionary.
Finally, return the resulting dictionary of pair counts.
Note: you may assume the string contains only the characters (A, C, G, and T).
Note: As before, if any pair does not occur in the string, that pair should still be included in the dictionary, with a value of 0.
Hint: To loop through pairs in a string, consider looping over the index values of the first base in each pair. If i is the index of the first base, then s[i:i+2] slices the pair.
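One way to put the hint to work is sketched below. It assumes pairs overlap, so the last valid starting index is len(s) - 2; the name pair_counts_sketch is hypothetical, and this is not necessarily the reference solution.

```python
def pair_counts_sketch(s):
    """Count overlapping two-base pairs in the DNA string s (illustrative sketch)."""
    bases = "ACGT"
    # Start all 16 pairs at 0 so unseen pairs still appear in the result.
    counts = {a + b: 0 for a in bases for b in bases}
    # i is the index of the first base of each pair; the final base starts no pair.
    for i in range(len(s) - 1):
        counts[s[i:i+2]] += 1
    return counts


result = pair_counts_sketch("AATTC")
# Show only the pairs that actually occurred.
print({pair: n for pair, n in result.items() if n > 0})
# {'AA': 1, 'AT': 1, 'TC': 1, 'TT': 1}
```

A string of length n contributes n - 1 overlapping pairs, which is why the five-character example yields four counts in total.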
Your code replaces the prompt: ...
def pair_counts(s):
    ...
# use this to check your results
pair_counts("AATTC")
grader.check("task2-pair-counts")
Task 3: Number of Stops
Triplets of DNA bases are called codons. Certain codons signal a stopping message and are called "stop codons." The following three-letter sequences are stop codons: "TAG", "TGA", "TAA". Your final task is to count the number of times a stop codon occurs in a DNA string.
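Checking whether a given triplet is a stop codon reduces to a membership test. A minimal sketch, assuming the three codons listed above are stored in a set; the names STOP_CODONS and is_stop are hypothetical and not part of the lab's required code.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def is_stop(triplet):
    """Return True if the three-letter string is one of the stop codons."""
    return triplet in STOP_CODONS


print(is_stop("TGA"))  # True
print(is_stop("ATG"))  # False
```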
Write Python code to do the following:
Complete a function called num_stops which takes in a DNA string s of unknown length.
Initialize the count by setting a variable to zero. (You don't need a dictionary since you are only making one count.)
Loop through each triplet in string s and increment the count whenever a stop codon is encountered.
Finally, return the integer number of stops.
Note: you may assume the string contains only the characters (A, C, G, and T).
Hint: You can loop through triplets in a string just like pairs, by looping over the index values of the first base in each triplet. If i is the index of the first base, then s[i:i+3] slices the triplet.
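The hint can be turned into a short sketch like the one below. It assumes every overlapping triplet is examined (the starting index advances by one each step), which is one reading of the task; the name num_stops_sketch is hypothetical.

```python
def num_stops_sketch(s):
    """Count stop codons among all overlapping triplets of s (illustrative sketch)."""
    stops = ("TAG", "TGA", "TAA")
    count = 0
    # i is the index of the first base; the last full triplet starts at len(s) - 3.
    for i in range(len(s) - 2):
        if s[i:i+3] in stops:
            count += 1
    return count


print(num_stops_sketch("TGATAA"))  # 2
```

In "TGATAA" the triplets are TGA, GAT, ATA, and TAA, so the first and last are counted. Strings shorter than three characters produce an empty loop and a count of 0.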
Your code replaces the prompt: ...
def num_stops(s):
    ...
# use this to check your results
ex_string = "ATTTGAAATATGAATGTTAAGATGA"
print(f"{num_stops(ex_string)} stops detected in {ex_string}.")
grader.check("task3-num-stops")