# Initialize Otter
import otter
grader = otter.Notebook("lab7-DNA.ipynb")

Lab 7: DNA Analysis

This lab involves using dictionaries and strings to analyze sequences of DNA.

Entering Your Information for Credit

To receive credit for assignments, it is important that we can distinguish your work from that of others. To do this, we ask you to enter your information in the following code block.

Before you begin

Run the block of code at the top of the notebook that imports and sets up the autograder. This will allow you to check your work.

# Please provide your first name, last name, Drexel ID, and Drexel email. Make sure these are provided as strings (text enclosed in quotation marks).

# In the assignments you will see sections of code that you need to fill in that are marked with ... (three dots). Replace the ... with your code.
first_name = ...
last_name = ...
drexel_id = ...
drexel_email = ...
grader.check("q0-Checking-Your-Name")

Introduction

In simple terms, your genetic code is a long string of characters drawn from a four-character alphabet, ACGT, where each letter signifies a particular nucleotide base. Your Python string-manipulation skills can be used to analyze genetic code. In this lab, you will write three functions, each of which processes a sample string of base symbols in a particular way.

Task 1: Base Counts

It can be important to count the number of times each base occurs in a string of DNA. You will write a function that takes in a string of DNA and counts the number of times each base (A, C, G, or T) occurs. The function returns a dictionary where each key is a letter and its value is the number of times that letter occurs in the sequence.

Write Python code to do the following:

  • Complete a function called base_counts which takes in a DNA string s of unknown length.

  • First, initialize a dictionary with keys "A", "C", "G", and "T", each having a value of 0.

  • Loop through the characters in string s and count each base by updating its value in the dictionary.

  • Finally, return the resulting dictionary of base counts.

  • Note: You may assume the string contains only the characters A, C, G, and T.

  • Note: If any base does not occur in the string, that base should still be included in the dictionary, with a value of 0.
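The initialize-then-update pattern described in the steps above can be sketched as follows. This is only an illustration, not the required solution: the helper name `count_symbols` and its `alphabet` parameter are hypothetical and do not appear in the lab.

```python
def count_symbols(s, alphabet):
    """Count how often each symbol of `alphabet` occurs in `s`.

    Hypothetical helper for illustration; assumes every character
    of `s` is contained in `alphabet`.
    """
    # Start every symbol at 0 so symbols that never occur still appear.
    counts = {}
    for symbol in alphabet:
        counts[symbol] = 0

    # Visit each character once and add 1 to its tally.
    for ch in s:
        counts[ch] += 1

    return counts


print(count_symbols("AATTC", "ACGT"))  # {'A': 2, 'C': 1, 'G': 0, 'T': 2}
```

Because the dictionary is filled with zeros before the loop, "G" is still reported even though it never occurs in "AATTC".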

Your code replaces the prompt: ...

def base_counts(s):
    ...

# use this to check your results
base_counts("AATTC")
grader.check("task1-base-counts")

Task 2: Pair Counts

In this next task you will count how many times each pair of adjacent bases occurs in a string. Since there are 4 unique letters in the genetic code, there are 16 possible unique two-letter pairs. You will count pairs using a dictionary where the keys are two-character pairs ("AA", "AC", "AG", etc.) and the values are the number of times each pair is observed. Note that every letter (except the very first and very last) is in two pairs: one where it is the first letter and the other where it is the second letter.

Write Python code to do the following:

  • Complete a function called pair_counts which again takes in a DNA string s of unknown length.

  • First, initialize a dictionary with all 16 pair keys ("AA", "AC", "AG", etc.), each having a value of 0.

  • Loop through each pair in string s and count each by updating its value in the dictionary.

  • Finally, return the resulting dictionary of pair counts.

  • Note: You may assume the string contains only the characters A, C, G, and T.

  • Note: As before, if any pair does not occur in the string, that pair should still be included in the dictionary, with a value of 0.
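One possible way to build the 16 zero-valued keys without typing them by hand is a pair of nested loops over the four bases. This is a minimal sketch; the variable names `bases` and `pair_keys` are assumptions made for illustration.

```python
bases = "ACGT"

# Combine every possible first base with every possible second base.
pair_keys = {}
for first in bases:
    for second in bases:
        pair_keys[first + second] = 0

print(len(pair_keys))       # 16
print(list(pair_keys)[:4])  # ['AA', 'AC', 'AG', 'AT']
```

Writing out all 16 keys in a dictionary literal works just as well; the loops simply make it harder to miss a pair.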

Hint: To loop through pairs in a string, consider looping over the index values of the first base in each pair. If i is the index of the first base, then s[i:i+2] slices the pair.
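To see what the slicing in the hint produces, the sketch below only collects the pairs of a short string into a list (the list name `pairs` is an assumption for illustration). The last valid starting index is `len(s) - 2`, so the loop runs over `range(len(s) - 1)`.

```python
s = "AATTC"

pairs = []
# i is the index of the first base of each pair; stopping one short of
# the end ensures every slice really has two characters.
for i in range(len(s) - 1):
    pairs.append(s[i:i + 2])

print(pairs)  # ['AA', 'AT', 'TT', 'TC']
```

A string of length n therefore contains n - 1 overlapping pairs.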

Your code replaces the prompt: ...

def pair_counts(s):
    ...

# use this to check your results
pair_counts("AATTC")
grader.check("task2-pair-counts")

Task 3: Number of Stops

Triplets of DNA bases are called codons. Certain codons signal a stopping message and are called “stop codons.” The following three-letter sequences are stop codons: "TAG", "TGA", "TAA". Your final task is to count the number of times a stop codon occurs in a DNA string.

Write Python code to do the following:

  • Complete a function called num_stops which takes in a DNA string s of unknown length.

  • Initialize the count by setting a variable to zero. (You don’t need a dictionary since you are only making one count.)

  • Loop through each triplet in string s and increment the count whenever a stop codon is encountered.

  • Finally, return the integer number of stops.

  • Note: You may assume the string contains only the characters A, C, G, and T.

Hint: You can loop through triplets in a string just like pairs, by looping over the index values of the first base in each triplet. If i is the index of the first base, then s[i:i+3] slices the triplet.
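The sketch below shows the triplet slicing from the hint on a short string, together with a membership test against the three stop codons. It assumes triplets overlap (a window that slides one base at a time), as the hint suggests; the names `stop_codons` and `labeled` are hypothetical.

```python
stop_codons = ("TAG", "TGA", "TAA")
s = "ATGAT"

labeled = []
# i is the index of the first base of each triplet; stopping two short
# of the end ensures every slice really has three characters.
for i in range(len(s) - 2):
    triplet = s[i:i + 3]
    labeled.append((triplet, triplet in stop_codons))

print(labeled)  # [('ATG', False), ('TGA', True), ('GAT', False)]
```

The `in` test against a tuple avoids writing three separate `==` comparisons joined with `or`.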

Your code replaces the prompt: ...

def num_stops(s):
    ...

# use this to check your results
ex_string = "ATTTGAAATATGAATGTTAAGATGA"
print(f"{num_stops(ex_string)} stops detected in {ex_string}.")
grader.check("task3-num-stops")

Submitting Your Assignment

To submit your assignment, please use the following link to the assignment on GitHub Classroom.

Use this link to navigate to the assignment on GitHub Classroom.

If you need further instructions on submitting your assignment, please look at Lab 1.

Viewing your score

It is your responsibility to ensure that your grade report displays correctly. We can only provide corrections to grades if a grading error is determined. If you do not receive a grade report, your grade has not been recorded. It is your responsibility to either resubmit the assignment correctly or contact the instructors before the assignment due date.

For each .ipynb file you upload, there will be a grade report file named with the name of your file + Grade_Report.md. You can view the autograder results by clicking on that file's name.

We have both public and hidden tests. You will be able to see the score of both tests, but not the specific details of why the test passed or failed.

Note

In Python, and particularly in Jupyter notebooks, it is common during testing to run cells out of order, or to run cells and then modify them. As a result, your solution may depend on variables that would not be recreated if your code were run again from scratch. Your assignment will be graded by running your code from scratch, so before you submit you should restart the kernel and run all cells. You can do this by clicking Kernel and selecting Restart and Run All. If your code does not run as expected after restarting the kernel and running all cells, it means you have an error in your code.

Fin