15.04.2018 Views

programming-for-dummies



Mutation and concatenation are just two ways to manipulate molecular structures within a computer. If you have only one strand of a DNA sequence, you still need to determine the other strand. Because DNA consists of two strands bound together in a double helix form, it's easy to determine the second sequence of DNA after you know the first one. That's because each adenine (A) links up with thymine (T) and each cytosine (C) links up with guanine (G).

The two strands of DNA are complementary sequences. To calculate a complementary sequence when you know only one of the sequences, you can use a simple program that replaces every A with a T, every C with a G, every T with an A, and every G with a C. A Perl program to do this might look like this:

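$DNA = 'ACTGTTG';

# Copy $DNA into $compDNA, then translate the copy.
# (A bare tr would act on Perl's default variable $_ instead.)
($compDNA = $DNA) =~ tr/ACGT/TGCA/;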

The tr command simply tells Perl to translate, or swap, one character for another. So the above tr/ACGT/TGCA/; command tells Perl to translate every A into a T, every C into a G, every G into a C, and every T into an A all at once.
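One detail that's easy to trip over: tr works on a particular string only when you bind it to that string with the =~ operator, and the value it returns is a count of matched characters, not the translated text. Here's a minimal sketch of both behaviors. The subroutine name complement and the variable $count_gc are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the base-by-base complement of a DNA string.
# The result is not reversed yet.
sub complement {
    my ($dna) = @_;
    # Copy first, then bind tr to the copy so the original stays untouched.
    (my $comp = $dna) =~ tr/ACGTacgt/TGCAtgca/;
    return $comp;
}

my $DNA     = 'ACTGTTG';
my $compDNA = complement($DNA);
print "Complement of $DNA is $compDNA\n";    # Complement of ACTGTTG is TGACAAC

# With an empty replacement list, tr changes nothing and just counts matches.
my $count_gc = ($DNA =~ tr/GC//);
print "G and C bases: $count_gc\n";          # G and C bases: 3
```

Copying before translating keeps the original sequence around, which you need later when you want to search with both strands.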

The second step in determining a complementary sequence is to reverse the order of that sequence. That's because the two strands run in opposite directions, and sequences are by convention written a specific way, starting with the end of the sequence known as 5' phosphoryl (also known as 5 prime or 5') and ending with 3' hydroxyl (known as 3 prime or 3'). So to display the complementary sequence correctly, you have to reverse it using this Perl command:

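$DNA = 'ACTGTTG';

# Copy $DNA, then translate the copy into its complement.
($compDNA = $DNA) =~ tr/ACGT/TGCA/;

# Assigning to a scalar makes reverse flip the characters of the string.
$revDNA = reverse $compDNA;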
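If you need this more than once, you can fold the two steps into one subroutine. The following is only a sketch, and the name reverse_complement is an assumption chosen for this example. Note that Perl's reverse flips the characters of a string only in scalar context; in list context (for example, directly inside print) it reverses the order of a list instead.

```perl
use strict;
use warnings;

# Hypothetical helper: return the complementary strand, written 5' to 3'.
sub reverse_complement {
    my ($dna) = @_;
    my $rev = reverse $dna;      # scalar assignment, so the string is reversed
    $rev =~ tr/ACGT/TGCA/;       # swap each base for its partner
    return $rev;
}

my $DNA = 'ACTGTTG';
print reverse_complement($DNA), "\n";    # CAACAGT
```

Reversing first and then translating gives the same answer as translating first and then reversing, because tr handles each character independently.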

It's important to know both strands that make up a DNA molecule so you can use both sequences to search for information. When faced with an unknown structure, there's a good chance someone else has already discovered this identical molecular structure. So all you have to do is match your molecular structure with a database of known structures to determine what you have.
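To make the idea of matching concrete, here's a toy sketch that checks a sequence and its reverse complement against a tiny list of known sequences. Everything in it is hypothetical: the %known hash stands in for a real database, the entry names and sequences are made up, and the subroutine names are chosen just for this example. It only finds exact matches, whereas searches against real databases usually have to tolerate small differences.

```perl
use strict;
use warnings;

# Made-up stand-in for a database: entry name => known sequence.
my %known = (
    sample_gene_1 => 'GGATCCACTGTTGAATTC',
    sample_gene_2 => 'TTTTCAACAGTAAAA',
    sample_gene_3 => 'CCCCGGGGAAAA',
);

# Hypothetical helper: complementary strand, written 5' to 3'.
sub reverse_complement {
    my ($dna) = @_;
    my $rev = reverse $dna;
    $rev =~ tr/ACGT/TGCA/;
    return $rev;
}

# Return the names of entries that contain the query on either strand.
sub find_matches {
    my ($query, $db) = @_;
    my $rc = reverse_complement($query);
    my @hits;
    for my $name (sort keys %$db) {
        my $seq = $db->{$name};
        # index returns -1 when the substring is not found.
        push @hits, $name
            if index($seq, $query) >= 0 || index($seq, $rc) >= 0;
    }
    return @hits;
}

my @hits = find_matches('ACTGTTG', \%known);
print "Matches: @hits\n";    # Matches: sample_gene_1 sample_gene_2
```

The second entry matches only through the reverse complement, which is why it pays to search with both strands.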

Searching Databases

After biologists discover a specific molecular structure, they store information about that sequence in a database. That way other biologists can study that sequence so everyone benefits from this slowly growing body of knowledge. Unfortunately, there isn't just one database, but several databases that specialize in storing different types of information:

✦ GenBank stores nucleotide sequences.
