628 Searching Databases

Mutation and concatenation are just two ways to manipulate molecular structures within a computer. If you created half a DNA sequence (one strand), you still need to determine the other half. Because DNA consists of two strands bound together in a double helix, it’s easy to determine the second sequence after you know the first one: each adenine (A) pairs with thymine (T), and each cytosine (C) pairs with guanine (G). The two strands of DNA are complementary sequences.

To calculate a complementary sequence from just one strand, you can use a simple program that replaces every A with a T, every C with a G, every T with an A, and every G with a C. A Perl program to do this might look like this:

    $DNA = 'ACTGTTG';
    ($compDNA = $DNA) =~ tr/ACGT/TGCA/;

The tr command tells Perl to translate, or swap, one character for another. The parentheses first copy $DNA into $compDNA so that tr changes only the copy; writing $compDNA = tr/ACGT/TGCA/; would instead translate the default variable $_ and store the number of characters changed. So the tr/ACGT/TGCA/ command tells Perl to translate every A into a T, every C into a G, every G into a C, and every T into an A, all at once.

The second step in determining a complementary sequence is to reverse its order. The two strands run in opposite directions, and sequences are always written a specific way: starting with the end known as 5’ phosphoryl (also called 5 prime or 5’) and ending with 3’ hydroxyl (3 prime or 3’). So to display the complementary sequence correctly, you have to reverse it using Perl’s reverse command:

    $DNA = 'ACTGTTG';
    ($compDNA = $DNA) =~ tr/ACGT/TGCA/;   # $compDNA is now TGACAAC
    $revDNA = reverse $compDNA;           # $revDNA is now CAACAGT

It’s important to know both sequences that make up a DNA molecule so you can use both of them to search for information. When you’re faced with an unknown structure, there’s a good chance someone else has already discovered this identical molecular structure. So all you have to do is match your molecular structure against a database of known structures to determine what you have.
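The two steps described above (complement, then reverse) can be wrapped in one reusable subroutine. This is a minimal sketch, assuming the input contains only the letters A, C, G, and T in either case; the subroutine name reverse_complement is hypothetical, and the uppercase conversion and validation check are additions for illustration, not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the reverse complement of a DNA string.
# Assumes the input uses only the letters A, C, G, T (either case).
sub reverse_complement {
    my ($dna) = @_;
    $dna = uc $dna;                                   # normalize to uppercase
    die "Unexpected base in '$dna'\n" if $dna =~ /[^ACGT]/;
    (my $comp = $dna) =~ tr/ACGT/TGCA/;               # step 1: swap each base for its partner
    return scalar reverse $comp;                      # step 2: reverse so it reads 5' to 3'
}

my $DNA = 'ACTGTTG';
print reverse_complement($DNA), "\n";
```

Run as is, the script prints CAACAGT. Copying into a lexical variable before calling tr leaves the caller’s original string untouched, and scalar reverse makes the string-reversal context explicit instead of relying on the assignment to supply it.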
Searching Databases

After biologists discover a specific molecular structure, they store information about that sequence in a database. That way other biologists can study that sequence, so everyone benefits from this ever-growing body of knowledge. Unfortunately, there isn’t just one database, but several databases that specialize in storing different types of information:

✦ GenBank stores nucleotide sequences.
✦ Swiss-Prot stores protein sequences.
✦ OMIM (Online Mendelian Inheritance in Man) stores data on human genes and genetic disorders.

After you find a particular sequence, you can look up articles about it in PubMed, a database of articles published in biomedical and life science journals.

Although it’s possible to search these databases manually, it’s usually much faster and easier to write a program that can send a list of sequences to a database, search that database for known sequences that match the ones sent, and then retrieve a list of those known sequences for further study.

Because searching databases is such a common task, biologists have created a variety of tools to standardize and simplify this procedure. One of the more popular tools is the Basic Local Alignment Search Tool, otherwise known as BLAST. BLAST can look for exact matches or just for sequences that are similar to yours within specified limits, such as a sequence that’s no more than ten percent different. This process of matching up sequences is called sequence alignment, or just alignment.

By finding an exact match of your sequence in a database, you can identify what you have. By comparing your sequence with similar ones, you can better understand the possible characteristics of your sequence. For example, a cat is more similar to a dog than to a rattlesnake, so a cat would likely behave more like a dog than like a rattlesnake.

The BLAST algorithm and computer program were developed at the U.S. National Center for Biotechnology Information (NCBI), part of the National Library of Medicine, with collaborators including Pennsylvania State University (www.ncbi.nlm.nih.gov/BLAST).

Book VII Chapter 2 Bioinformatics

The basic idea behind BLAST is to compare one sequence (called a query sequence) with a database to find exact matches of a certain number of characters, such as four.
For example, suppose you had a sequence like this:

ATCACCACCTCCG

With BLAST, you could specify that you only want to find matches of four characters or more, such as:

ATCACCTGGTATC
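The idea of finding short exact matches can be sketched in a few lines of Perl. This is a minimal illustration, not BLAST itself: it assumes plain uppercase strings and a word length of four (taken from the example), and the subroutine name shared_words is hypothetical. It indexes every four-character word of the database sequence in a hash, then looks up each four-character word of the query.

```perl
use strict;
use warnings;

# Hypothetical helper: lists every word (substring) of length $k that
# appears in both sequences, with its 0-based start position in each.
# Illustrates only the "short exact match" step; no scoring or extension.
sub shared_words {
    my ($query, $subject, $k) = @_;
    my %positions;    # word => list of start positions in $subject
    for my $i (0 .. length($subject) - $k) {
        push @{ $positions{ substr($subject, $i, $k) } }, $i;
    }
    my @hits;         # each hit is [word, query position, subject position]
    for my $i (0 .. length($query) - $k) {
        my $word = substr($query, $i, $k);
        next unless exists $positions{$word};
        push @hits, [$word, $i, $_] for @{ $positions{$word} };
    }
    return @hits;
}

my $query   = 'ATCACCACCTCCG';
my $subject = 'ATCACCTGGTATC';
for my $hit (shared_words($query, $subject, 4)) {
    my ($word, $qpos, $spos) = @$hit;
    print "$word query:$qpos subject:$spos\n";
}
```

With these two sequences the script prints five hits, starting with ATCA query:0 subject:0. The first three hits (ATCA, TCAC, CACC) sit at the same offset in both sequences and overlap into the six-character match ATCACC. BLAST starts from short hits like these and extends them into longer alignments, which this sketch does not attempt; the hash simply avoids rescanning the database sequence for every query word.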