
programming-for-dummies


Searching Databases

✦ Swiss-Prot stores protein sequences.

✦ OMIM (Online Mendelian Inheritance in Man) stores data on human genes and genetic disorders.

After you find a particular sequence, you can look up articles about it in PubMed, a database of articles published in biomedical and life science journals.

Although it’s possible to search these databases manually, it’s usually much faster and easier to write a program that can send a list of sequences to a database, search that database for known sequences that match the ones sent, and then retrieve a list of those known sequences for further study.
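As a rough illustration of that send, search, and retrieve loop, here is a minimal Python sketch. It assumes a tiny in-memory dictionary standing in for a real database; the names `KNOWN_SEQUENCES` and `search_database` and the sequences themselves are hypothetical, and a real program would query a remote service rather than a local dictionary.

```python
# Hypothetical in-memory "database"; names and sequences are made up for illustration.
KNOWN_SEQUENCES = {
    "seq_A": "ATCACCACCTCCG",
    "seq_B": "ATCACCTGGTATC",
    "seq_C": "GGGTTTAAACCC",
}


def search_database(queries, database):
    """For each query, list the names of known sequences that contain it exactly."""
    results = {}
    for query in queries:
        results[query] = [name for name, seq in database.items() if query in seq]
    return results


# Send a list of query sequences and retrieve the matching known sequences.
hits = search_database(["ATCACC", "TTTAAA", "CCCCCC"], KNOWN_SEQUENCES)
for query, names in hits.items():
    print(query, "->", names)
```

A query with no match simply comes back with an empty list, so the caller can tell "searched and found nothing" apart from "never searched".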

Because searching databases is such a common task, biologists have created a variety of tools to standardize and simplify this procedure. One of the more popular tools is the Basic Local Alignment Search Tool, otherwise known as BLAST.

BLAST can look for exact matches or for sequences that are similar to yours within specified limits, such as a sequence that’s no more than ten percent different. This process of matching up sequences is called sequence alignment, or just alignment.
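To make the "no more than ten percent different" limit concrete, here is a minimal Python sketch. It assumes a deliberately simple measure of difference: the percentage of positions at which two equal-length sequences disagree. This is not how BLAST scores similarity; the function names and test sequences are hypothetical.

```python
def percent_difference(a, b):
    """Percentage of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares sequences of equal length")
    if not a:
        return 0.0
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * mismatches / len(a)


def similar_within(query, candidates, max_percent):
    """Keep only the candidates that differ from the query by at most max_percent."""
    return [c for c in candidates if percent_difference(query, c) <= max_percent]


query = "ATCACCACCT"
candidates = ["ATCACCACCT", "ATCACCACCA", "ATGACCACCA"]  # 0, 1, and 2 mismatches
close_matches = similar_within(query, candidates, 10.0)
print(close_matches)
```

Comparing position by position keeps the idea easy to follow, but it cannot handle insertions or deletions, which is one reason real alignment tools use more elaborate scoring.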

By finding an exact match of your sequence in a database, you can identify what you have. By comparing your sequence with similar ones, you can better understand the possible characteristics of your sequence. For example, a cat is more similar to a dog than to a rattlesnake, so a cat would likely behave more like a dog than like a rattlesnake.

The BLAST algorithm and computer program were developed by researchers at the U.S. National Center for Biotechnology Information (NCBI), working with collaborators at Pennsylvania State University and elsewhere (www.ncbi.nlm.nih.gov/BLAST).

Book VII, Chapter 2: Bioinformatics

The basic idea behind BLAST is to compare one sequence (called a query sequence) with a database to find exact matches of a certain number of characters, such as four. For example, suppose you had a sequence like this:

ATCACCACCTCCG<br />

With BLAST, you could specify that you only want to find matches of four characters or more, such as:

ATCACCTGGTATC
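The idea of looking for exact matches of four characters can be sketched in a few lines of Python. This is only an illustration of the word-matching step, applied to the two sequences above. The function name `shared_words` and the parameter `k` are hypothetical, and the sketch leaves out everything BLAST does after finding such matches.

```python
def shared_words(query, subject, k=4):
    """Return (word, query_pos, subject_pos) for every length-k word found in both."""
    # Index every length-k word of the subject by its starting position(s).
    positions = {}
    for i in range(len(subject) - k + 1):
        positions.setdefault(subject[i:i + k], []).append(i)

    # Slide a length-k window along the query and look each word up in the index.
    matches = []
    for j in range(len(query) - k + 1):
        word = query[j:j + k]
        for i in positions.get(word, []):
            matches.append((word, j, i))
    return matches


query = "ATCACCACCTCCG"
subject = "ATCACCTGGTATC"
matches = shared_words(query, subject)
for word, q_pos, s_pos in matches:
    print(word, "query position", q_pos, "subject position", s_pos)
```

Indexing the subject's words first means each query word is a single dictionary lookup rather than a fresh scan of the whole subject, which matters when the subject is a large database instead of one short sequence.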
