12.07.2015 Views

View - ResearchGate


Estimating Protein Function Using Protein–Protein Relationships

    MSLIEIDGSYGEGGGQILRTAVGMSALTGEPVRIYNIRANRPRPGLSHQHLHAVKAVAEICDAE
    >hsapiens|gi|20093442
    MGVIEDMMKVGMRSAKAGLEATEELIKLFREDGRLVGSILKEMEPEEITELLEGASSQLIRMIR
    >hsapiens|gi|20093443
    MSGNFRKMPEVPDPEELIDVAFRRAERAAEGTRKSFYGTRTPPEVRARSIEIARVNTACQLVQ
    >celegans|gi|1453778
    MEYIYAALLLHAAGQEINEDNLRKVLEAAGVDVDDARLKATVAALEEVDIDEAIEEAAVPAAAP
    >celegans|gi|1453779
    MVPWVEKYRPRSLKELVNQDEAKKELAAWANEWARGSIPEPRAVLLHGPPGTGKTSAAYALAHD
    >scerevisiae|gi|6799765
    MAEHELRVLEIPWVEKYRPKRLDDIVDQEHVVERLKAYVNRGDMPNLLFAGPPGTGKTTAALCL

The contents of this file are in the “FASTA” format, wherein lines starting with the “>” sign are treated as comments (for more details, see http://www.ebi.ac.uk/help/formats_frame.html) and lines that do not start with the “>” sign are treated as sequence. For the proteins described in the mock database above, the comment line contains an abbreviation of the organism name (the first letter of the genus name concatenated with the entire species name), followed by the letters “gi,” which alert the user to the fact that the following identifier is a GenBank identifier. The identifier associated with each sequence is unique, and can be used to retrieve records from the NCBI website. This comment structure is used here just for illustration purposes, and other more suitable formats can be envisioned depending on need.

Advanced users familiar with this step and the subsequent database formatting step for BLAST can choose to construct more advanced databases, such as those with indices. If an advanced database is to be created, users are advised to carefully follow instructions with respect to identifiers associated with the individual sequences.

3.1.2. Formatting the Database for Use by BLAST

The database of reference genomes requires formatting before it can be used for sequence comparison by BLAST. A special program called “formatdb,” included with the BLAST tools package, is needed for this task.
A number of options can be set for formatdb, depending on the type of input and output desired. However, for this protocol, as the input is a file containing amino acid sequences and no additional information is to be generated, no options beyond the input file name (-i) need to be specified for formatdb (i.e., formatdb is otherwise run with default options):

    % /path/to/blast/package/formatdb -i myDatabaseFile
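
Before the database file is formatted, it can be useful to confirm that every comment line follows the organism|gi|identifier convention described for the mock database and that no identifier is repeated. The following is a minimal Python sketch of such a check, not part of the original protocol. The names FastaRecord, parse_mock_database, and find_duplicate_ids are hypothetical, and the header pattern assumes lowercase organism abbreviations and purely numeric identifiers, as in the mock entries.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Optional, Set


class FastaRecord(NamedTuple):
    """One entry of the mock database (hypothetical helper type)."""
    organism: str
    gi: str
    sequence: str


# Assumed header layout, taken from the mock entries: >organism|gi|number
HEADER_RE = re.compile(r"^>(?P<organism>[a-z]+)\|gi\|(?P<gi>\d+)$")


def parse_mock_database(text: str) -> Iterator[FastaRecord]:
    """Yield one record per '>' comment line and the sequence lines after it."""
    organism: Optional[str] = None
    gi: Optional[str] = None
    chunks: List[str] = []
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new comment line closes the record collected so far.
            if organism is not None and gi is not None:
                yield FastaRecord(organism, gi, "".join(chunks))
            match = HEADER_RE.match(line)
            if match is None:
                raise ValueError(f"line {line_number}: unexpected header {line!r}")
            organism, gi = match["organism"], match["gi"]
            chunks = []
        elif organism is None:
            raise ValueError(f"line {line_number}: sequence before any header")
        else:
            chunks.append(line)  # sequences may span several lines
    if organism is not None and gi is not None:
        yield FastaRecord(organism, gi, "".join(chunks))


def find_duplicate_ids(records: Iterable[FastaRecord]) -> Set[str]:
    """Return the identifiers that occur more than once."""
    seen: Set[str] = set()
    duplicates: Set[str] = set()
    for record in records:
        if record.gi in seen:
            duplicates.add(record.gi)
        seen.add(record.gi)
    return duplicates
```

In practice, the contents of myDatabaseFile would be read into a string and passed to parse_mock_database. A ValueError or a non-empty result from find_duplicate_ids suggests that the file should be corrected before formatdb is run. Databases that use a different comment structure would need a different header pattern.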
