12.07.2015 Views

From Protein Structure to Function with Bioinformatics.pdf




96 T. Nugent and D.T. Jones

Table 4.1 (continued)
Function    Superfamily
Stannin
Glycophorin A
Inovirus (filamentous phage) major coat protein
Pili subunits
Pulmonary surfactant-associated protein

database (http://blanco.biomol.uci.edu/) all contain TM proteins of known structure determined using X-ray and electron diffraction, nuclear magnetic resonance and cryoelectron microscopy. OPM, PDBTM and CGDB additionally contain orientation predictions of the protein relative to the membrane based on water-lipid transfer energy minimisation (Lomize et al. 2006a), hydrophobicity/structural feature analysis (Tusnády et al. 2005a) and coarse grained molecular dynamic simulations (Sansom et al. 2008). For topological studies, OPM provides N-terminus localisation information, while TOPDB (Tusnády et al. 2008) and Mptopo (Jayasinghe et al. 2001) also include TM proteins of unknown 3D structure whose topologies have been experimentally validated using low-resolution techniques such as gene fusion, antibody and mutagenesis studies. A number of TM protein databases collect information on specific families including potassium channels (Li and Gallin 2004) and GPCRs (Horn et al. 2003), while others such as LGICdb (Donizelli et al. 2006) and TCDB (Saier et al. 2006), focus on particular structural or functional classes.

The Möller dataset (Möller et al. 2000), although in need of modification based on recent SWISS-PROT annotations (Boeckmann et al. 2003), provides a diverse training and validation set that suffers less from the prokaryotic bias present in 3D structure derived sets. As with all bioinformatics databases, care should be taken to ensure that a given resource is frequently updated. The rate at which new sequences and structures are deposited in Genbank and the PDB (and occasionally retracted e.g. Pornillos et al. 2005) results in significant manual annotation for database administrators, and much evidence suggests that this workload often exceeds the amount of time an administrator is willing to commit.

4.5 Multiple Sequence Alignments

Multiple sequence alignments play an important role in TM protein structure prediction. Homologous sequences identified via database searches can be used to construct sequence profiles which can significantly enhance TM topology prediction accuracy (Käll et al. 2005; Jones 2007), while template structures can be used for homology modelling.

Conventional pairwise alignment methods return possible matches based on a scoring function that relies on amino acid substitution matrices such as PAM
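
As a rough illustration of the sequence profiles mentioned in Section 4.5, the sketch below converts a small multiple sequence alignment into per-column amino acid frequencies. It is a minimal example under stated assumptions, not the procedure used by any of the cited predictors: the function name `column_profile`, the toy alignment and the pseudocount of 1.0 are all hypothetical choices made for illustration.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(alignment, pseudocount=1.0):
    """Return one {amino acid: frequency} dict per alignment column.

    Gap characters and other non-standard symbols are ignored.
    The pseudocount (an assumed value) keeps unseen residues above zero.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for i in range(length):
        # Count standard residues observed in column i.
        counts = Counter(seq[i] for seq in alignment if seq[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


# Hypothetical toy alignment of four homologous fragments ('-' marks a gap).
toy_alignment = ["LLIVA", "LIIVA", "LL-VG", "MLIVA"]
profile = column_profile(toy_alignment)
```

Each entry of `profile` sums to one, so a conserved column (such as the fourth, all valine) produces a sharply peaked distribution, while variable columns are flatter. Real profile-based methods typically also weight sequences and use substitution-matrix-derived pseudocounts, which this sketch omits.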
