
Protein Classification and Structure Prediction

Amino acid composition

• Basic amino acid structure:
  • The side chain, R, varies for each of the 20 amino acids

[Figure: basic amino acid structure, H₂N–CαH(R)–COOH, with the amino group, the α-carbon (Cα), the side chain R, and the carboxyl group labeled]

Proteins are chains of amino acids

The Peptide Bond

• Dehydration synthesis
• Repeating backbone: N–Cα–C–N–Cα–C

Peptidyl polymers

• A chain of amino acids is called a polypeptide. A protein is usually composed of 50 to 400+ amino acids.
• Since part of each amino acid is lost during dehydration synthesis, we call the units of a protein amino acid residues.
• Convention: start at the amino terminus and proceed to the carboxy terminus.

[Figure: peptide bond, with the carbonyl carbon and the amide nitrogen labeled]

Krane & Raymer<br />
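The residue bookkeeping above can be made concrete: each peptide bond formed by dehydration synthesis releases one water, so a chain of n amino acids loses n − 1 waters. The sketch below assumes the average residue masses as commonly tabulated (for example by ExPASy); the names `RESIDUE_MASS`, `peptide_mass`, and `free_amino_acid_mass` are hypothetical, and the values should be checked against a reference before real use.

```python
# Average masses (Da) of amino acid *residues*, i.e. free amino acid minus H2O.
# Values as commonly tabulated (e.g. by ExPASy); treat them as assumptions.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # average mass of H2O in Da


def peptide_mass(sequence: str) -> float:
    """Average mass of a linear peptide, sequence written N- to C-terminus.

    Each of the n - 1 peptide bonds releases one water, so a chain of n
    amino acids weighs sum(residue masses) + one water: the extra water
    restores the free N-terminal H and C-terminal OH.
    """
    if not sequence:
        raise ValueError("empty sequence")
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def free_amino_acid_mass(aa: str) -> float:
    """Mass of one free amino acid: its residue mass plus one water."""
    return RESIDUE_MASS[aa.upper()] + WATER


# Gly-Ala: two free amino acids minus the one water lost forming the bond.
print(round(peptide_mass("GA"), 3))  # → 146.146
```

Storing residue masses rather than free amino acid masses keeps the dehydration step implicit: the sum already excludes every water lost in bond formation.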

Side chain properties

• Carbon does not easily form hydrogen bonds with water: hydrophobic
• O and N are generally more likely than C to hydrogen bond to water: hydrophilic
• The amino acids form three general groups:
  • Hydrophobic
  • Charged (positive/basic & negative/acidic)
  • Polar
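The three side-chain groups, combined with the amino acid composition of a sequence, can be sketched in a few lines. The assignment of residues to groups below is one common textbook convention and is an assumption here (glycine, histidine, cysteine, and tyrosine in particular are classified differently by different authors); `GROUPS`, `group_of`, and `composition` are hypothetical names.

```python
from collections import Counter

# One common grouping of the 20 standard amino acids (one-letter codes).
# Borderline cases (G, H, C, Y) vary between textbooks: an assumption.
GROUPS = {
    "hydrophobic": set("GAVLIMFWP"),
    "charged": set("DEKRH"),   # D, E acidic; K, R, H basic
    "polar": set("STCNQY"),
}


def group_of(aa: str) -> str:
    """Return the side-chain group of a one-letter amino acid code."""
    for name, members in GROUPS.items():
        if aa in members:
            return name
    raise ValueError(f"unknown residue: {aa!r}")


def composition(sequence: str) -> dict[str, int]:
    """Count how many residues of a sequence fall in each side-chain group."""
    counts = Counter(group_of(aa) for aa in sequence.upper())
    return {name: counts.get(name, 0) for name in GROUPS}


# The visible part of the example sequence given under "Primary structure".
print(composition("AGVGTVPMTAYGNDIQYYGQVT"))
# → {'hydrophobic': 12, 'charged': 1, 'polar': 9}
```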

The Hydrophobic Amino Acids

Proline severely limits allowable conformations!

The Charged Amino Acids

The Polar Amino Acids

More Polar Amino Acids

Planarity of the peptide bond

And then there’s…

• Psi (ψ): the angle of rotation about the Cα–C bond.
• Phi (φ): the angle of rotation about the N–Cα bond.
• The planar bond angles and bond lengths are fixed.
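φ and ψ are dihedral (torsion) angles, each defined by four consecutive backbone atoms: φ by C(i−1), N, Cα, C and ψ by N, Cα, C, N(i+1). A minimal sketch of the standard dihedral formula follows, assuming atom coordinates are already available as 3-D points (for example, parsed from a structure file); the function name `dihedral` and its helpers are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle in degrees, in (-180, 180], about the p1-p2 bond.

    For phi pass C(i-1), N(i), CA(i), C(i); for psi pass N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Looking down the p1 -> p2 axis, p3 sits 90 degrees clockwise from p0.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 6))  # → 90.0
```

Using `atan2` rather than `acos` keeps the sign, which matters here: the helix and sheet regions differ in the sign of ψ.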

Primary & Secondary Structure

• Primary structure = the linear sequence of amino acids comprising a protein:
  AGVGTVPMTAYGNDIQYYGQVT…
• Secondary structure
  • Regular patterns of hydrogen bonding in proteins result in two patterns that emerge in nearly every known protein structure: the α-helix and the β-sheet
  • The location and direction of these periodic, repeating structures is known as the secondary structure of the protein

The alpha helix

φ ≈ ψ ≈ −60°


Properties of the alpha helix

• φ ≈ ψ ≈ −60°
• Hydrogen bonds between the C=O of residue n and the N–H of residue n+4
• 3.6 residues/turn
• 1.5 Å/residue rise
• 100°/residue turn
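The helix parameters above are mutually consistent: 3.6 residues/turn at 100° per residue gives 360° per turn, and 3.6 × 1.5 Å gives a pitch of 5.4 Å per turn. The sketch below derives these numbers for an ideal helix and lists the n → n+4 hydrogen-bond pairs; `helix_geometry` and `helix_hbonds` are hypothetical names, and real helices deviate somewhat from the ideal values.

```python
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5  # Å along the helix axis


def helix_geometry(n_residues: int) -> dict[str, float]:
    """Ideal alpha-helix dimensions from the textbook parameters."""
    return {
        "degrees_per_residue": 360.0 / RESIDUES_PER_TURN,           # 100°
        "pitch_angstrom": RESIDUES_PER_TURN * RISE_PER_RESIDUE,     # 5.4 Å
        "length_angstrom": n_residues * RISE_PER_RESIDUE,
        "turns": n_residues / RESIDUES_PER_TURN,
    }


def helix_hbonds(n_residues: int) -> list[tuple[int, int]]:
    """(C=O residue, N-H residue) pairs: C=O of residue n bonds N-H of n+4.

    Residues are numbered from 1 at the N-terminus.
    """
    return [(n, n + 4) for n in range(1, n_residues - 3)]


geom = helix_geometry(18)
print({name: round(value, 2) for name, value in geom.items()})
print(helix_hbonds(8))  # → [(1, 5), (2, 6), (3, 7), (4, 8)]
```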

Properties of α-helices

• 4 to 40+ residues in length
• Often “dual-natured” (amphipathic)
  • Half hydrophobic and half hydrophilic
  • Mostly when surface-exposed
• For many α-helices:
  • Helix formers: Ala, Glu, Leu, Met
  • Helix breakers: Pro, Gly, Tyr, Ser
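The former/breaker lists lend themselves to a sliding-window scan, in the spirit of early propensity-based predictors such as Chou-Fasman. This is a deliberately crude, hypothetical scoring (+1 per former, −1 per breaker, 0 otherwise, averaged over a window) that uses only the residues named above; it is not the Chou-Fasman method, which uses measured propensities for all 20 amino acids. The window length and threshold are arbitrary assumptions.

```python
# Residues named on the slide, in one-letter code:
# formers Ala, Glu, Leu, Met; breakers Pro, Gly, Tyr, Ser.
HELIX_FORMERS = set("AELM")
HELIX_BREAKERS = set("PGYS")


def helix_score(sequence: str, window: int = 6) -> list[float]:
    """Average former/breaker score of every window, indexed by window start."""
    if not 1 <= window <= len(sequence):
        raise ValueError("window must be between 1 and len(sequence)")
    per_residue = [
        1 if aa in HELIX_FORMERS else -1 if aa in HELIX_BREAKERS else 0
        for aa in sequence.upper()
    ]
    return [
        sum(per_residue[i:i + window]) / window
        for i in range(len(per_residue) - window + 1)
    ]


def candidate_helix_windows(sequence: str, window: int = 6,
                            threshold: float = 0.5) -> list[int]:
    """0-based start positions whose window score reaches the threshold."""
    return [i for i, s in enumerate(helix_score(sequence, window)) if s >= threshold]


print(candidate_helix_windows("GSAELMAELKPG"))  # → [1, 2, 3, 4, 5]
```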

The beta strand (& sheet)

• φ ≈ −135°
• ψ ≈ +135°

Properties of beta sheets

• Formed of stretches of 5 to 10 residues in extended conformation
• Pleated: each Cα is a bit above or below the previous one
• Parallel/anti-parallel, contiguous/non-contiguous
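Because the α-helix and the β-strand occupy distinct (φ, ψ) regions, a residue's backbone angles give a first guess at its local conformation. The sketch below assigns each (φ, ψ) pair to the nearer of the two reference points quoted in these notes, measuring angular differences that wrap around at ±180°. The 50° cut-off for calling a residue "other" is an arbitrary assumption, and real Ramachandran regions are broader and irregular; `classify` is a hypothetical name.

```python
import math

# Reference (phi, psi) values quoted in the notes, in degrees.
REFERENCES = {
    "alpha-helix": (-60.0, -60.0),
    "beta-strand": (-135.0, 135.0),
}


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def classify(phi: float, psi: float, cutoff: float = 50.0) -> str:
    """Label (phi, psi) by the nearest reference point, or 'other' if none is close."""
    best_name, best_dist = "other", math.inf
    for name, (ref_phi, ref_psi) in REFERENCES.items():
        dist = math.hypot(angular_diff(phi, ref_phi), angular_diff(psi, ref_psi))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= cutoff else "other"


print(classify(-57, -47), classify(-120, 130), classify(60, 45))
# → alpha-helix beta-strand other
```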

Parallel and anti-parallel β-sheets

• Anti-parallel is slightly energetically favored

[Figure: anti-parallel and parallel β-sheets]

Turns and Loops

• Secondary structure elements are connected by regions of turns and loops
• Turns: short regions of non-α, non-β conformation
• Loops: larger stretches with no secondary structure. Often disordered.
  • “Random coil”
  • Sequences vary much more than in secondary structure regions


Levels of Protein Structure

• Secondary structure elements combine to form tertiary structure
• Quaternary structure arises when several polypeptide chains assemble, as in multi-enzyme complexes
  • Many proteins are active only as homodimers, homotetramers, etc.

Protein Structure Examples

Disulfide Bonds

• Two cysteines in close proximity can form a covalent bond between their sulfur atoms
  • Called a disulfide bond, disulfide bridge, or dicysteine bond.
• Significantly stabilizes tertiary structure.
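"Close proximity" can be made operational by checking distances between cysteine sulfur (Sγ) atoms; in a disulfide bond the S–S distance is about 2.05 Å. The sketch below flags cysteine pairs whose Sγ atoms lie within an assumed cut-off of 2.5 Å. The input format (residue number mapped to Sγ coordinates) and the name `disulfide_candidates` are hypothetical; real tools also check bond angles and dihedrals.

```python
import itertools
import math

Point = tuple[float, float, float]


def disulfide_candidates(sg_coords: dict[int, Point],
                         cutoff: float = 2.5) -> list[tuple[int, int, float]]:
    """Return (res_i, res_j, distance) for cysteine pairs with SG atoms within cutoff Å.

    sg_coords maps a cysteine residue number to the coordinates of its SG atom.
    """
    pairs = []
    for (i, a), (j, b) in itertools.combinations(sorted(sg_coords.items()), 2):
        d = math.dist(a, b)
        if d <= cutoff:
            pairs.append((i, j, round(d, 2)))
    return pairs


# Made-up coordinates for illustration: Cys 3 and Cys 40 are bonded, Cys 17 is free.
example = {3: (0.0, 0.0, 0.0), 17: (8.0, 1.0, 0.0), 40: (2.05, 0.0, 0.0)}
print(disulfide_candidates(example))  # → [(3, 40, 2.05)]
```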

Determining Protein Structure

• There are O(100,000) distinct proteins in the human proteome.
• Two methods for revealing the positions of atoms in 3-D:
  • X-ray crystallography
    • X-ray diffraction pattern + mathematical reconstruction
    • Requires a good protein crystal and good diffraction resolution
  • Nuclear magnetic resonance (NMR)
    • Small proteins only (< 250 residues)
    • Inter-proton distances + geometric constraints
• 24,000+ 3D structures have been determined so far (including duplicates with different ligands bound, etc.)

PDB Holdings List: 23-Mar-2004

Molecule type                   X-ray & other    NMR  Total
Proteins, peptides & viruses            19427   2998  22425
Protein/nucleic acid complexes            944     97   1041
Nucleic acids                             726    575   1301
Carbohydrates                              14      4     18
Total                                   21111   3674  24785

Please note that theoretical models have been removed.

PDB: Growth (Mar 2004)


X-Ray Crystallography

[Figure: experimentation to grow protein crystals. Trial-and-error experimentation varies the control parameters across trials (Trial 1, 2, 3); the observables range from failure through partial success to success (a crystal of ~0.5 mm).]

• The crystal is a mosaic of millions of copies of the protein.
• As much as 70% is solvent (water)!
• May take months (and a “green” thumb) to grow.

X-Ray diffraction

• Image is averaged over:
  • Space (many copies)
  • Time (of the diffraction experiment)

Protein Crystal Growth in Space

• Protein crystal growth experiments on over 20 shuttle missions since 1984
• Larger crystals in 45.4% of the cases
• New crystal structures in 18% of the cases
• ≥ 10% increase in X-ray crystallography brightness in 58% of the cases
• Less thermal motion in 27.2% of the cases
• An X-ray crystallography resolution improvement of ~0.3 Å in 42.4% of the cases
• An X-ray crystallography resolution improvement of 0.3 to 0.5 Å in 9.9% of the cases
• An X-ray crystallography resolution improvement of 0.5 to 1.0 Å in 9.9% of the cases

The Protein Folding Problem

• Proteins self-assemble in solution. Almost all of the information necessary to determine the complex 3-D structure is in the amino acid sequence.
• Central dogma: sequence specifies structure
• Central question: “Given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be?”

Levinthal’s paradox

• Consider a 100-residue protein. If each residue can take only 3 positions, there are $3^{100} \approx 5 \times 10^{47}$ possible conformations.
• If it takes $10^{-13}$ s to convert from one structure to another, an exhaustive search would take $1.6 \times 10^{27}$ years!
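The arithmetic behind Levinthal’s paradox is easy to reproduce. The sketch assumes, as the slide does, 3 states per residue and $10^{-13}$ s per conformational change, and it uses a 365.25-day year (an assumption); `levinthal` is a hypothetical name.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, an assumption


def levinthal(n_residues: int = 100, states_per_residue: int = 3,
              seconds_per_step: float = 1e-13) -> tuple[int, float]:
    """Number of conformations and years needed to visit all of them one by one."""
    conformations = states_per_residue ** n_residues  # exact integer
    years = conformations * seconds_per_step / SECONDS_PER_YEAR
    return conformations, years


n_conf, years = levinthal()
print(f"{n_conf:.2e} conformations, {years:.2e} years")
# → 5.15e+47 conformations, 1.63e+27 years
```

Python integers are arbitrary precision, so $3^{100}$ is computed exactly before the conversion to years.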


Ideas on protein folding

• It is believed that hydrophobic collapse is a key driving force for protein folding
  • Hydrophobic core!
• Proteins are, in fact, only marginally stable
  • The native state is typically only 5 to 10 kcal/mol more stable than the unfolded form
• Many proteins help in folding
  • Protein disulfide isomerase: catalyzes shuffling of disulfide bonds
  • Chaperones: break up aggregates and (in theory) unfold misfolded proteins
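To put 5 to 10 kcal/mol in perspective, a two-state model converts the free-energy difference $\Delta G = G_{\text{unfolded}} - G_{\text{native}}$ into an equilibrium constant $K = [\text{native}]/[\text{unfolded}] = e^{\Delta G / RT}$. The sketch below assumes that two-state model at 25 °C; `fraction_unfolded` is a hypothetical name.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fraction_unfolded(delta_g_kcal: float, temp_k: float = 298.15) -> float:
    """Two-state model: fraction of molecules unfolded at equilibrium.

    delta_g_kcal is G(unfolded) - G(native); positive means the native
    state is favored.
    """
    k_native = math.exp(delta_g_kcal / (R_KCAL * temp_k))  # [native]/[unfolded]
    return 1.0 / (1.0 + k_native)


for dg in (5.0, 10.0):
    print(f"dG = {dg:4.1f} kcal/mol -> fraction unfolded = {fraction_unfolded(dg):.1e}")
```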

What determines fold?

• Anfinsen’s experiments in 1957 demonstrated that proteins can fold spontaneously into their native conformations under physiological conditions. This implies that primary structure does indeed determine folding, or 3-D structure.
• Some exceptions exist
  • Chaperone proteins assist folding
  • Abnormally folded prion proteins can catalyze misfolding of normal prion proteins, which then aggregate

Other factors

• Physical properties of a protein that influence its stability and, therefore, determine its fold:
  • Rigidity of the backbone
  • Amino acid interaction with water
    • Hydropathy index for side chains
  • Interactions among amino acids
    • Electrostatic interactions
    • Hydrogen and disulfide bonds
  • Volume constraints
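A hydropathy index assigns each side chain a number; averaging it over a sliding window highlights hydrophobic stretches along a sequence. The sketch below uses the Kyte-Doolittle scale, one widely used hydropathy index, but the notes do not name a scale, so that choice is an assumption; the window length of 9 is likewise arbitrary, and `hydropathy_profile` is a hypothetical name.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each window, one value per window start position."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(sequence)")
    return [
        round(sum(values[i:i + window]) / window, 2)
        for i in range(len(values) - window + 1)
    ]


print(hydropathy_profile("AGVGTVPMTAYGNDIQYYGQVT"))
```

Longer windows smooth out single-residue noise at the cost of blurring the boundaries of hydrophobic segments.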

CASP changed the landscape

• Critical Assessment of Structure Prediction competition, held in even-numbered years since 1994
• Solved but unpublished structures are posted in May; predictions are due in September
• Various categories
  • Relation to existing structures: ab initio, homology, fold, etc.
  • Partially vs. fully automated approaches
• Produces lots of information about which aspects of the problem are hard, and ends arguments about test sets
• Results show steady improvement, and the value of integrative approaches
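Comparing a prediction with the experimental structure requires a distance measure. One common choice is the root-mean-square deviation (RMSD) of corresponding Cα atoms after optimal rigid superposition, computed here with the Kabsch algorithm. CASP assessors also rely on other scores (notably GDT_TS), so this sketch illustrates the idea rather than CASP's official evaluation. It assumes NumPy and two equally long (N, 3) coordinate arrays in matching residue order; `kabsch_rmsd` is a hypothetical name.

```python
import numpy as np


def kabsch_rmsd(pred: np.ndarray, ref: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    if pred.shape != ref.shape or pred.ndim != 2 or pred.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    p = pred - pred.mean(axis=0)  # remove translation
    q = ref - ref.mean(axis=0)
    h = p.T @ q                   # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p @ rotation.T
    return float(np.sqrt(np.mean(np.sum((p_rot - q) ** 2, axis=1))))


# A rotated and shifted copy of a random "structure" superimposes perfectly.
rng = np.random.default_rng(0)
ref = rng.normal(size=(10, 3))
theta = np.radians(30)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
pred = ref @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(round(kabsch_rmsd(pred, ref), 6))  # → 0.0
```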
