
Thesis Title: Subtitle - NMR Spectroscopy Research Group


Computational study of proteins with paramagnetic NMR: Automatic assignments of spectral resonances, determination of protein-protein and protein-ligand complexes, and structure determination of proteins

Christophe Schmitz

A thesis submitted for the degree of Doctor of Philosophy at The University of Queensland in December 2009

School of Chemistry and Molecular Biosciences



Declaration by author

This thesis is composed of my original work, and contains no material previously published or written by another person except where due reference has been made in the text. I have clearly stated the contribution by others to jointly-authored works that I have included in my thesis.

I have clearly stated the contribution of others to my thesis as a whole, including statistical assistance, survey design, data analysis, significant technical procedures, professional editorial advice, and any other original research work used or reported in my thesis. The content of my thesis is the result of work I have carried out since the commencement of my research higher degree candidature and does not include a substantial part of work that has been submitted to qualify for the award of any other degree or diploma in any university or other tertiary institution. I have clearly stated which parts of my thesis, if any, have been submitted to qualify for another award.

I acknowledge that an electronic copy of my thesis must be lodged with the University Library and, subject to the General Award Rules of The University of Queensland, immediately made available for research and study in accordance with the Copyright Act 1968.

I acknowledge that copyright of all material contained in my thesis resides with the copyright holder(s) of that material.

Statement of Contributions to Jointly Authored Works Contained in the Thesis

John M, Schmitz C, Park AY, Dixon NE, Huber T and Otting G (2007) Sequence-specific and stereospecific assignment of methyl groups using paramagnetic lanthanides. J Am Chem Soc 129:13749-13757.

John designed new NMR experiments, recorded and assigned the spectra, and wrote the corresponding paragraphs in the paper. Schmitz designed and implemented the software to automate the assignment procedure, ran the calculations, and wrote the paragraphs "The Program Possum" and "Automatic Assignments without EXSY Data". Park made the protein samples. Dixon coordinated the protein sample preparation and corrected aspects of the paper. Huber was responsible for the computational aspects of the project and the writing of the corresponding sections of the paper. Otting coordinated the overall project and was responsible for the writing of the paper.



Schmitz C, Stanton-Cook MJ, Su XC, Otting G and Huber T (2008) Numbat: an interactive software tool for fitting Δχ-tensors to molecular coordinates using pseudocontact shifts. J Biomol NMR 41:179-189.

Schmitz designed and implemented the software to automate the determination of the Δχ-tensor and wrote most of the paper except for the "study case" section. Stanton-Cook was responsible for the calculations, the protein-protein modelling, the writing of the "study case" section and improvements to the paper. Su was responsible for improving the design of the software from an "end-user" perspective. Otting and Huber were responsible for the overall project and the writing of the manuscript.

Schmitz C, Vernon R, Otting G, Baker D and Huber T. Protein structure determination from pseudocontact shifts using ROSETTA. Proc Natl Acad Sci U S A, submitted.

Schmitz designed and implemented the PCS-score in the software ROSETTA, collected experimental data sets, performed computations, and was responsible for writing the manuscript. Vernon guided the implementation of the PCS-score, ran calculations, interpreted the results, and improved the manuscript. Otting gathered experimental data sets, set up the overall project, and corrected versions of the manuscript. Baker was responsible for guiding the overall project, and for the overall manuscript. Huber designed the PCS-score, guided the overall project, and improved the paper.

Statement of Contributions by Others to the Thesis as a Whole

No contributions by others.

Statement of Parts of the Thesis Submitted to Qualify for the Award of Another Degree

None.

Published Works by the Author Incorporated into the Thesis

John M, Schmitz C, Park AY, Dixon NE, Huber T and Otting G (2007) Sequence-specific and stereospecific assignment of methyl groups using paramagnetic lanthanides. J Am Chem Soc 129:13749-13757. Incorporated as Chapter 2.

Schmitz C, Stanton-Cook MJ, Su XC, Otting G and Huber T (2008) Numbat: an interactive software tool for fitting Δχ-tensors to molecular coordinates using pseudocontact shifts. J Biomol NMR 41:179-189. Incorporated as Chapter 3.

Schmitz C, Vernon R, Otting G, Baker D and Huber T. Protein structure determination from pseudocontact shifts using ROSETTA. Proc Natl Acad Sci U S A, submitted. Incorporated as Chapter 4.

Additional Published Works by the Author Relevant to the Thesis but not Forming Part of it

Su XC, Man B, Beeren S, Liang H, Simonsen S, Schmitz C, Huber T, Messerle BA and Otting G (2008) A dipicolinic acid tag for rigid lanthanide tagging of proteins and paramagnetic NMR spectroscopy. J Am Chem Soc 130:10486-10487.



Acknowledgements

I would like to thank my advisors Dr Thomas and Prof. Gottfried for their scientific and moral support that made this thesis so enjoyable. In particular, I really appreciated their doors always being open for discussions; their infectious scientific enthusiasm; their exemplary, if not legendary, efficiency; their honesty and encouragement in whatever I tried to accomplish; and of course their constant and reliable good humor.

Thanks to the people who put me on track to postgraduate studies, in particular Denis Barthou for his amazing teaching, Philippe Pucheral and Luc Bouganim for introducing me to research, and of course Prof. Guido for somehow bringing me into the field of structural biology.

Thanks to the past and present members and visitors of the BMMG / MD group for those 3.5 years of fun. I appreciated their company, whether it was for a discussion, "une boue", a tea break, a beer or two, or three, a meal, a game of tennis or squash, or a dance. So thanks to Matt, Itamar, David, Mitchel, Ying, Zrinka, Daniela, Michael, Kim, Liz, Alpesh, Prof. Alan Mark and all the others.

It has always been a pleasure to visit ANU in Canberra, thanks to Michael, Xun-Cheng, Hiromasa, Kiyoshi, Karin and Laura.

I would also like to thank Prof. David Baker and his lab for welcoming me for a couple of months for a fruitful collaboration, and many, many thanks to Robert for so much help with that project, and for being a great illustration of how friendly Canadian people are.

My apologies to my family for being so far away for so long; I know you understood my decision. Thanks for all your support. Thank you, Chantel, for your patience, love, support and patience.



Abstract

Understanding biological phenomena at atomic resolution is one of the keys to modern drug design. In particular, knowledge of the 3D structures of proteins and of their interactions with other macromolecules is necessary for designing chemical compounds that modify biological processes. Conventional methods for protein structure determination are X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. These techniques can also determine the binding mode of chemical compounds. Either technique can be slow and costly, making it highly relevant to explore alternative strategies. Paramagnetic NMR spectroscopy is emerging as such an alternative technique. To measure the paramagnetic effects, two NMR spectra are compared, one measured with and one without a bound paramagnetic metal ion. In particular, pseudocontact shifts (PCS) of nuclear spins are easily measured as the difference (in ppm) of the chemical shifts between the two spectra. PCSs provide long-range, orientation-dependent restraints, allowing the spin to be positioned with respect to the magnetic susceptibility anisotropy tensor (Δχ-tensor) of the metal ion.

In this thesis, I used the PCS effect to computationally extract information from NMR spectra. I (i) developed a tool (called Possum) to automatically assign diamagnetic and paramagnetic spectra of the methyl groups of amino acid side chains, given structural information on the protein studied and prior knowledge of the Δχ-tensor; (ii) designed a comprehensive software package (called Numbat) to extract Δχ-tensor parameters from assigned PCS values and the available 3D structure; and (iii) incorporated PCS-based restraints into the protein structure prediction software CS-ROSETTA and demonstrated that this combination (PCS-ROSETTA) presents a significant improvement for de novo structure determination. The three projects serve different purposes at different stages of protein NMR studies. They could be combined in the following manner: starting from assigned backbone PCSs, PCS-ROSETTA could be used to determine the 3D structure of the protein. Possum can then be used to automatically assign the NMR resonances of the methyl groups using PCSs. Finally, Numbat can be used to fit improved Δχ-tensors to all the PCS data, analyze the quality of the Δχ-tensors and identify possibly incorrect assignments. Iterative repetition of this protocol would give a 3D structural model of the protein with a minimum of data. Alternatively, the Δχ-tensor parameters and PCSs could be used as input for a traditional software package such as Xplor-NIH to compute a 3D structure of the protein.
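
As a minimal illustration of the PCS measurement described above, the following Python sketch computes PCSs as the difference (in ppm) between the chemical shifts of matching resonances in the paramagnetic and diamagnetic spectra. The function name, resonance labels and shift values are hypothetical, and the sign convention (paramagnetic minus diamagnetic) is an assumption. It is not code taken from Possum, Numbat or PCS-ROSETTA.

```python
def pseudocontact_shifts(dia_ppm, para_ppm):
    """Return the PCS (ppm) of every resonance present in both spectra.

    Assumed convention: PCS = paramagnetic shift - diamagnetic shift.
    """
    return {
        name: round(para_ppm[name] - dia_ppm[name], 3)
        for name in dia_ppm
        if name in para_ppm
    }


# Hypothetical 13C chemical shifts (ppm) of two methyl groups.
dia = {"Met18_CE": 17.10, "Ala45_CB": 19.25}   # diamagnetic reference spectrum
para = {"Met18_CE": 17.85, "Ala45_CB": 18.90}  # spectrum with paramagnetic ion

pcs = pseudocontact_shifts(dia, para)
for name, value in pcs.items():
    print(f"{name}: {value:+.2f} ppm")
```

Resonances that appear in only one of the two spectra are skipped, since a PCS can only be measured when the same peak is identified in both.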



Keywords

paramagnetic NMR, pseudocontact shift, lanthanide, magnetic susceptibility tensor, protein, structure determination, resonance assignment, protein folding

Australian and New Zealand Standard Research Classifications (ANZSRC)

060112 (40%), 080301 (30%), 030406 (30%)



Table of Contents

Declaration by author
Statement of Contributions to Jointly Authored Works Contained in the Thesis
Statement of Contributions by Others to the Thesis as a Whole
Statement of Parts of the Thesis Submitted to Qualify for the Award of Another Degree
Published Works by the Author Incorporated into the Thesis
Additional Published Works by the Author Relevant to the Thesis but not Forming Part of it
Acknowledgements
Abstract
Keywords
Australian and New Zealand Standard Research Classifications (ANZSRC)
Table of Contents
List of Figures
List of Tables
List of Abbreviations

1. Introduction
1.1 Liquid State Nuclear Magnetic Resonance
1.2 Paramagnetic NMR
1.2.1 The four paramagnetic effects in NMR
1.2.2 The pseudocontact shift as a restraint
1.3 Computational study of paramagnetic proteins
1.3.1 The assignment problem
1.3.2 The Δχ-tensor determination problem
1.3.3 De novo structure determination of proteins
1.4 Scope of the thesis
1.5 References

2. Possum: paramagnetically orchestrated spectral solver of unassigned methyls
2.1 Abstract
2.2 Introduction
2.3 Experimental section
2.3.1 Sample preparation
2.3.2 NMR spectroscopy
2.3.3 Manual resonance assignments from PCS
2.3.4 The program Possum
2.4 Results
2.4.1 ¹³C-HSQC spectra of the cz-ε186/θ/Ln³⁺ complexes
2.4.2 Methyl CZ-EXSY experiments
2.4.3 Resonance assignment of Met, Ala and Thr methyl groups
2.4.4 Assignments of Val, Leu, and Ile methyl groups
2.4.5 Automatic assignments without EXSY data
2.4.6 PCS and flexibility
2.5 Discussion
2.6 Acknowledgement
2.7 Supporting Information Available
2.8 References
2.9 Supporting information

3. Numbat: new user-friendly method built for automatic Δχ-tensor determination
3.1 Abstract
3.2 Keywords
3.3 Abbreviations
3.4 Introduction
3.5 Algorithm
3.6 Program Features
3.6.1 GUI
3.6.2 Input files
3.6.3 Methyl group definition
3.6.4 Optimization of the tensor parameters
3.6.5 Residual Anisotropic Chemical Shifts (RACS)
3.6.6 Multiple PCS data sets
3.6.7 PCS modification
3.6.8 PCS selection
3.6.9 Conventions
3.6.10 Error analysis
3.6.11 Visualization
3.6.12 Output
3.7 Study case
3.7.1 Subunit ε186
3.7.2 Subunit θ
3.7.3 Modelling the complex between ε186 and θ
3.8 Conclusion
3.9 Acknowledgment
3.10 References
3.11 Supporting information

4. Protein Structure Determination from Pseudocontact Shifts using ROSETTA
4.1 Abstract
4.2 Introduction
4.3 Results
4.3.1 Test set
4.3.2 Capacity of the PCS Score to Identify Native-like Structures
4.3.3 Comparison of PCS-ROSETTA with CS-ROSETTA
4.3.4 Successes and Limits of PCS-ROSETTA Calculations
4.4 Discussion
4.5 Materials and Methods
4.5.1 PCS-ROSETTA Score
4.5.2 PCS-ROSETTA Algorithm
4.5.3 Input for PCS-ROSETTA
4.5.4 PCS-ROSETTA Protocol for Protein Structure Determination
4.5.5 Computation of Structures to Evaluate the Effects of PCS Scoring
4.6 Acknowledgments
4.7 References
4.8 Supporting information

5. Conclusion and perspectives
5.1 The use of PCS for structure determination
5.1.1 Folding of proteins using only pseudocontact shifts
5.1.2 Uses of multiple lanthanide binding sites
5.1.3 Development of a new PCS-ROSETTA protocol
5.2 The use of PCS for chemical shift assignment
5.3 The use of PCS for protein docking
5.4 References



List of Figures

Figure 1.1 NMR effects used for structure determination
Figure 1.2 Representation of the distance and angular dependence of the four paramagnetic effects for the spin S, or system of spin S1-S2 (green)
Figure 1.3 Experimental measurement of the four paramagnetic effects with two 1D undecoupled spectra
Figure 1.4 The PCS is less sensitive than RDC to small discrepancies between X-ray and solution structure
Figure 1.4 The Δχ-tensor determination problem
Figure 1.5 Illustration of the three approaches of resonance assignment
Figure 1.6 Illustration of PCS restraints
Figure 1.7 Protein complexes determined using PCSs
Figure 1.8 Flow-chart of the Echidna algorithm
Figure 1.9 Examples of the MAP problem
Figure 1.10 Illustration of the task performed by the software Possum
Figure 1.11 Isosurface shapes calculated by equation (1.1)
Figure 1.12 Sanson-Flamsteed projection for visualization of Δχ-tensor uncertainty
Figure 1.13 Sanson-Flamsteed representations of Δχ-tensor axes orientation
Figure 1.14 Illustration of the task performed by the software Numbat
Figure 1.15 Effect of the mobility of the tag on the PCS
Figure 2.1 Methyl CZ-EXSY experiments
Figure 2.2 Formulation of the assignment problem depending on the information available
Figure 2.3 Methyl region of constant-time ¹³C-HSQC spectra of the cz-ε186/θ complex (containing ¹³C/¹⁵N-labeled cz-ε186) in the presence of La³⁺ (blue) and a 1:1 mixture of (a) La³⁺/Dy³⁺ and (b) La³⁺/Yb³⁺ (red)
Figure 2.4 Assignment of Met CH3 from PCS
Figure 2.5 PCS measurements in isopropyl groups of Val and Leu and use of PCS for stereospecific resonance assignments
Figure 2.6 Residues showing deviations between predicted and experimental PCS
Figure S2.1 Pulse scheme of the 2D (H)C(C)H-TOCSY experiment used in this study
Figure S2.2 Assigned constant-time (28 ms) ¹³C-HSQC spectrum of the cz-ε186/θ/La³⁺ complex (¹³C/¹⁵N-labeled cz-ε186) at pH 7.2 and 25 °C
Figure S2.3 Assigned constant-time ¹³C-HSQC spectrum of the cz-ε186/θ/La³⁺ complex, where cz-ε186 was biosynthetically fractionally ¹³C-labeled using 20% uniformly ¹³C-labeled glucose
Figure S2.4 Assigned constant-time ¹³C-HSQC spectrum of the cz-ε186/θ/La³⁺ complex containing ¹³C/¹⁵N-Leu labeled cz-ε186 (blue) superimposed onto a 2D (H)C(C)H-TOCSY spectrum of the same sample (red)
Figure S2.5 Comparisons of calculated and experimental PCS in the cz-ε186/θ/Dy³⁺ complex for methyl groups of (a) Met, (b) Ala, (c) Thr, (d) Val, (e) Leu, and (f) Ile
Figure S2.6 Comparisons of calculated and experimental ¹³C and ¹H PCS as in Figure S2.5 but for the cz-ε186/θ/Yb³⁺ complex
Figure 3.1 Screenshots of Numbat main windows
Figure 3.2 Euler angle definitions used by Numbat
Figure 3.3 Visualisation of the Δχ-tensor in MOLMOL and PyMOL, and display of its orientational uncertainty in a Sanson-Flamsteed projection plot
Figure 3.4 The four degenerate solutions arising from the symmetry of the Δχ-tensor around the x, y and z axes
Figure 3.5 The complex between ε186 and θ determined by superimposition of Δχ-tensors
Figure 4.1 Fold identification by pseudocontact shifts
Figure 4.2 Improved conformational sampling by PCS-ROSETTA
Figure 4.3 Energy landscapes generated by PCS-ROSETTA
Figure 4.4 Superimpositions of ribbon representations of the backbones of the lowest energy structures calculated with PCS-ROSETTA (blue) onto the corresponding target structures (red)
Figure S4.1 Fold identification by pseudocontact shift score and ROSETTA energy
Figure S4.2 Improved fragment assembly by PCS-ROSETTA
Figure S4.3 Energy landscape generated by CS-ROSETTA and PCS-ROSETTA, with full atom ROSETTA energies and Cα rmsd values being calculated using only the core residues as defined in Table S4.1
Figure S4.4 Identification of successful calculations with PCS-ROSETTA
Figure S4.5 Flow diagram of PCS-ROSETTA
Figure S4.6 Expected Cα rmsd of the lowest energy structure calculated with PCS-ROSETTA
Figure 5.1 Capacity of the PCS score, as the only energy term, to fold the protein
Figure 5.2 The intersection of isosurfaces defines the position and orientation of peptide fragments in the protein structure



List of Tables<br />

Table 2.1 Automatic assignment of methyl groups by the program Possum a ................................. 45<br />

Table S2.1 ¹³C and ¹H chemical shifts (ppm) of methyl groups of ε186 in the ε186/θ/Ln³⁺<br />

complexes used in this study a ............................................................................................. 64<br />

Table S2.2 Number of correctly assigned methyl groups of Met, Thr, and Ala residues of ε186<br />

using the program Possum a ................................................................................................ 69<br />

Table S2.3 Number of correctly assigned methyl groups of Val, Leu, and Ile residues of ε186<br />

using the program Possum with methyl connectivity information in the Yb 3+ complex a .. 71<br />

Table S2.4 Number of correctly assigned methyl groups of valine, leucine, and isoleucine residues<br />

of ε186 using the program Possum without methyl connectivity information in the Yb³⁺<br />

complex a ............................................................................................................................. 73<br />

Table 3.1 Δχ-tensors determined by Numbat in the frames of the ε186 and θ molecule .................. 87<br />

Table 3.2 Error analysis for the Dy³⁺ Δχ-tensors fitted to PCS of ε186 and θ a ............................... 89<br />

Table S3.1 Experimentally determined 1 H N PCS for θ in complex with ε186 at pH 7.0 and 25°C a 97<br />

Table S3.2 Comparison of θ Δχ-tensor parameters when using only conformer 10 a or all<br />

conformers b of the NMR structure of θ. ............................................................................. 99<br />

Table 4.1 Protein structures used to evaluate the performance of PCS-ROSETTA ....................... 104<br />

Table S4.1 PCS data information and grid search parameters used. .............................................. 125<br />

Table S4.2 Protein structures used to evaluate the performance of PCS-ROSETTA. .................... 126



List of Abbreviations<br />

α Subunit α of the E. coli polymerase III<br />

ε186 N-terminal 185 residues of the E. coli polymerase III subunit ε<br />

θ Subunit θ of the E. coli polymerase III<br />

CCR Cross Correlated Relaxation<br />

CSA Chemical Shielding Anisotropy<br />

GUI Graphical User Interface<br />

HOT The bacteriophage P1-encoded homolog of θ<br />

MAP Multi-dimensional Assignment Problem<br />

NMR Nuclear Magnetic Resonance<br />

NOE Nuclear Overhauser Effect<br />

PCS Pseudocontact Shift<br />

ppm parts per million<br />

PRE Paramagnetic Relaxation Enhancement<br />

RACS Residual Anisotropic Chemical Shift<br />

RDC Residual Dipolar Coupling<br />

RMSD Root Mean Square Deviation<br />

UTR Unique Δχ-Tensor Representation


Chapter 1<br />

Introduction<br />

1. Introduction<br />

1.1 Liquid State Nuclear Magnetic Resonance<br />

In the last few decades, Nuclear Magnetic Resonance (NMR) has been used routinely to<br />

investigate chemical compounds, proteins and complexes. The method relies on intrinsic spin<br />

properties of nuclei. Spins are first exposed to a strong and constant magnetic field delivered by the<br />

spectrometer. Then, they are excited by a radiofrequency pulse sequence. The precession of the<br />

spins is recorded during the free induction decay of the NMR experiment and converted into a<br />

frequency spectrum after Fourier transformation. Several parameters can be read from the spectra<br />

which provide information about the structure of the molecule.<br />

The chemical shift describes the dependence of nuclear magnetic energy levels on the<br />

electronic environment in a molecule. The chemical shift depends on the nature of the nucleus. It<br />

also strongly depends on its local neighborhood (up to 5 Å) due to the influence of the electron<br />

configuration. Hence, in a protein, almost all nuclei have different chemical shifts. This allows them<br />

to be distinguished in the NMR spectrum by their specific resonance frequencies.<br />

The dipole-dipole coupling is the direct magnetic interaction between two close spins. The<br />

effect is intra- and intermolecular, since it acts through space. The interaction energy is minimal<br />

when the two spins are aligned. This spin interaction is responsible for the Nuclear Overhauser<br />

Effect (NOE).



The scalar J-coupling results from an indirect magnetic interaction of two nuclear spins<br />

via their surrounding electrons. The effect is exclusively intramolecular because it is propagated<br />

through the bonds between two nuclei. Typically, it can be measured for nuclei separated by up to<br />

three bonds. In this case, it is referred to as a ³J-coupling. The ³J-coupling constant yields angle<br />

information, as shown in Figure 1.1.b.<br />

NMR experiments measure the effects described above. One can classify NMR experiments<br />

into two groups: those that yield structural information, and those that yield information to facilitate<br />

resonance assignment. The structural information comes mainly from direct dipole-dipole<br />

couplings, which provide short-range distance restraints between two spins, and from ³J-couplings, which<br />

offer dihedral angle restraints across the three bonds concerned. Some examples of NMR<br />

experiments that offer structural information are:<br />

1D proton experiment: It provides the chemical shifts of the protons. Each ¹H has a<br />

different chemical shift, and each corresponding signal may be split into multiplets due to scalar<br />

couplings. The spectrum gets more complex as the number of spins increases. To reduce spectral<br />

overlap, two- or multi-dimensional NMR spectra can be recorded.<br />

1D carbon experiment: It is equivalent to the 1D proton experiment, but measured on carbon.<br />

Only about 1% of natural carbon is ¹³C, so the protein often has to be ¹³C-labeled in order to observe<br />

carbon chemical shifts, because the most abundant isotope of carbon (¹²C) has no nuclear spin.<br />

NOESY: This experiment correlates spins that are separated in space by a distance of up to<br />

6 Å. The NOE observed in the NOESY experiment is based on the direct dipole-dipole coupling<br />

and provides valuable inter-spin distance information for the structure determination of proteins.<br />

NOE restraints are measured from the peak intensity in the NOESY experiment and provide<br />

distance information (Figure 1.1.a). As the effect is through-space and independent of chemical<br />

bonds, it is also useful for investigations of protein-ligand and protein-protein interactions.<br />

A major task and challenge of structure determination is to assign resonances to their<br />

corresponding atoms in order to apply experimental restraints to the correct set of atoms.<br />

Additional correlation NMR experiments are routinely recorded to assist in the chemical shift<br />

assignment task. These include:<br />

2D ¹⁵N-HSQC experiments correlate protons with nitrogens of a ¹⁵N-labeled protein. These<br />

correlations simplify the analysis of a 1D spectrum since the additional dimension allows the



separation of the resonances into cross-peaks observed in a 2D plane. The recording time of the<br />

experiment is however longer.<br />

3D ¹³C-¹⁵N-correlation experiments are 3-dimensional heteronuclear experiments that<br />

correlate C, N and H atoms. The resulting spectrum is particularly beneficial for large proteins,<br />

because resonance overlap is reduced in 3D. The disadvantage of these experiments is, however,<br />

that they are less sensitive than the ¹⁵N-HSQC experiment and usually require that the protein is ¹³C<br />

and ¹⁵N double labeled.<br />

COSY experiments correlate ¹H spins via scalar couplings. They are used to identify groups<br />

of spins connected by fewer than four bonds (spin systems).<br />

TOCSY experiments correlate ¹H resonances that belong to the same spin system, where<br />

pairs of spins are separated by no more than three bonds. TOCSY spectra include the COSY<br />

information, and are used to identify connected spin systems. TOCSY spectra are useful for<br />

sequential resonance assignment.<br />

NOESY experiments can also be used in the assignment procedure. COSY and TOCSY<br />

experiments provide the amino acid type information of the resonances, whereas the NOESY<br />

experiment allows the sequential piecing together of the assignment by exploiting the distance<br />

dependence of the NOE.<br />

Figure 1.1 NMR effects used for structure determination. (a) The NOE<br />

provides distance information (up to 0.6 nm) between two protons. The intensity of<br />

the NOE signal is proportional to 1/d⁶, where d is the interproton distance. (b) The<br />

³J-coupling gives dihedral angle restraints. The relationship between the angle Φ<br />

and the ³J-coupling is given by the Karplus equation (Karplus, 1959) and the<br />

allowed values for Φ are illustrated by the plot ³J = f(Φ). Figures adapted from web<br />

resources.
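
The 1/d⁶ dependence of the NOE intensity can be turned into a distance estimate by calibrating against a proton pair of known separation. The sketch below is only an illustration under the isolated spin pair assumption; the function name and the reference values are hypothetical and not taken from any software discussed in this thesis.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate an interproton distance from an NOE cross-peak intensity.

    Assumes intensity is proportional to 1/d**6 (isolated spin pair) and that
    a reference pair with known distance and intensity is available.
    The distance is returned in the same unit as ref_distance.
    """
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# A cross peak 64 times weaker than a reference pair separated by 2 Å
print(noe_distance(1.0, 64.0, 2.0))
```

Because of the sixth root, a 64-fold drop in intensity only doubles the estimated distance, which is why NOE restraints are usually treated as upper distance bounds rather than exact values.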



In contrast to NOE or ³J-coupling effects, which are short-range (measurable for distances<br />

below 6 Å) and local (each measurement concerns an independent group of atoms), paramagnetic<br />

NMR introduces new effects that are long-range (measured up to 40 Å) and global (i.e. their effect<br />

is described for all spins in a common frame centered on the paramagnetic lanthanide).<br />

1.2 Paramagnetic NMR<br />

1.2.1 The four paramagnetic effects in NMR<br />

When a paramagnetic centre with unpaired electrons, such as a lanthanide ion, is present in<br />

a protein, the observed NMR spectrum changes due to induced paramagnetic effects. By<br />

comparison of the diamagnetic and paramagnetic spectra, one can observe the following four<br />

paramagnetic effects:<br />

The pseudocontact shift (PCS): It is given by equation (1.1), where the spin of interest is<br />

described by its polar coordinates in an internal frame (the Δχ-tensor frame) centered on the<br />

paramagnetic center (Figure 1.2.a).<br />

$$\Delta\delta^{\mathrm{PCS}} = \frac{1}{12\pi r^{3}}\left[\Delta\chi_{ax}\left(3\cos^{2}\theta - 1\right) + \frac{3}{2}\,\Delta\chi_{rh}\,\sin^{2}\theta\,\cos 2\varphi\right] \qquad (1.1)$$<br />

Δχax = χz – (χx + χy)/2 and Δχrh = (χx – χy) are respectively the axial and rhombic components<br />

that describe the anisotropic effect; r, θ and φ are the polar coordinates of the spin in the Δχ-tensor<br />

frame (Figure 1.2.a). The PCS is a long-range effect (up to 40 Å) which decays with 1/r³, and is<br />

measured as the difference between the paramagnetic and diamagnetic chemical shift (Figure 1.3).<br />
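
As a numerical illustration, the sketch below evaluates the pseudocontact shift of a single spin from its polar coordinates in the Δχ-tensor frame. It assumes that equation (1.1) takes the commonly used form Δδ = 1/(12πr³) [Δχax (3cos²θ − 1) + (3/2) Δχrh sin²θ cos 2φ], with r in metres and the tensor components in m³. The function name and the example values are hypothetical.

```python
import math


def pcs_ppm(r, theta, phi, dchi_ax, dchi_rh):
    """Pseudocontact shift in ppm of a spin in the tensor frame.

    r is in metres, theta and phi in radians, and the axial and rhombic
    components of the tensor in m^3 (assumed units for this sketch).
    """
    angular = (dchi_ax * (3.0 * math.cos(theta) ** 2 - 1.0)
               + 1.5 * dchi_rh * math.sin(theta) ** 2 * math.cos(2.0 * phi))
    # The factor 1e6 converts the dimensionless shift into ppm
    return 1e6 * angular / (12.0 * math.pi * r ** 3)


# Spin on the z axis, 20 Å from the metal, purely axial tensor
print(pcs_ppm(2e-9, 0.0, 0.0, 30e-32, 0.0))
```

With these illustrative values the shift is close to 2 ppm, and doubling the distance reduces it eightfold, as expected from the 1/r³ dependence.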

The residual dipolar coupling (RDC): With an attached paramagnetic lanthanide, a<br />

protein weakly aligns with respect to the magnetic field. RDCs are manifested as increases or<br />

decreases of the magnitudes of multiplet splittings that can be observed in undecoupled NMR<br />

spectra (Figure 1.3). The RDC can also be back-calculated (equation (1.2)) provided that the<br />

orientation of the two spins with respect to the alignment tensor is known (Figure 1.2.b).<br />

with:<br />

(1.2)



(1.3)<br />

B0 is the magnetic field, γH and γN are the proton and nitrogen magnetogyric ratios, ħ Planck’s<br />

constant divided by 2π, S the order parameter of the molecular alignment, rNH the N-H distance, kB<br />

the Boltzmann constant, and T the absolute temperature.<br />

Figure 1.2 Representation of the distance and angular dependence of the four<br />

paramagnetic effects for the spin S, or system of spin S1-S2 (green). (a) PCSs and (b)<br />

RDCs are described in the χ-tensor frame centered on the lanthanide l (red). (c)<br />

PREs only yield distance dependence while (d) CCRs also yield angle dependence.<br />

Adapted from (Pintacuda et al., 2004).<br />

Figure 1.3 Experimental measurement of the four paramagnetic effects with two 1D undecoupled<br />

spectra. The figure shows the diamagnetic and paramagnetic antiphase doublets. PCS is measured



as the chemical shift difference. RDC is measured as the difference in line splitting. PRE and CCR<br />

can be determined from the differential line broadening.<br />

The paramagnetic relaxation enhancement (PRE): The PRE yields distance information<br />

between the paramagnetic lanthanide and the spin of interest (Figure 1.2.c). It depends on the<br />

distance r between the paramagnetic center and the nuclear spin as 1/r⁶ (equation (1.4)) and<br />

accounts for the difference in line broadening between the paramagnetic and diamagnetic<br />

signals (Figure 1.3).<br />

with:<br />

(1.4)<br />

(1.5)<br />

where τr is the rotational correlation time, ωH the Larmor frequency of the proton, μ0 the<br />

vacuum permeability, gJ the g-factor, μB the Bohr magneton, and J the total angular momentum quantum number.<br />

The cross correlated relaxation (CCR): This effect is also measured by the observed line<br />

broadening; more precisely by comparing the line widths of the two components of the antiphase<br />

doublet (Figure 1.3). This effect combines distance and angle dependence (equation (1.6) and<br />

Figure 1.2.d).<br />

with:<br />

(1.6)<br />

(1.7)<br />

All four paramagnetic effects can be used to study protein structure. Residual dipolar<br />

coupling has been widely used to help determine protein structures (Rohl et al., 2002), but when<br />

carefully comparing RDCs measured by NMR with RDCs predicted from a crystal structure, it was<br />

observed that small discrepancies between the N-H bond orientation in crystal and liquid state can



lead to large deviations between measurement and prediction. PCSs are less sensitive to the<br />

difference between the crystal model and the solution structure. The focus in my PhD has been on<br />

using PCS to study proteins and their interactions.<br />

Figure 1.4 The PCS is less sensitive than RDC to small discrepancies<br />

between X-ray and solution structure. A large change in the orientation<br />

of the N-H vector will considerably affect the calculation of the RDC<br />

(equation (1.2) and Figure 1.2.b). On the other hand, the PCS will be<br />

less affected as the relative position of the hydrogen with respect to the<br />

tensor frame is almost unchanged when PCS are measurable (d > 10 Å).<br />

1.2.2 The pseudocontact shift as a restraint<br />

PCSs can be calculated using equation (1.1) if the magnitude (two parameters: Δχax and<br />

Δχrh, Figure 1.5.a), location (three Cartesian coordinates x, y and z, Figure 1.5.b) and orientation<br />

(three Euler angles α, β and γ, Figure 1.5.c) of the Δχ-tensor are known, and if a structure is<br />

available. The paramagnetic chemical shift of a spin located close to the paramagnetic center is<br />

broadened beyond detection due to PRE, and consequently, its PCS cannot be observed. The cutoff<br />

radius is typically about 10 Å.<br />
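
The eight parameters listed above are sufficient to back-calculate a PCS from atomic coordinates. The following sketch assumes a ZYZ Euler convention whose rotation matrix has the tensor axes as columns expressed in the protein frame, coordinates in Å, and tensor components in m³. These conventions and all names are assumptions made for illustration and may differ from those used by the software described later.

```python
import math


def rotation_zyz(alpha, beta, gamma):
    """Rotation matrix Rz(alpha) * Ry(beta) * Rz(gamma) (assumed convention)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb * cg - sa * sg, -ca * cb * sg - sa * cg, ca * sb],
        [sa * cb * cg + ca * sg, -sa * cb * sg + ca * cg, sa * sb],
        [-sb * cg, sb * sg, cb],
    ]


def pcs_from_structure(atom, metal, euler, dchi_ax, dchi_rh):
    """PCS in ppm of an atom, given the eight tensor parameters.

    atom and metal are (x, y, z) tuples in Å in the protein frame, euler is
    (alpha, beta, gamma) in radians, dchi_ax and dchi_rh are in m^3.
    """
    rot = rotation_zyz(*euler)
    # Vector from the metal to the atom, converted from Å to metres
    d = [(a - m) * 1e-10 for a, m in zip(atom, metal)]
    # Project onto the tensor axes (columns of rot) to get tensor-frame coordinates
    x, y, z = (sum(rot[k][i] * d[k] for k in range(3)) for i in range(3))
    r2 = x * x + y * y + z * z
    # Cartesian form of equation (1.1): cos(theta) = z/r, sin^2(theta)cos(2phi) = (x^2 - y^2)/r^2
    numerator = dchi_ax * (3.0 * z * z - r2) + 1.5 * dchi_rh * (x * x - y * y)
    return 1e6 * numerator / (12.0 * math.pi * r2 ** 2.5)


print(pcs_from_structure((0.0, 0.0, 20.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 30e-32, 0.0))
```

Writing the angular terms in Cartesian form avoids computing θ and φ explicitly, which is convenient when the calculation is repeated for every atom of a structure.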

The NMR resonances first have to be assigned in order to measure PCSs. Three kinds of<br />

assignment have to be distinguished (Figure 1.6):<br />

(i) The assignment of the diamagnetic NMR spectrum: it is routinely performed with<br />

conventional sequential assignment methods using one or a combination of COSY,<br />

TOCSY, NOESY and triple resonance NMR experiments.



Figure 1.5 The Δχ-tensor determination problem. The Δχ-tensor can be conveniently represented by<br />

isosurfaces. For a given ppm value p, all spins that have a PCS value equal to p would be located<br />

on a given isosurface (red for negative PCS, blue for positive PCS). (a) The Δχax and Δχrh<br />

parameters that have to be determined are responsible for the shape and size of the isosurfaces. (b)<br />

The location of the paramagnetic center is described by three Cartesian coordinates. (c) The three<br />

Euler angles α, β and γ relate the orientation of the Δχ-tensor to the protein frame.<br />

Figure 1.6 Illustration of the three approaches of resonance assignment. (a) The peaks of the<br />

diamagnetic spectrum (blue) are assigned to their corresponding amino acids. (b) The<br />

paramagnetic cross peaks (red) are assigned to their corresponding residues. (c) When the pairing



between diamagnetic and paramagnetic cross peak is already determined by a transfer experiment,<br />

the pairs of cross peaks can be assigned to their corresponding residues.<br />

(ii) The assignment of the paramagnetic NMR spectrum: sequential approaches are not<br />

suitable because the lanthanide induces PRE effects resulting in large line<br />

broadening for residues close to the paramagnetic center. Proposed experimental<br />

approaches use temperature dependence (Nguyen et al., 1999), magnetic field<br />

dependence (Bertini et al., 1998) or fast / slow exchange of the lanthanide (John et<br />

al., 2007) to transfer the chemical shift from the diamagnetic to the paramagnetic<br />

state.<br />

(iii) The assignment of the pseudocontact shift: One can pair the chemical shifts of<br />

diamagnetic and paramagnetic resonances when a transfer experiment can be<br />

performed, but the assignment to the individual atoms is still unknown. There is no<br />

direct method to assign experimental PCSs to the structure; the only way is to<br />

compare experimental and predicted PCSs to find the best match.<br />

Comparison between a calculated and measured PCS provides a restraint (Figure 1.7) that<br />

has been used for different purposes in the literature:<br />

Structure refinement: Allegrozzi et al. (Allegrozzi et al., 2000) showed how using PCS as<br />

restraints in structure refinement improved the quality of structures of calbindin. They compared<br />

the original NMR structures obtained from 1539 NOEs with structures refined using additional<br />

PCS restraints from three different lanthanides. The magnitude of the paramagnetic dipole moment<br />

differs between paramagnetic metal ions. Hence, the cutoff radius and the distance up to<br />

which PCSs can still be observed are lanthanide-dependent. The three lanthanides chosen in this<br />

study (Ce³⁺, Yb³⁺ and Dy³⁺) cover different regions of the 3D space in shells of 5-15 Å for Ce³⁺,<br />

9-25 Å for Yb³⁺, and 13-40 Å for Dy³⁺. Each lanthanide used focuses on a different region of the<br />

protein, and provides additional independent information to the NOEs. The resulting ensemble of<br />

structures generated with each PCS data set separately shows better definition of the backbone in<br />

the area covered by the lanthanide used. In particular, the residues 56-59 have an RMSD above 1.5<br />

Å in the family of NMR structures. Inclusion of PCS restraints decreases the RMSD value to 0.75<br />

Å. What this study failed to show is the improvement in structure quality when all data (from all<br />

different lanthanides) are used simultaneously in the refinement procedure.



Protein-ligand interaction: John et al. (John et al., 2006) showed how PCS restraints can<br />

be used to determine the structure of protein-ligand complexes. The approach is of major interest<br />

for drug screening. In a first step, the Δχ-tensor is determined for the target protein. This is the most<br />

complicated task because the protein can be large, making the assignment of chemical shifts<br />

difficult. However, in the context of drug screening this step needs to be performed only once. Each<br />

ligand that is being screened is isotopically labeled, and a paramagnetic spectrum of the ligand in<br />

complex with the protein is recorded. The assignment of a ligand spectrum and the corresponding<br />

Δχ-tensor are swiftly and easily obtained. The two Δχ-tensors from ligand and protein being the<br />

same by definition, a simple superimposition of them leads to the rigid body structure of the<br />

complex. A molecular dynamics package is further used to locally refine the conformation of the<br />

protein on the contact surface. John and coworkers demonstrated the approach with the thymidine<br />

nucleotide as the ligand binding to the ε subunit of DNA polymerase III. The determined thymidine<br />

structure was found to have a very similar binding mode compared to the thymidine<br />

monophosphate present in the reference crystal structure.<br />

Protein-protein interaction: Pintacuda et al. (Pintacuda et al., 2006) described a protocol<br />

to solve the structure of a protein-protein complex using only PCSs and illustrated the protocol on<br />

the example of the N-terminal domain of the subunit ε and subunit θ of the E. coli DNA<br />

polymerase III. Again the method relies on the determination of the Δχ-tensors, first relative to one<br />

molecule (ε, Figure 1.8.a) and then relative to the second molecule (θ, Figure 1.8.b), followed by<br />

the superimposition of the Δχ-tensor frames (Figure 1.8.c). Such an approach is particularly<br />

relevant considering the difficulty of co-crystallizing proteins in complexes, compared to the<br />

crystallization of the components separately. An alternative has been to use RDCs (McCoy et al.,<br />

2002), but the relative orientation and location of the two rigid bodies obtained with PCSs is more<br />

accurate than what would result from a complex built from RDC data, since PCSs yield<br />

simultaneously orientation and distance information, while RDC data lack distance information and<br />

are sensitive to small fluctuations of NH bond orientation. The resulting rigid-body docked<br />

complex could exhibit steric clashes that would need to be resolved with a molecular refinement<br />

package. The final result is valuable as input for docking refinement software considering that the<br />

most difficult part of docking two molecules in a complex is to obtain with confidence the<br />

approximate binding sites.



Figure 1.7 Illustration of PCS restraints. If the Δχ-tensor parameters are fully determined, one can<br />

accurately predict PCS values. The assignment of both diamagnetic and paramagnetic spectra<br />

provides experimental PCSs. Direct comparison of both offers a PCS-based restraint.<br />

Figure 1.8 Protein complexes determined using PCSs. Paramagnetic NMR<br />

experiments are performed on the complex, (a) firstly with only the first protein<br />

labeled, (b) secondly with only the second protein labeled. The two Δχ-tensors are<br />

fitted separately, according to the experimental PCSs. The two Δχ-tensors are<br />

theoretically the same; their superimposition provides the structure of the complex<br />

(c). Adapted from (Pintacuda et al., 2006).



1.3 Computational study of paramagnetic proteins<br />

1.3.1 The assignment problem<br />

Assigning the resonances of NMR spectra is a necessary step towards applying NMR<br />

restraints for protein computation. Although several software packages are capable of predicting<br />

chemical shifts, they require high-resolution 3D structures and lack accuracy especially for the<br />

unstructured parts of a protein (Shen et al., 2007). Their algorithms are not based on pure<br />

calculation from the 3D structure, but rather use statistical information extracted from the Protein<br />

Data Bank (PDB) and from deposited chemical shifts.<br />

Pseudocontact shifts can be accurately predicted using equation (1.1). Consequently, it is<br />

possible to compare measured and calculated PCSs. The root mean square deviation between the<br />

calculated and the measured PCSs provides a score to minimize in order to yield the best possible<br />

assignment. This strategy has been applied to simultaneously assign measured PCSs and optimize<br />

the Δχ-tensor by a software package named “Platypus” (Pintacuda et al., 2004). In this work, the<br />

protein was selectively labeled by residue type in order to simplify spectra. This led to the<br />

measurement of unassigned PCSs by unambiguous identification of the connectivity between a<br />

diamagnetic cross peak and its paramagnetic partner shifted along the diagonal in a 2D ¹⁵N-HSQC<br />

spectrum. The diagonal shift is explained by the fact that, to a first approximation, the hydrogen and<br />

nitrogen of an NH group have similar PCS values. This is due to the short distance between N and<br />

H atoms (approximately 1 Å) compared to the large distance (at least 10 Å) separating the NH bond<br />

from the lanthanide inducing the observed PCS. As a result, the polar coordinates within the Δχ-<br />

tensor frame are similar for the N and H spins and hence, both spins experience similar<br />

pseudocontact shifts. The second step of the protocol consists of combining a grid search over the<br />

Δχ-tensor parameters with an optimal assignment algorithm called the Hungarian method (Kuhn,<br />

1955): The grid search covers a large ensemble of possible combinations for the Δχ-tensor<br />

parameters. At each node of the grid search, it becomes possible to use the Hungarian method to<br />

obtain the optimal assignment in polynomial time. A score can be calculated for each node to<br />

reflect the quality of the assignment, and compared to other nodes to extract the best assignment<br />

along with the best set of Δχ-tensor parameters.<br />
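
The assignment performed at a single node of the grid search can be sketched as follows. The code is a compact, pure-Python version of the Hungarian method (potentials and augmenting paths, cubic time) and is not the implementation used in Platypus; the cost of pairing a predicted with a measured PCS is taken to be their squared deviation, and the node score is the resulting root mean square deviation. All function names and numerical values are hypothetical.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment for a matrix with len(cost) <= len(cost[0]).

    Returns, for every row, the index of the column assigned to it.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (m + 1)          # column potentials
    p = [0] * (m + 1)            # p[j]: row matched to column j (1-based, 0 = free)
    way = [0] * (m + 1)          # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:       # reached a free column: augmenting path found
                break
        while j0:                # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    rows = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            rows[p[j] - 1] = j - 1
    return rows


def score_node(predicted, measured):
    """Optimal pairing of predicted and measured PCSs and its RMSD (ppm)."""
    cost = [[(a - b) ** 2 for b in measured] for a in predicted]
    pairing = hungarian(cost)
    total = sum(cost[i][j] for i, j in enumerate(pairing))
    return pairing, math.sqrt(total / len(predicted))


print(score_node([0.5, -1.2, 2.0], [1.9, 0.6, -1.0]))
```

In a grid search, `score_node` would be called once per set of trial Δχ-tensor parameters, and the node with the lowest RMSD would provide both the assignment and the tensor.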

The “diagonal rule” used in (Pintacuda et al., 2004) to manually measure PCSs has also<br />

been exploited in (Schmitz et al., 2006) to automatically assign paramagnetic chemical shifts of a<br />

full ¹⁵N-HSQC spectrum, given a known 3D structure and the list of assigned diamagnetic



resonances. The software Echidna was developed to overcome the difficulties of sequential<br />

assignment of cross peaks in a paramagnetic spectrum. It works as follows:<br />

Firstly, a small number n1 of paramagnetic peaks are paired with diamagnetic peaks by<br />

automatically screening unambiguous possibilities along the diagonal of a 2D ¹⁵N-HSQC<br />

spectrum.<br />

Secondly, a Δχ-tensor is calculated to minimize the root mean square deviations between the n1<br />

experimental and calculated PCSs.<br />

Thirdly, the Δχ-tensor is used to predict for each diamagnetic cross peak the area of the<br />

spectrum where the paramagnetic partner is expected. This area is centered on the back-<br />

calculated PCS value, and defines a much smaller zone than the diagonal strip used in the first<br />

step. More paramagnetic peaks are unambiguously assigned.<br />

The last two steps are iterated until convergence. A final assignment is performed in order to<br />

yield the overall best assignment of all cross peaks after convergence of the method. This<br />

assignment uses the Hungarian method, which finds in polynomial time the optimal<br />

assignment among the n! possibilities, with n being the number of peaks to assign. A complete<br />

flow chart of the method is given in Figure 1.9.
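
The third step of the procedure, in which back-calculated PCSs narrow the search area for each paramagnetic partner, can be sketched as below. This is not the Echidna source code: the window sizes, the data layout and the rule used to decide that a match is unambiguous are assumptions chosen for illustration.

```python
def assign_unambiguous(dia_peaks, predicted_pcs, para_peaks, tol_h=0.05, tol_n=0.25):
    """Pair diamagnetic peaks with paramagnetic peaks inside a search window.

    dia_peaks:     {residue: (1H shift, 15N shift)} of the assigned diamagnetic peaks
    predicted_pcs: {residue: (1H PCS, 15N PCS)} back-calculated from the tensor
    para_peaks:    list of (1H shift, 15N shift) of unassigned paramagnetic peaks
    tol_h, tol_n:  half-widths of the window in ppm (hypothetical values)
    Returns {residue: index into para_peaks} for unambiguous matches only.
    """
    candidates = {}
    for residue, (h, n) in dia_peaks.items():
        pcs_h, pcs_n = predicted_pcs[residue]
        # Expected paramagnetic position = diamagnetic shift + predicted PCS
        candidates[residue] = [
            k for k, (qh, qn) in enumerate(para_peaks)
            if abs(qh - (h + pcs_h)) <= tol_h and abs(qn - (n + pcs_n)) <= tol_n
        ]
    # Count how many residues claim each paramagnetic peak
    claims = {}
    for found in candidates.values():
        for k in found:
            claims[k] = claims.get(k, 0) + 1
    # Keep a pairing only if the window holds one peak and no other residue claims it
    return {
        residue: found[0]
        for residue, found in candidates.items()
        if len(found) == 1 and claims[found[0]] == 1
    }


dia = {"A10": (8.00, 120.0), "G25": (7.50, 110.0), "L40": (8.50, 125.0)}
pcs = {"A10": (0.50, 0.50), "G25": (-0.30, -0.30), "L40": (0.10, 0.10)}
para = [(8.52, 120.4), (7.21, 109.8), (8.62, 125.0), (8.58, 125.2)]
print(assign_unambiguous(dia, pcs, para))
```

In this toy example two residues are paired, while the third is left unassigned because two paramagnetic peaks fall inside its window. Such a case would be revisited in a later iteration with a refined tensor, or resolved by the final global assignment.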



Figure 1.9 Flow-chart of the Echidna algorithm.<br />

Both of these automatic NMR assignment techniques require partial initial assignments. In<br />

the case of Echidna, the whole diamagnetic spectrum needs to be assigned, while Platypus requires<br />

the connectivity between the paramagnetic and the diamagnetic cross peaks. In both cases, those<br />

prerequisites reduce the computational problem to a 2D Multi-dimensional Assignment Problem<br />

(MAP, Figure 1.10.a). A 2D-MAP is easily solved with the Hungarian method. More challenging<br />

and attractive would be to shortcut any initial manual assignment and start directly from the<br />

chemical shift lists of the diamagnetic and paramagnetic states. Such a method could be applied to<br />

automate the side chain assignment of a protein, once the preliminary and easier task of assigning<br />

the backbone chemical shifts and determining the Δχ-tensor has been done, for example with



Echidna. Computationally, the problem becomes a 3D-MAP (Figure 1.10.b) with the number of<br />

possibilities increasing to (n!)². This problem can no longer be solved in polynomial time since a<br />

MAP of dimension greater than or equal to three is proven to be NP-hard 1 (Karp, 1972). Instead,<br />

a heuristic method has to be used to find a good assignment, but without any guarantee of reaching<br />

the optimal assignment. An approach that tries to cope with the computational challenges (by<br />

means of additional experimental information) led to the development of software dubbed<br />

Possum (Figure 1.11), which is described in Chapter 2.<br />
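
The growth of the search space can be made concrete with a few lines of code. The sketch simply counts the candidate assignments, n! for the 2D problem and (n!)² for the 3D problem; the function name is hypothetical.

```python
import math


def search_space(n):
    """Number of candidate assignments for n peaks in the 2D and 3D problems."""
    two_d = math.factorial(n)
    return two_d, two_d ** 2


for n in (5, 10, 12, 13):
    two_d, three_d = search_space(n)
    print(f"n = {n:2d}: 2D-MAP {two_d:.3e}  3D-MAP {three_d:.3e}")
```

Going from 12 to 13 peaks multiplies the number of 3D candidates by 169, which is consistent with the abrupt loss of tractability observed when exhaustive search is attempted.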

Figure 1.10 Examples of the MAP problem. (a) When experimental and predicted PCSs have to be<br />

matched, the cost function c(i, j) can be defined as the square deviation of the two values. The aim<br />

is to minimize the sum Q. The binary variables xi,j are defined to ensure that each element i and j is<br />

chosen exactly once. (b) The experimental PCSs are not directly available, but the paramagnetic<br />

and diamagnetic chemical shifts are measured. Their differences give possible experimental PCSs.<br />

1 The time required to solve an NP-hard problem increases drastically with the size of the problem,<br />

which is, in this context, the number n of residues to assign. Some problems with a low n value (< 12)<br />

could easily be solved in a few minutes. The same problem, with just one extra peak to assign,<br />

remained unsolved after days of calculation. This illustrates that, independently of the algorithm<br />

used, an NP-hard problem abruptly becomes practically unsolvable beyond a given size n.



Figure 1.11 Illustration of the task performed by the software Possum.<br />

The software package requires the Δχ-tensor parameters to perform a<br />

structure-based automatic assignment of the NMR resonances.<br />

1.3.2 The Δχ-tensor determination problem<br />

The Δχ-tensor determination problem consists of obtaining the eight parameters<br />

characterizing the Δχ-tensor such that the discrepancy between the observed and calculated PCS is<br />

minimal. These comprise the three coordinates of the paramagnetic center, the three Euler angles of the Δχ-tensor<br />

orientation, and the axial and rhombic components. The last two parameters characterize the<br />

shape of PCS effects that ranges between two extremes as illustrated in Figure 1.12.<br />
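How the axial and rhombic components shape the PCS can be made concrete with a short sketch. It assumes that equation (1.1) has the standard form PCS = [Δχax(3cos²θ − 1) + 1.5 Δχrh sin²θ cos2φ] / (12πr³), written here in Cartesian coordinates of the tensor principal frame with the metal at the origin. The function name `pcs_ppm` and the unit conventions (Å for coordinates, 10⁻³² m³ for the tensor components) are choices made for this illustration.

```python
import math

def pcs_ppm(x, y, z, dchi_ax, dchi_rh):
    """PCS in ppm for a spin at (x, y, z), given in Å in the tensor frame.

    dchi_ax and dchi_rh are the axial and rhombic components
    in units of 1e-32 m^3; the metal sits at the origin.
    """
    r2 = x * x + y * y + z * z
    r = math.sqrt(r2)
    axial_term = (3.0 * z * z - r2) / r2      # 3 cos^2(theta) - 1
    rhombic_term = (x * x - y * y) / r2       # sin^2(theta) cos(2 phi)
    # 1e4 converts (1e-32 m^3) / (Å^3) into ppm
    prefactor = 1e4 / (12.0 * math.pi * r ** 3)
    return prefactor * (dchi_ax * axial_term + 1.5 * dchi_rh * rhombic_term)

on_axis = pcs_ppm(0.0, 0.0, 10.0, 10.0, 0.0)    # along the z axis
in_plane = pcs_ppm(10.0, 0.0, 0.0, 10.0, 0.0)   # in the xy plane
```

For a purely axial tensor the value along z is twice the magnitude of, and opposite in sign to, the value in the xy plane, which is the axially symmetric two-lobe shape of panel (a).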

Figure 1.12 Isosurface shapes calculated by equation (1.1). The positive isolevel is shown in blue,<br />

the negative is shown in red. Both isolevels have the same absolute value. (a) When the rhombic<br />

component is equal to zero, the isosurface is axially symmetric (dotted line). (b) Isosurface when


the axial and rhombic components are equal. Three planar symmetries remain for the three planes<br />

orthogonal to the three main axes. (c) For an axial value of zero, the isosurface presents two<br />

additional planar symmetries due to an ambiguous main axis. While the axial and rhombic values<br />

“decide” the isosurface shape, equation (1.1) constrains any isosurfaces to be shaped between the<br />

two extremes shown in (a) and (c).<br />

Figure 1.13 Sanson-Flamsteed projection for visualization of Δχ-<br />

tensor uncertainty. The axes of the Δχ-tensor are projected on a 2D<br />

surface. The uncertainty of the axes orientation can be reflected by the<br />

size and shape of the colored area used for each axis, as done in<br />

Figure 1.14.<br />

The Δχ-tensor is similar to the alignment tensor used to compute RDCs. The alignment<br />

tensor parameters are easily determined by singular value decomposition, as equation (1.2) is linear<br />

with respect to the alignment tensor parameters. The axial and rhombic components of the alignment<br />

tensor can also be estimated without the requirement of protein coordinates, by exploiting the<br />

isotropic distribution of the NH bond orientation in space (Clore et al., 1998). In contrast, the equation that<br />

governs the PCS is non-linear in the coordinates of the paramagnetic center, and no assumption of an isotropic distribution of PCS values<br />

can be made. Consequently, the determination of the Δχ-tensor relies on the minimization of a cost function<br />

comparing predicted and experimental PCSs. A few existing software packages could<br />

be used to obtain the Δχ-tensor parameters, such as Fantasian (Banci et al., 1997) or Xplor-NIH<br />

(Schwieters et al., 2003, Schwieters et al., 2006). However, they can be cumbersome to use and<br />

offer little interactivity with the user.<br />
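The structure of the minimization can be sketched as follows. Once a trial position of the paramagnetic center is fixed, the PCS is linear in the five independent elements of the Δχ-tensor, so these can be obtained by linear least squares; only the three center coordinates then require a non-linear search around this inner step. The code below shows the inner step only, in plain Python via the normal equations. It is an illustration under these assumptions and not the procedure of any of the packages cited above; all names and the synthetic data are hypothetical.

```python
import math

def design_row(spin, metal):
    """Coefficients of (xx, yy, xy, xz, yz) for one spin; zz = -(xx + yy)."""
    x, y, z = (s - m for s, m in zip(spin, metal))
    r2 = x * x + y * y + z * z
    k = 1e4 / (4.0 * math.pi * r2 ** 2.5)   # ppm for Å and 1e-32 m^3
    return [k * (x * x - z * z), k * (y * y - z * z),
            k * 2 * x * y, k * 2 * x * z, k * 2 * y * z]

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_tensor(spins, pcs, metal):
    """Least-squares tensor elements for a fixed metal position."""
    rows = [design_row(s, metal) for s in spins]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] * p for r, p in zip(rows, pcs)) for i in range(5)]
    return solve(ata, atb)

# Synthetic check: generate PCSs from a known tensor, then recover it.
metal = (1.0, -2.0, 0.5)
true_tensor = [-2.0, -3.0, 0.5, 1.0, -0.7]   # xx, yy, xy, xz, yz (1e-32 m^3)
spins = [(10, 2, 3), (-4, 12, 5), (6, -7, 11), (-9, -3, 8),
         (3, 14, -6), (12, -5, -4), (-7, 8, -10), (2, -11, -9)]
pcs = [sum(c * t for c, t in zip(design_row(s, metal), true_tensor)) for s in spins]
fitted = fit_tensor(spins, pcs, metal)
```

Diagonalizing the fitted symmetric, traceless matrix would then yield the axial and rhombic components and the three Euler angles.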

Another important feature that existing approaches fail to provide is the possibility to<br />

estimate the quality of the fit and directly visualize it in a Sanson-Flamsteed representation


(Bugayevskiy et al., 1995) as shown in Figure 1.13. A Sanson-Flamsteed plot is a projection of a<br />

sphere on a plane, and is commonly used in geography to project the globe on a map. The axes of<br />

the Δχ-tensor penetrate the surface of a unit sphere, and the penetration points can be identified<br />

on the projection. This representation is convenient to highlight how reliable the fit of a Δχ-tensor<br />

is in the context of the data and structure used for the calculation. For example, the Δχ-tensor found<br />

for the ε subunit of DNA polymerase III is particularly well defined (Figure 1.14.a), while the<br />

Sanson-Flamsteed plot corresponding to the fit for the θ subunit (in complex with ε) reveals greater<br />

uncertainty (Figure 1.14.b). Those differences are mainly due to the large distance (>15 Å) that<br />

separates θ from the lanthanide bound to ε.<br />
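The projection itself is simple to compute. The sketch below maps the point where an axis penetrates the unit sphere to the plane using the sinusoidal (Sanson-Flamsteed) formulas x = λ cos φ and y = φ, where φ is the latitude and λ the longitude. The function name is hypothetical, and the choice of the z direction as the pole of the map is an assumption of this example.

```python
import math

def sanson_flamsteed(axis):
    """Project the direction of a 3D axis onto the sinusoidal map.

    Returns (x, y) with y the latitude and x the longitude scaled by
    cos(latitude), both in radians.
    """
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    # clamp to guard against rounding slightly outside [-1, 1]
    lat = math.asin(max(-1.0, min(1.0, z / norm)))
    lon = math.atan2(y, x)
    return lon * math.cos(lat), lat

px, py = sanson_flamsteed((0.0, 1.0, 0.0))   # a point on the equator
```

Each tensor axis penetrates the sphere at two opposite points, so an axis can be drawn at either of the two corresponding map positions.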

When docking a small molecule compound such as a drug to a protein, it is likely that the<br />

Sanson-Flamsteed plot highlights large uncertainties because only a small number of PCSs can be<br />

measured. To improve the situation, an important and desired feature would be to use the<br />

information of the protein’s Δχ-tensor in order to improve the fit of the drug’s Δχ-tensor. The<br />

resulting enhancements are illustrated in Figure 1.14.c (compared to Figure 1.14.b) for the ε/θ<br />

protein-protein complex and are expected to be similar for small ligand-protein complexes.<br />

To address these issues, an efficient software package is required<br />

to work with the Δχ-tensor. Chapter 3 presents the software package “Numbat” (Figure 1.15) that<br />

tries to meet all those needs.


Figure 1.14 Sanson-Flamsteed representations of Δχ-tensor axes orientation. The error analysis<br />

used one thousand Monte-Carlo iterations that randomly selected 50% of the PCS data set. (a) The<br />

Δχ-tensors fitted for ε are very well defined. (b) As the lanthanide is 15 Å away from the θ subunit,<br />

the fitted Δχ-tensors are less accurate, as indicated by the large area each axis can span. (c)<br />

When keeping the relative orientation and magnitude of the two Δχ-tensors fixed (to the value<br />

determined for ε), the quality of the fitted Δχ-tensor increases, resulting in a more reliable complex<br />

of the two subunits. The well-defined z-axis areas of the two Δχ-tensors (blue and brown) in (c)<br />

illustrate the reduced uncertainty around the z axes.<br />
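The resampling scheme behind this error analysis can be sketched in a few lines. The code refits on random subsets containing a given fraction of the PCS data and collects the results, whose spread reflects the uncertainty of the fit. The `fit` argument is a placeholder for an actual Δχ-tensor fit; in the toy usage it is simply the mean of the subset, and the PCS values are hypothetical.

```python
import random

def monte_carlo_fits(pcs_data, fit, n_iter=1000, fraction=0.5, seed=0):
    """Refit on random subsets of the PCS data and collect the results.

    `fit` is any callable mapping a list of PCS records to a fit result
    (hypothetical here; in practice it would return the tensor axes).
    """
    rng = random.Random(seed)                      # seeded for reproducibility
    k = max(1, round(fraction * len(pcs_data)))    # subset size, at least one
    return [fit(rng.sample(pcs_data, k)) for _ in range(n_iter)]

# Toy usage: the "fit" is just the mean PCS of each subset.
toy_pcs = [0.52, -1.10, 0.08, 2.31, -0.45, 0.97, 1.40, -0.22]
means = monte_carlo_fits(toy_pcs, lambda subset: sum(subset) / len(subset))
spread = max(means) - min(means)
```

In the setting of Figure 1.14, each iteration would return the three axis orientations, and the cloud of projected points would form the colored areas.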

Figure 1.15 Illustration of the task performed by the software<br />

Numbat. Given assigned PCSs, Numbat performs a<br />

structure-based determination of the Δχ-tensor.<br />

1.3.3 De novo structure determination of proteins


The determination of protein structures is one of the main challenges of the post-genomic<br />

era. The knowledge of structures at atomic detail is a prerequisite to understanding how<br />

macromolecular complexes assemble and perform their tasks within living organisms. The<br />

established methods of X-ray crystallography and NMR spectroscopy still require significant<br />

human and financial resources to determine the structure of proteins of interest. Efforts are being<br />

focused on high-throughput methods to speed up the process of characterizing a large number of<br />

proteins (Kobe et al., 2008).<br />

De novo structure prediction software packages such as ROSETTA (Simons et al., 1997)<br />

are quite successful for small proteins (< 100 residues). The large size of the conformational space<br />

to explore makes it difficult, however, to tackle larger proteins. To overcome the “sampling<br />

problem”, one approach is to include additional experimental restraints that facilitate the three-<br />

dimensional reconstruction of protein structures. Those restraints must be easier to measure than it<br />

would be to obtain crystals of the target protein, or to measure and assign the NOEs required for the<br />

full determination of the structure.<br />

The pseudocontact shift effect is a candidate for this approach. PCSs can be measured<br />

swiftly and accurately as the chemical shift difference between two spectra, once a paramagnetic<br />

probe has been introduced into the protein. The use of lanthanide binding tags makes these<br />

techniques potentially available to any protein. Several lanthanide tags are now available; for a<br />

recent review, see Su et al. (2009b). While it is not yet routine to attach lanthanide binding tags to<br />

a protein, several options are possible: attachment by one or two disulfide bonds (Smith et al.,<br />

1975), attachment at one of the termini of the protein (Donaldson et al., 2001), or even use of a<br />

non-covalent tag as demonstrated by Su et al. (2009a). It is expected that<br />

lanthanide attachment techniques will become routine in the future.<br />

Beyond the process of attachment, the second challenge is to have a tag that is not flexible.<br />

The physical model underlying equation (1.1) is accurate if the Δχ-tensor parameters are constant<br />

over time. This hypothesis could be questioned if small movements of the tag occur. Fluctuation of<br />

the tag produces two undesired effects:<br />

(i) It changes the electronic environment in the vicinity of the lanthanide and,<br />

consequently, the orientation or magnitude of the Δχ-tensor. As equation (1.1) is<br />

linear with respect to the five independent elements of the Δχ-tensor (which encode the axial component, the rhombic component, and the three<br />

Euler angles), changes over time of those five parameters will not affect the way<br />

PCSs are predicted. More precisely, n conformations of the Δχ-tensor occurring<br />

within the protein (and sharing the same center coordinate) can be equally explained<br />

by one single conformation. To demonstrate this, let us take a spin i with measured<br />

pseudocontact shift PCSi. PCSi is the sum of the contributions of the n states of the<br />

Δχ-tensor.<br />

For the state j, an alternative formula of the pseudocontact shift in a given frame f is:<br />

With<br />

(1.8)<br />

(1.9)<br />

(1.10)<br />

Where x, y, z are the Cartesian coordinates of the spin i in the frame f, and r the distance<br />

between the lanthanide and the spin i. Equation (1.8) becomes:<br />

With<br />

(1.11)<br />

(1.12)<br />

The traceless and symmetric matrix D contains the effective Δχ-tensor parameters that are<br />

necessary and sufficient to describe the PCS experienced by the spin i.
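The argument can be written out in a standard tensor form. The following is a sketch under stated assumptions and not necessarily the notation of equations (1.8) to (1.12): the weights p_j (populations of the n states, summing to one), the matrix symbol A, and the per-state tensor Δχ^(j) expressed in the frame f are introduced here for illustration.

```latex
\begin{align}
\mathrm{PCS}_i^{(j)} &= \frac{1}{12\pi r^{5}}\,
  \operatorname{Tr}\!\left(A\,\Delta\chi^{(j)}\right),
\qquad
A = \begin{pmatrix}
3x^{2}-r^{2} & 3xy & 3xz\\
3xy & 3y^{2}-r^{2} & 3yz\\
3xz & 3yz & 3z^{2}-r^{2}
\end{pmatrix},\\
\mathrm{PCS}_i &= \sum_{j=1}^{n} p_j\,\mathrm{PCS}_i^{(j)}
 = \frac{1}{12\pi r^{5}}\,\operatorname{Tr}\!\left(A\,D\right),
\qquad
D = \sum_{j=1}^{n} p_j\,\Delta\chi^{(j)} .
\end{align}
```

Because all states share the same center, A and r are identical for every state and can be taken out of the sum. Each Δχ^(j) is symmetric and traceless, so D is as well; diagonalizing D therefore yields a single effective axial component, rhombic component, and set of three Euler angles.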


(ii) It could displace the position of the paramagnetic center with respect to the protein<br />

frame. The amplitude of those movements depends on the size and rigidity of the tag<br />

used. Small displacements generally have little impact on the value of the PCS<br />

because PCSs are usually observable only at distances of more than ten Angstroms from the<br />

lanthanide. To assess this principle further, the comparison between a static metal<br />

ion and a mobile one following a realistic trajectory is illustrated in Figure 1.16.<br />
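The distance dependence of this effect can be checked numerically with a deliberately simplified toy model, which is not the twelve-tensor trajectory of Figure 1.16. Here the metal jumps between two positions displaced by ±3 Å from its mean position, the tensor is purely axial with fixed orientation, and the spin lies on the tensor z axis. All names and parameter values are assumptions of this example.

```python
import math

def pcs_axial(spin, metal, dchi_ax=10.0):
    """PCS (ppm) for an axial tensor whose z axis is the frame z axis.

    Coordinates in Å, dchi_ax in 1e-32 m^3.
    """
    x, y, z = (s - m for s, m in zip(spin, metal))
    r2 = x * x + y * y + z * z
    return 1e4 / (12.0 * math.pi * r2 ** 1.5) * dchi_ax * (3.0 * z * z - r2) / r2

def relative_deviation(distance, displacement=3.0):
    """Relative PCS error made by treating a two-site mobile metal as static."""
    spin = (0.0, 0.0, distance)
    static = pcs_axial(spin, (0.0, 0.0, 0.0))
    metals = [(displacement, 0.0, 0.0), (-displacement, 0.0, 0.0)]
    mobile = sum(pcs_axial(spin, m) for m in metals) / len(metals)
    return abs(mobile - static) / abs(static)

near = relative_deviation(5.0)    # spin 5 Å from the mean metal position
far = relative_deviation(20.0)    # spin 20 Å from the mean metal position
```

In this toy model the deviation is large at 5 Å but falls below 10% at 20 Å, in line with the statement that tag mobility matters little at the distances where PCSs are usually observed.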

Figure 1.16 Effect of the mobility of the tag on the PCS. (a) Twelve tensors are being used to<br />

represent a realistic trajectory of the tag. They have random orientation, random axial and<br />

rhombic values, and are located three Angstroms away from the anchor point (black dot). The<br />

range of angles covers 110 degrees, in steps of 10 degrees. (b) The isosurfaces resulting from the<br />

ensemble of tensors in (a). The red surface represents the isolevel of 5.0 ppm, the blue one<br />

corresponds to -5.0 ppm. The shapes are distorted compared to the typical shape of an isosurface<br />

shown in Figure 1.12. (c) PCSs cannot be measured closer than 10 Angstroms from the<br />

paramagnetic center. The cutoff area is shown in grey. (d) Surfaces of isolevel 1.0 ppm and -1.0<br />

ppm are shown superimposed on (b). They exhibit a classical profile as seen in Figure 1.12. (e)<br />

The cutoff of 10 Angstroms is superimposed on figure (d).


Once the PCSs have been measured, the next step is to use them appropriately in order to<br />

extract structural information. Using the PCSs to filter the correct structure(s) (by comparing<br />

calculated and experimental values) among a large ensemble of generated structures would not be<br />

enough. The PCS-based restraint needs to be directly incorporated into the process of structure<br />

generation to bias the output conformations towards the native one. Several options are to be<br />

considered: incorporating a PCS restraint into molecular dynamics software, a molecular<br />

refinement package that employs simulated annealing routines, or a molecular fragment<br />

replacement package. The main question is which of those approaches is best suited to<br />

capture the global nature of the PCS effect.<br />

The merit of the PCS for de novo structure determination in the context of a molecular<br />

fragment replacement is described in Chapter 4. A PCS score function has been added to the<br />

package CS-ROSETTA (Shen et al., 2008). The ability of the PCS to drive CS-ROSETTA<br />

calculations towards the native conformation and to identify native-like structures is discussed.<br />

1.4 Scope of the thesis<br />

This thesis covers different aspects of paramagnetic NMR from a computational point of<br />

view. This includes the use of PCSs for NMR resonance assignment, for Δχ-tensor determination in<br />

preparation of rigid body complex calculations, and for de novo structure determination of proteins.<br />

The rest of the thesis is organized as follows:<br />

Chapter 2 describes an experimental and a computational approach to assign the chemical<br />

shifts of methyl groups from the paramagnetic and diamagnetic NMR spectra. The computational<br />

route is supported by the development of the software Possum, which was first tested on artificial data<br />

before being applied to experimental data.<br />

Chapter 3 presents a newly developed software package that works specifically with<br />

pseudocontact shifts. The possibilities offered by the software are discussed and illustrated by the<br />

rapid reconstruction of the complex between the ε and θ subunits of DNA polymerase III.<br />

Chapter 4 reports the incorporation of the PCS into the molecular fragment<br />

replacement software CS-ROSETTA, and the development of a new protocol to perform, for the<br />

first time, de novo protein structure determination using only PCSs and chemical shifts as<br />

experimental restraints.


Chapter 5 concludes this thesis by presenting some perspectives for further development to<br />

better exploit PCS information in structural biology.<br />

1.5 References<br />

Allegrozzi M, Bertini I, Janik MBL, Lee YM, Lin GH and Luchinat C (2000) Lanthanide-induced<br />

pseudocontact shifts for solution structure refinements of macromolecules in shells up to 40<br />

angstrom from the metal ion. J Am Chem Soc 122:4154-4161<br />

Banci L, Bertini I, Savellini GG, Romagnoli A, Turano P, Cremonini MA, Luchinat C and Gray<br />

HB (1997) Pseudocontact shifts as constraints for energy minimization and molecular<br />

dynamics calculations on solution structures of paramagnetic metalloproteins. Proteins<br />

29:68-76<br />

Bertini I, Felli IC and Luchinat C (1998) High magnetic field consequences on the NMR hyperfine<br />

shifts in solution. J Magn Reson 134:360-364<br />

Bugayevskiy LM and Snyder JP (1995). Map projections: A reference manual. Taylor & Francis,<br />

London.<br />

Clore GM, Gronenborn AM and Bax A (1998) A robust method for determining the magnitude of<br />

the fully asymmetric alignment tensor of oriented macromolecules in the absence of<br />

structural information. J Magn Reson 133:216-221<br />

Donaldson LW, Skrynnikov NR, Choy WY, Muhandiram DR, Sarkar B, Forman-Kay JD and Kay<br />

LE (2001) Structural characterization of proteins with an attached ATCUN motif by<br />

paramagnetic relaxation enhancement NMR spectroscopy. J Am Chem Soc 123:9843-9847<br />

John M, Headlam MJ, Dixon NE and Otting G (2007) Assignment of paramagnetic ¹⁵N-HSQC<br />

spectra by heteronuclear exchange spectroscopy. J Biomol NMR 37:43-51<br />

John M, Pintacuda G, Park AY, Dixon NE and Otting G (2006) Structure determination of protein-<br />

ligand complexes by transferred paramagnetic shifts. J Am Chem Soc 128:12910-12916<br />

Karp RM (1972) Reducibility among combinatorial problems. In: Miller RE and Thatcher JW (eds)<br />

Complexity of Computer Computations. Plenum, New York.<br />

Karplus M (1959) Contact electron-spin coupling of nuclear magnetic moments. J Chem Phys<br />

30:11-15<br />

Kobe B, Guss M and Huber T (2008). Structural Proteomics: High Throughput Methods. Humana<br />

Press, Totowa, NJ, USA.


Kuhn HW (1955) The Hungarian Method for the assignment problem. Naval Res Logistics Quart<br />

2:83-97<br />

McCoy MA and Wyss DF (2002) Structures of protein-protein complexes are docked using only<br />

NMR restraints from residual dipolar coupling and chemical shift perturbations. J Am<br />

Chem Soc 124:2104-2105<br />

Nguyen BD, Xia ZC, Yeh DC, Vyas K, Deaguero H and La Mar GN (1999) Solution NMR<br />

determination of the anisotropy and orientation of the paramagnetic susceptibility tensor as<br />

a function of temperature for metmyoglobin cyanide: Implications for the population of<br />

excited electronic states. J Am Chem Soc 121:208-217<br />

Pintacuda G, Keniry MA, Huber T, Park AY, Dixon NE and Otting G (2004) Fast structure-based<br />

assignment of ¹⁵N HSQC spectra of selectively ¹⁵N-labeled paramagnetic proteins. J Am<br />

Chem Soc 126:2963-2970<br />

Pintacuda G, Park AY, Keniry MA, Dixon NE and Otting G (2006) Lanthanide labeling offers fast<br />

NMR approach to 3D structure determinations of protein-protein complexes. J Am Chem<br />

Soc 128:3696-3702<br />

Rohl CA and Baker D (2002) De novo determination of protein backbone structure from residual<br />

dipolar couplings using rosetta. J Am Chem Soc 124:2723-2729<br />

Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient<br />

χ-tensor determination and NH assignment of paramagnetic proteins. J Biomol NMR<br />

35:79-87<br />

Schwieters CD, Kuszewski JJ and Clore GM (2006) Using Xplor-NIH for NMR molecular<br />

structure determination. Prog NMR Spectrosc 48:47-62<br />

Schwieters CD, Kuszewski JJ, Tjandra N and Clore GM (2003) The Xplor-NIH NMR molecular<br />

structure determination package. J Magn Reson 160:65-73<br />

Shen Y and Bax A (2007) Protein backbone chemical shifts predicted from searching a database for<br />

torsion angle and sequence homology. J Biomol NMR 38:289-302<br />

Shen Y, Lange O, Delaglio F, Rossi P, Aramini JM, Liu GH, Eletsky A, Wu Y, Singarapu KK,<br />

Lemak A, Ignatchenko A, Arrowsmith CH, Szyperski T, Montelione GT, Baker D and Bax<br />

A (2008) Consistent blind protein structure generation from NMR chemical shift data. Proc<br />

Natl Acad Sci U S A 105:4685-4690<br />

Simons KT, Kooperberg C, Huang E and Baker D (1997) Assembly of protein tertiary structures<br />

from fragments with similar local sequences using simulated annealing and bayesian<br />

scoring functions. J Mol Biol 268:209-225<br />

Smith DJ, Maggio ET and Kenyon GL (1975) Simple alkanethiol groups for temporary blocking of<br />

sulfhydryl groups of enzymes. Biochemistry 14:766-771


Su XC, Liang HB, Loscha KV and Otting G (2009a) [Ln(DPA)₃]³⁻ is a convenient paramagnetic<br />

shift reagent for protein NMR studies. J Am Chem Soc 131:10352-10353<br />

Su XC and Otting G (2009b) Paramagnetic labelling of proteins and oligonucleotides. J Biomol<br />

NMR in press


Chapter 2<br />

Possum: paramagnetically<br />

orchestrated spectral solver of<br />

unassigned methyls<br />

2. Possum: paramagnetically orchestrated spectral solver of unassigned methyls


2.1 Abstract<br />

Pseudocontact shifts (PCS) induced by a site-specifically bound paramagnetic lanthanide<br />

ion are shown to provide fast access to sequence-specific resonance assignments of methyl groups<br />

in proteins of known three-dimensional structure. Stereospecific assignments of Val and Leu<br />

methyls are obtained as well as the resonance assignments of all other methyls, including Met CH3<br />

groups. No prior assignments of the diamagnetic protein are required, nor are experiments that<br />

transfer magnetization between the methyl groups and the protein backbone. Methyl Cz-exchange<br />

experiments were designed to provide convenient access to PCS measurements in situations where<br />

a paramagnetic lanthanide is in exchange with a diamagnetic lanthanide. In the absence of<br />

exchange, simultaneous ¹³C-HSQC assignments and PCS measurements are delivered by the newly<br />

developed program Possum. The approaches are demonstrated with the complex between the N-<br />

terminal domain of the ε subunit and the θ subunit of the Escherichia coli DNA polymerase III.<br />

2.2 Introduction<br />

Methyl groups are excellent probes for the study of proteins by NMR spectroscopy due to<br />

their favorable relaxation properties and intense ¹H NMR signals. When buried, they report on the<br />

packing of side chains in the protein core and thus provide important restraints for protein fold<br />

determination (Zwahlen et al., 1998). On the protein surface, they can serve as hydrophobic probes<br />

of protein-protein (Janin et al., 1988, Gross et al., 2003) and protein-ligand (Hajduk et al., 2000)<br />

interactions. Methyl groups have also been established as probes of protein dynamics (Nicholson et<br />

al., 1992, Muhandiram et al., 1995, Wand et al., 1996, Liu et al., 2003, Korzhnev et al., 2004,<br />

Tugarinov et al., 2005a, Tugarinov et al., 2005b) which, in contrast to amide protons, are inert with<br />

regard to solvent exchange.<br />

The resonance assignment of methyl groups in ¹³C-labeled proteins is usually achieved by<br />

magnetization transfers from sequentially assigned backbone resonances (Montelione et al., 1992).<br />

While this approach works well for proteins up to 30 kDa, it is impeded by fast transverse<br />

relaxation for proteins of high molecular weight or for paramagnetic proteins. Recent advances use<br />

tailored isotope labeling schemes (Tugarinov et al., 2003a, Tugarinov et al., 2003b) which are<br />

expensive and not generally applicable to any type of methyl group. In particular, the methyl<br />

groups of methionine residues are hard to assign since any scalar couplings with the CH3 group<br />

are small (Bax et al., 1994).


As a further drawback, experiments that transfer magnetization between methyl groups and<br />

backbone resonances usually do not afford stereospecific discrimination between the prochiral<br />

methyl groups in Val and Leu residues. In this situation, stereospecific assignments require<br />

additional, stereospecifically labeled samples (Neri et al., 1989, Senn et al., 1989, Ostler et al.,<br />

1993, Kainosho et al., 2006) or more complicated NMR experiments that often entail cumbersome<br />

data analysis (Zuiderweg et al., 1985, Sattler et al., 1992, Karimi-Nejad et al., 1994, Tugarinov et<br />

al., 2004, Tang et al., 2005).<br />

In the case where the three-dimensional structure of the protein is known prior to the NMR<br />

studies, it would be attractive to use the structure to facilitate the NMR resonance assignments. In<br />

favorable situations, structure-based resonance assignments can be achieved from NOE data<br />

(Grishaev et al., 2002). In addition, structure-based assignments of backbone resonances have been<br />

achieved using residual dipolar couplings (RDCs) measured with different alignment media (Jung<br />

et al., 2004) or using the combined information from pseudocontact shifts (PCS), RDCs,<br />

paramagnetic relaxation enhancements (PREs) and cross-correlated relaxation (CCR) induced by<br />

paramagnetic metal ions (Pintacuda et al., 2007). The structural interpretation of PCS has been used<br />

earlier to support resonance assignments of ligand residues in heme proteins (Senn et al., 1985).<br />

Recent advances in site-specific attachment of single lanthanide ions to proteins (Ma et al., 2000,<br />

Dvoretsky et al., 2002, Wöhnert et al., 2003, Ikegami et al., 2004, Prudêncio et al., 2004, Leonov et<br />

al., 2005, Haberz et al., 2006, Rodriguez-Castañeda et al., 2006, Su et al., 2006) extend this<br />

approach to long-range paramagnetic effects, with the possibility of tuning the range of focus by<br />

choice of a particular lanthanide (Allegrozzi et al., 2000, Balayssac et al., 2006, Pintacuda et al.,<br />

2007).<br />

Here we show that the analysis of PCS induced by lanthanide ions presents a powerful tool<br />

for the assignment of methyl resonances, which by reference to the 3D structure of the protein,<br />

works even in situations when connectivities to the backbone resonances are difficult to establish or<br />

the backbone resonance assignment is incomplete. Stereospecific assignments of Val and Leu<br />

methyls are obtained as well as the assignments of any other methyl resonances, including those of<br />

Met CH3 groups. We present two Cz-EXSY experiments for the convenient measurement of PCS<br />

in situations where a paramagnetic lanthanide is in exchange with a diamagnetic lanthanide. In<br />

addition, an algorithm was developed to assign the ¹³C-HSQC cross-peaks of methyl groups in the<br />

situation where no exchange information is available. The approaches are demonstrated with the 30<br />

kDa complex between the N-terminal exonuclease domain ε186 and the θ subunit of Escherichia<br />

coli DNA polymerase III. The active site of ε186 binds two divalent ions (Hamdan et al., 2002b)<br />

that can be replaced by a single Ln³⁺ ion (Pintacuda et al., 2004).<br />

2.3 Experimental section<br />

2.3.1 Sample preparation<br />

A cyclized version of ε186, cz-ε186, was designed for enhanced stability of the protein in<br />

unrelated crystallographic experiments (Park, 2006). Using an intein-based strategy (Williams et<br />

al., 2002), the N-terminal Ser2 and C-terminal Ala186 of ε186 were linked by the nonapeptide<br />

TRESGSIEF (numbered 187-195). Apart from the N- and C-terminal residues that are structurally<br />

disordered in ε186 (Hamdan et al., 2002b), the amide proton chemical shifts of the linear protein are<br />

conserved in cz-ε186 within ±0.05 ppm, indicating that cyclization does not significantly affect the<br />

protein structure. The proteins cz-ε186 and θ were prepared, and used to isolate samples of the<br />

cz-ε186/θ complex, essentially as described previously (Hamdan et al., 2002a). NMR experiments<br />

made use of three different samples of complexes of unlabeled θ with isotope-labeled cz-ε186: (i) a<br />

uniformly ¹³C/¹⁵N-labeled sample (0.5 mM), (ii) a biosynthetically directed fractional ¹³C-labeled<br />

sample prepared from 20% ¹³C-glucose (0.5 mM) (Neri et al., 1989, Senn et al., 1989), and (iii) a<br />

sample with ¹³C/¹⁵N-Leu (0.15 mM). Samples of ε186/θ were dialyzed against NMR buffer (20<br />

mM Tris, pH 7.2, 100 mM NaCl, 0.1 mM dithiothreitol, and 0.08% (w/v) NaN₃ in 90% H₂O/10%<br />

D₂O).<br />

Lanthanides (Ln³⁺ = La³⁺ or 1:1 mixtures of La³⁺/Dy³⁺ or La³⁺/Yb³⁺) were added from<br />

LnCl₃ stock solutions in the same buffer containing total Ln³⁺ concentrations of 30 mM. The 1:1<br />

mixtures were added in slight molar excess to catalyze the metal ion exchange, resulting in<br />

exchange rates of a few s⁻¹ (John et al., 2007a, John et al., 2007b). Restoration of the apo-complex<br />

was achieved by extensive dialysis against buffer containing 1 mM EDTA followed by dialysis<br />

against EDTA-free buffer.<br />

2.3.2 NMR spectroscopy<br />

All NMR experiments were performed at 25 °C on a Bruker AV 800 MHz NMR<br />

spectrometer equipped with a cryogenic TCI probe. Sequence-specific resonance assignments of


the methyl groups in the diamagnetic state were established by 3D HNCA and (H)CCH-TOCSY<br />

spectra of the uniformly ¹³C/¹⁵N-labeled sample complexed with 1 equivalent of La³⁺<br />

(cz-ε186/θ/La³⁺), and by reference to the assignments reported for the linear ε186 protein with Mg²⁺<br />

(DeRose et al., 2003). Stereospecific assignments of Val and Leu methyl groups were obtained<br />

from a constant-time (28 ms) ¹³C-HSQC spectrum recorded of the fractionally ¹³C-labeled sample.<br />

Where possible, the rotameric states of the side chains of Val and Leu residues in the crystal<br />

structure of ε186 (Hamdan et al., 2002b) were confirmed in solution by a 3D NOESY-¹⁵N-HSQC<br />

spectrum (mixing time 60 ms) recorded of the uniformly ¹³C/¹⁵N-labeled sample.<br />

Sequence-specific resonance assignments of the methyl groups in the paramagnetic state<br />

were established by 2D and 3D methyl Cz-EXSY spectra recorded with the pulse schemes of Figure<br />

2.1 using a mixing period (τm) of 480 ms and spectral widths of 30 ppm (¹³C) and 16 ppm (¹H). The<br />

2D spectra were acquired with 160 × 1024 complex data points and 32 scans in 10 h, while the 3D<br />

spectra were acquired with 80 × 64 × 1024 complex points and 4 scans in 40 h. For all spectra, the<br />

initial t1 delay was set to half the increment so that folded paramagnetic peaks could be identified<br />

by their inverted sign (Bax et al., 1991).<br />

The methyl group assignments obtained with these experiments provided the controls for<br />

the assignment methods described below.<br />

Figure 2.1 Methyl CZ-EXSY experiments. (a, b) Pulse schemes of the 2D and 3D versions,<br />

respectively. Narrow and wide bars represent radiofrequency pulses with flip angles of 90º<br />

and 180º, respectively, applied with phase x unless indicated otherwise. Selective 13 C pulses<br />

were applied as a 1.5 ms Q5 pulse prior to the delay C and as a 1.5 ms time-reversed Q5


pulse prior to the delay H, generating an excitation bandwidth of 20 ppm. 1 H saturation is<br />

achieved with 120º pulses applied every 5 ms, and the 3-9-19 sequence is used for water<br />

suppression. During m, 180 o ( 1 H) pulses are applied every 6 ms with a MLEV-16 supercycle<br />

to suppress cross relaxation between 1 H and 13 C spins. The phase cycle was 1(x, –x), 2(2x,<br />

2(–x)), 3(x), 4(4y, 4(–y)) and rec(x, 2(–x), x, –x, 2x, –x). States-TPPI was applied to 1 and<br />

3 for quadrature detection. Delays: T = 28 ms, = 3 ms, C = 0.75 ms, H = 1.7 ms.<br />

Gradients (Gi) were applied along the z-axis with strengths of 23.2, 14.5, 20.3, 17.5 and 11.6<br />

G/cm. (c-e) Simulated dipolar ¹³C relaxation rates and NOE in isolated CH3 groups versus<br />

molecular rotational correlation time (τR) using eqs 1-4 in ref. 5. (c) Transverse relaxation<br />

rate R2, (d) longitudinal relaxation rate R1, and (e) steady-state ¹³C{¹H} NOE. The dashed<br />

line reports the relaxation rates calculated for a static CH3 group, whereas the solid line<br />

takes into account a rapid rotation around the three-fold symmetry axis with a correlation<br />

time τf = 25 ps and assuming tetrahedral geometry (Sf² = 0.111, rCH = 1.10 Å, rCC = 1.52 Å)<br />

(Wand et al., 1996). The dotted line represents the contribution from a neighboring ¹³C spin<br />

to R2, R1, and cross-relaxation (σ), respectively. The vertical axis of the ¹³C-¹³C cross-<br />

relaxation rate in (e) is in s⁻¹. Due to the small contribution of PRE to R1 (John et al., 2007a),<br />

the ¹³C{¹H} NOE is similar for paramagnetic and diamagnetic proteins.<br />

2.3.3 Manual resonance assignments from PCS<br />

The PCS measured from EXSY spectra were used to evaluate the possibility of assigning<br />

the methyl peaks by comparison with back-calculated PCS. PCS were back-calculated using a<br />

Mathematica (Wolfram Research) script and the crystal structure of ε186 (PDB entry 1J53, ref. 40).<br />

The Δχ-tensor parameters of Dy³⁺ in complex with ε186/θ have been reported previously (Schmitz<br />

et al., 2006). The Δχ-tensor parameters for Yb³⁺ were determined from ¹⁵N-HSQC spectra using the<br />

program Echidna (Schmitz et al., 2006) as: Δχax = –6.52 × 10⁻³² m³, Δχrh = 1.12 × 10⁻³² m³, α =<br />

24.4°, β = 84.5°, and γ = –299.5° (using the zxz convention of Euler angle rotations). ¹H PCS of<br />

methyl groups were calculated for each of the three methyl protons individually and averaged. This<br />

average is largely insensitive to the rotational position of the methyl group. Residual CSA effects<br />

due to paramagnetic alignment (John et al., 2005) were disregarded since CSA tensors of methyl<br />

groups are small (Liu et al., 2003).<br />
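The back-calculation described in this section can be sketched in a few lines. The example below is only an illustration and is not the Mathematica script used in this work. It assumes the standard point-dipole PCS expression, atom coordinates in Å, Δχ components in m³, and a rotation matrix built as Rz(α)·Rx(β)·Rz(γ) whose transpose maps molecular-frame vectors into the tensor frame (the direction of this rotation is an assumption). All function names are hypothetical.<br />

```python
import math


def _rz(a):
    """Rotation matrix about the z axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def _rx(a):
    """Rotation matrix about the x axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def _matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def pcs_ppm(atom, metal, dchi_ax, dchi_rh, euler_deg=(0.0, 0.0, 0.0)):
    """PCS in ppm of one nucleus (coordinates in Angstrom, delta-chi in m^3)."""
    alpha, beta, gamma = (math.radians(a) for a in euler_deg)
    # zxz Euler rotation; the rotation direction is an assumed convention.
    rot = _matmul(_matmul(_rz(alpha), _rx(beta)), _rz(gamma))
    d = [(atom[i] - metal[i]) * 1e-10 for i in range(3)]  # Angstrom -> m
    # Metal-nucleus vector expressed in the tensor frame (rot transposed).
    x, y, z = (sum(rot[i][j] * d[i] for i in range(3)) for j in range(3))
    r2 = x * x + y * y + z * z
    geometry = dchi_ax * (3 * z * z - r2) + 1.5 * dchi_rh * (x * x - y * y)
    return 1e6 * geometry / (12 * math.pi * r2 ** 2.5)


def methyl_proton_pcs(protons, metal, dchi_ax, dchi_rh, euler_deg):
    """Average of the PCS calculated individually for the methyl protons."""
    values = [pcs_ppm(h, metal, dchi_ax, dchi_rh, euler_deg) for h in protons]
    return sum(values) / len(values)
```

Expressing the geometric factors in Cartesian form (3z² – r² and x² – y²) avoids computing the polar angles explicitly; averaging over the three protons mirrors the treatment of ¹H PCS described above.<br />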

2.3.4 The program Possum


2.3 Experimental section. 33<br />

The program Possum (paramagnetically orchestrated spectral solver of unassigned methyls)<br />

was developed to assign the cross-peaks of methyl groups in correlation spectra recorded with<br />

diamagnetic and paramagnetic metal ions by reference to the 3D structure of the protein and<br />

independently determined Δχ tensors. The program requires that the amino-acid type is known (e.g.<br />

by residue-type selective ¹³C-labeling). Furthermore, it can accept information about methyl cross-<br />

peaks belonging to the same residue (“methyl connectivity” data for Ile, Leu, and Val, as provided<br />

by HCCH-TOCSY experiments) and stereospecific information (“methyl specificity” data<br />

distinguishing between γ2 and δ1 cross-peaks of Ile, δ1 and δ2 cross-peaks of Leu, and γ1 and γ2<br />

cross-peaks of Val, as provided by samples produced with biosynthetically directed fractional ¹³C-<br />

labeling (Neri et al., 1989, Senn et al., 1989, Tugarinov et al., 2004) or stereoselective isotope<br />

labeling (Ostler et al., 1993, Kainosho et al., 2006)). In the present version of the program, the<br />

methyl connectivity information is always assumed to be available for the diamagnetic state.<br />

The program takes as input the ¹H and ¹³C chemical shifts of methyl groups measured in<br />

¹³C-HSQC spectra and the ¹³C chemical shifts of methyl groups that are too close to the<br />

paramagnetic center to be directly observable in ¹H-detected NMR spectra. By comparing the<br />

chemical shifts in the diamagnetic and paramagnetic states, Possum attempts to find the resonance<br />

assignment with the lowest residual cost C(l) defined by:<br />

with:<br />

subject to:<br />

(2.1)<br />

(2.2)



(2.3)<br />

(2.4)<br />

where $\mathrm{PCS}^{S}_{\mathrm{calc}}(k,l)$ is the predicted PCS value for the spin S (S = ¹³C or ¹H) of the methyl<br />

group in residue k arising from the paramagnetism of the lanthanide l, ${}^{\mathrm{para}}\delta^{S}_{\mathrm{exp}}(j,l)$ is the chemical<br />

shift of the resonance j in the presence of the lanthanide l for the spin S, and ${}^{\mathrm{dia}}\delta^{S}_{\mathrm{exp}}(i)$ is the<br />

chemical shift of the diamagnetic resonance i for the spin S. The cost function assigns smaller costs for deviations<br />

between calculated and observed PCS when the experimentally observed PCS is large, while the<br />

constant e(l) prevents a singularity in the cost function and accounts for the error in measurements<br />

when a spin experiences small paramagnetic effects far away from the paramagnetic center. e(l)<br />

scales with the magnitude of the Δχ tensor of the lanthanide l. Empirically determined values<br />

(e(Yb³⁺) = 1/6 ppm and e(Dy³⁺) = 1 ppm) were used here (see footnote 2). Equations (2.3) and (2.4) ensure that<br />

each calculated PCS and each experimental chemical shift are chosen exactly once within the<br />

global assignment.<br />
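For orientation, a generic three-index assignment problem consistent with the description above can be written as follows. This is a sketch under stated assumptions and not a recovery of Equations (2.1) to (2.4): the sum-and-constraint structure is the textbook axial formulation, and the particular form of the pairing cost (absolute deviation divided by the observed PCS magnitude plus e(l)) is only one expression that has the properties described in the text. The shorthand Δ is introduced here for illustration.<br />

```latex
% Sketch only: generic axial three-index assignment problem.
% The form of c_{ijk}(l) is an assumption, not the thesis' own equation.
\begin{align*}
C(l) &= \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} c_{ijk}(l)\, x_{ijk} \\
c_{ijk}(l) &= \sum_{S \in \{{}^{13}\mathrm{C},\,{}^{1}\mathrm{H}\}}
  \frac{\left|\mathrm{PCS}^{S}_{\mathrm{calc}}(k,l) - \Delta^{S}_{ij}(l)\right|}
       {\left|\Delta^{S}_{ij}(l)\right| + e(l)},
\qquad
\Delta^{S}_{ij}(l) = {}^{\mathrm{para}}\delta^{S}_{\mathrm{exp}}(j,l) - {}^{\mathrm{dia}}\delta^{S}_{\mathrm{exp}}(i) \\
\sum_{j}\sum_{k} x_{ijk} &= 1 \;\; \forall i, \qquad
\sum_{i}\sum_{k} x_{ijk} = 1 \;\; \forall j, \qquad
\sum_{i}\sum_{j} x_{ijk} = 1 \;\; \forall k \\
x_{ijk} &\in \{0, 1\}
\end{align*}
```

In this form, x_ijk = 1 selects the triple (diamagnetic peak i, paramagnetic peak j, residue k), and the constraints force every index value to be used exactly once.<br />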

Equations (2.1), (2.3) and (2.4) present the formulation of the three-index assignment<br />

problem (Schell, 1955), which is the three-dimensional instance of the multidimensional assignment<br />

problem (MAP). With D being the number of dimensions of the MAP (D = 3 in the example above)<br />

and n being the size of each of the D sets of data, there are (n!)^(D–1) possible assignments. When D is<br />

strictly larger than 2, MAP has been proven to be NP-hard (Karp, 1972) and, as a result, no known<br />

algorithm can guarantee the best solution to the problem in polynomial time. An exhaustive<br />

2 The purpose of e(l) is to avoid degenerate cases where the experimental PCS is very close to zero.<br />

Its value is not critical for the success of the algorithm. The value of e(l) has, however, been optimized to yield<br />

the best possible results.



search through the (n!)² possibilities is impracticable for all but the smallest problem sizes. An exact<br />

branch and bound algorithm that explores only a part of all possible assignments has been proposed<br />

(Balas et al., 1991) and works well for small problem sizes, especially when there is a good<br />

agreement between predicted and observed PCS. In the present context, a simulated annealing<br />

optimization scheme proved more efficient computationally. The dimensionality D of the<br />

assignment problem generated by Possum depends on the residue type, the availability of<br />

connectivity information, and the number of different lanthanides used. We have performed<br />

calculations with up to 6 dimensions. Examples of 3- and 4-dimensional problems are illustrated in<br />

Figure 2.2.<br />
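The size of the search space and the idea of a simulated annealing search can be illustrated with a small sketch. It is not the Possum implementation: the move set (swapping two entries of one permutation), the geometric cooling schedule, and all names and default parameters are assumptions chosen for brevity, and the cost tensor is supplied by the caller.<br />

```python
import math
import random


def n_assignments(n, d):
    """Number of possible assignments of a d-dimensional problem of size n."""
    return math.factorial(n) ** (d - 1)


def total_cost(cost, p, q):
    """Sum of c[i][p[i]][q[i]] over all triples (i, p[i], q[i])."""
    return sum(cost[i][p[i]][q[i]] for i in range(len(p)))


def anneal(cost, steps=5000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing over two permutations (three-index problem)."""
    rng = random.Random(seed)
    n = len(cost)
    p, q = list(range(n)), list(range(n))
    rng.shuffle(p)
    rng.shuffle(q)
    current = total_cost(cost, p, q)
    best, best_p, best_q = current, p[:], q[:]
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        perm = rng.choice((p, q))
        a, b = rng.sample(range(n), 2)
        perm[a], perm[b] = perm[b], perm[a]
        new = total_cost(cost, p, q)
        # Metropolis criterion: always accept improvements, sometimes accept
        # a worse assignment to escape local minima.
        if new <= current or rng.random() < math.exp((current - new) / t):
            current = new
            if new < best:
                best, best_p, best_q = new, p[:], q[:]
        else:
            perm[a], perm[b] = perm[b], perm[a]  # undo the rejected swap
    return best, best_p, best_q
```

Representing an assignment as two permutations keeps every index used exactly once by construction, so the constraints never have to be checked during the search. Higher-dimensional problems would add one permutation per extra dimension.<br />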

Figure 2.2 Formulation of the assignment problem depending on the information available. The<br />

columns ${}^{\mathrm{dia}}\delta^{S}_{\mathrm{exp}}$ and ${}^{\mathrm{para}}\delta^{S}_{\mathrm{exp}}$ contain the chemical shifts (S = ¹³C and S = ¹H as observed for ¹³C-<br />

HSQC cross-peaks) measured in the presence of a diamagnetic or paramagnetic lanthanide,<br />

respectively. The column marked $\mathrm{PCS}^{S}_{\mathrm{calc}}$ contains the ¹³C and ¹H PCS calculated from the<br />

Δχ tensor and the 3D structure of the protein. (a) Assignment problem for residues with a single<br />

methyl group (Ala, Met, Thr). The indices i and j refer to the cross-peak number in the diamagnetic<br />

and paramagnetic state, respectively, and the index k is the residue number in the amino-acid<br />

sequence, as in equation (2.1). The assignment (i = 1, j = 3, k = 1) is illustrated by connecting<br />

lines. The associated cost can be calculated using equation (2.2). The other n–1 assignments<br />

necessary to calculate the total cost C(l) according to equation (2.1) are not shown. Overall, this<br />

assignment problem is three-dimensional. (b) Simultaneous use of the information from two<br />

samples containing the paramagnetic lanthanides l1 or l2 creates a four-dimensional assignment<br />

problem. (c) For amino acids with two methyl groups (Ile, Leu, Val), the columns ${}^{\mathrm{dia}}\delta^{S}_{\mathrm{exp}}$, ${}^{\mathrm{para}}\delta^{S}_{\mathrm{exp}}$,<br />

and $\mathrm{PCS}^{S}_{\mathrm{calc}}$ embed the chemical shifts (and PCS) of two methyl groups (m1 and m2). If the methyl-<br />

specificity information is not available in the paramagnetic state (illustrated by m? in the column



${}^{\mathrm{para}}\delta^{S}_{\mathrm{exp}}$), Possum will compute the two possible costs and only keep the lower one. (d) For Ile, Leu,<br />

and Val residues, the methyl-methyl connectivity information may be available in the diamagnetic<br />

state but not in the paramagnetic state. This situation creates a four-dimensional assignment<br />

problem for data from a single paramagnetic lanthanide.<br />

The program also takes into account the absence of paramagnetic peaks due to PRE by<br />

preventing the assignment of observable paramagnetic peaks to methyl groups located closer to the<br />

metal ion than a user-specified cutoff. In the present work, cutoffs of 6 and 9 Å were used for the<br />

Yb 3+ and Dy 3+ complexes, respectively. Paramagnetic peaks missing for any other reason (e.g.<br />

spectral overlap) are also tolerated. This is achieved by assigning a cost only to pairings of<br />

observable paramagnetic and diamagnetic peaks, whereas a zero cost is associated with any<br />

unassigned diamagnetic peak left over. Finally, the program allows for the possibility that<br />

paramagnetic shifts may have been observed only for either the 13 C or the 1 H resonance of a methyl<br />

group.<br />
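The three rules of this paragraph (distance cutoff, tolerated missing peaks, zero cost for unpaired diamagnetic peaks) can be expressed as a small pairing-cost function. This is a hedged sketch: the normalized-deviation cost is an assumed form chosen to match the qualitative description of the cost function, and the function name, argument names and the use of infinity to forbid a pairing are all hypothetical.<br />

```python
import math

FORBIDDEN = math.inf  # marks pairings the search must never select


def pair_cost(pcs_calc, shift_dia, shift_para, metal_distance, cutoff, e):
    """Cost of pairing one diamagnetic peak with one paramagnetic peak.

    shift_para is None when no paramagnetic partner is assigned.
    Distances are in Angstrom, shifts and e in ppm.
    """
    if shift_para is None:
        # Diamagnetic peak left without a partner: zero cost.
        return 0.0
    if metal_distance < cutoff:
        # An observable paramagnetic peak cannot belong to a methyl group
        # that sits closer to the metal than the PRE cutoff.
        return FORBIDDEN
    pcs_exp = shift_para - shift_dia
    # Assumed cost form: deviation scaled by the observed PCS magnitude.
    return abs(pcs_calc - pcs_exp) / (abs(pcs_exp) + e)
```

With the cutoffs quoted above, a call for the Yb³⁺ complex would pass `cutoff=6.0` and `e=1/6`, and a call for the Dy³⁺ complex `cutoff=9.0` and `e=1.0`.<br />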

The calculation of the assignment starting from the chemical shifts of Table S2.1 took less<br />

than 2 h on an AMD 64 4200+ processor, when using all available information, including methyl<br />

connectivity and methyl specificity information and the chemical shifts from the Yb 3+ and Dy 3+<br />

complexes.<br />

2.4 Results<br />

2.4.1 ¹³C-HSQC spectra of the cz-ε186/θ/Ln³⁺ complexes<br />

Constant-time ¹³C-HSQC spectra of the uniformly ¹³C labeled diamagnetic cz-ε186/θ/La³⁺<br />

complex and the paramagnetic cz-ε186/θ/Dy³⁺ and cz-ε186/θ/Yb³⁺ complexes illustrate the spectral<br />

complexity of the methyl region and the effect of the paramagnetism. The spectrum of the cz-<br />

ε186/θ/La³⁺ complex (blue peaks in Figure 2.3) contains approximately the number of methyl peaks<br />

expected for 19 Ala, 14 Thr, 6 Met, 12 Val, 17 Leu, and 14 Ile residues (125 methyl groups). The<br />

signals of Met CH3 groups are particularly well resolved and easily identified as they appear with<br />

opposite sign.


2.4 Results. 37<br />

Figure 2.3 Methyl region of constant-time ¹³C-HSQC spectra of the cz-ε186/θ complex<br />

(containing ¹³C/¹⁵N labeled cz-ε186) in the presence of La³⁺ (blue) and a 1:1 mixture of (a)<br />

La³⁺/Dy³⁺ and (b) La³⁺/Yb³⁺ (red). Met CH3 and CH2 groups appear with inverted sign<br />

(light colors). The spectra were recorded using a constant time of 28 ms and t2max = 160 ms.<br />

The spectra of the mixed samples were acquired with 4 times as many scans to compensate<br />

for the halving of the effective concentrations.<br />

As Dy³⁺ is one of the strongest paramagnetic lanthanide ions (Pintacuda et al., 2007), the<br />

methyl peaks of the cz-ε186/θ/Dy³⁺ complex (red peaks in Figure 2.3a) are strongly shifted by PCS<br />

and affected by ¹H line broadening due to transverse paramagnetic relaxation enhancement (PRE).<br />

Thus, only 55 cross-peaks are observable, corresponding to methyl groups with a distance from the<br />

Dy³⁺ ion larger than 15 Å, many of them with intensities close to the noise level. Part of the ¹H line<br />

broadening is caused by unresolved RDCs, including intra-methyl RDCs (Kaikkonen et al., 2001),<br />

originating from the paramagnetically induced alignment of the protein with the magnetic field.<br />

In the cz-ε186/θ/Yb³⁺ complex, the cutoff distance is reduced to about 9 Å due to the approximately<br />

6 times smaller paramagnetic moment of Yb³⁺, so that only 10 methyl peaks are expected to be<br />

broadened beyond detection. Of the remaining 115 methyl resonances (red peaks in Figure 2.3b),<br />

only 14 peaks could not be analyzed due to overlap or very small PCS at larger distances from the<br />

metal ion. Figure 2.3 shows that for both paramagnetic lanthanides, it is nearly impossible to trace<br />

the paramagnetic shift of a 13 C-HSQC peak using the criterion that the PCS in the 13 C and 1 H<br />

dimensions of the spectrum must be similar. (In methyl groups, the distance between the carbon<br />

and the average position of the three protons is less than 0.4 Å.) Therefore, without prior<br />

knowledge of resonance assignments, PCS measurements cannot be made manually from 13 C-<br />

HSQC spectra alone.



2.4.2 Methyl CZ-EXSY experiments<br />

In order to measure PCS data conveniently and with high sensitivity, we designed an<br />

experiment applicable to samples prepared with a 1:1 mixture of paramagnetic and diamagnetic<br />

metal ions, where chemical exchange between the metal ions leads to exchange of the protein<br />

between paramagnetic and diamagnetic states. By generating exchange cross-peaks between methyl<br />

peaks of the diamagnetic and paramagnetic lanthanide complexes, the experiment allows the<br />

measurement of 1 H and 13 C PCS from a single spectrum. Figure 2.1 shows 2D and 3D versions of<br />

the methyl Cz-EXSY experiment. The pulse sequences are related to previously published Nz-<br />

exchange experiments (Farrow et al., 1994, John et al., 2007a).<br />

During a mixing period τm, magnetization is stored as relatively slowly relaxing CZ<br />

magnetization. Simulations indicate that, owing to rapid rotation around the 3-fold symmetry axis,<br />

longitudinal relaxation rates R1 of methyl ¹³C spins are fairly insensitive with respect to molecular<br />

size, and barely exceed 2 s⁻¹ even for very small proteins (Figure 2.1d). In the cz-ε186/θ/La³⁺<br />

complex (τC = 17 ns), we measured R1(¹³C) rates of about 1.6 s⁻¹ for the majority of methyl groups.<br />

Only a group of highly mobile Thr residues relaxed somewhat faster (2 s⁻¹), whereas the R1(¹³C)<br />

relaxation in Met CH3 groups was much slower (about 0.7 s⁻¹). In contrast to transverse relaxation<br />

rates R2, R1 rates in macromolecules are barely affected by the paramagnetism of lanthanides (John<br />

et al., 2007a).<br />

The experiment yields auto-peaks for the diamagnetic and paramagnetic states (dd and pp<br />

peaks, respectively) and exchange peaks arising from magnetization exchange from the<br />

paramagnetic to the diamagnetic state and vice versa (pd and dp peaks, respectively).<br />

Since the experiment starts from ¹³C polarization rather than using an INEPT transfer, pd<br />

peaks can be detected even for methyl groups that are strongly affected by ¹H PRE in the<br />

paramagnetic state and thus invisible in the ¹³C-HSQC spectrum. Combined with the dd peaks, this<br />

allows ¹³C PCS measurements that are limited only by the (16-fold smaller) ¹³C PRE (John et al.,<br />

2007b). As indicated previously (ref. 13) and illustrated by the simulations of Figure 2.1e, ¹³C polarization<br />

in methyl groups of proteins can be very efficiently enhanced using the ¹³C{¹H} NOE. This holds<br />

irrespective of paramagnetism. We observed an approximately two-fold increase in ¹³C polarization in the<br />

cz-ε186/θ/La³⁺ complex using 1 s of ¹H irradiation between subsequent scans.<br />

For improved resolution in the 13 C dimension and measurement of small 13 C PCS, the 2D<br />

experiment is implemented as a constant-time experiment in the t1 dimension. The 3D experiment<br />

additionally records the 13 C frequency of the protein state after the mixing time. Real-time



evolution periods in both indirect dimensions yield superior sensitivity for residues with substantial<br />

13 C PRE that commonly also have larger PCS. Selective 13 C pulses select the spectral window of<br />

the methyl 13 C resonances of the diamagnetic complex in order to limit the spectral width required<br />

in the F1 dimension.<br />

2.4.3 Resonance assignment of Met, Ala and Thr methyl groups<br />

Figure 2.4a shows the spectral region of the Met CH3 cross-peaks of the 2D methyl Cz-<br />

exchange spectrum, recorded with a sample of cz-ε186/θ containing La³⁺ and Dy³⁺ in a 1:1 ratio.<br />

Out of six Met residues, four are observed in the ¹³C-HSQC spectrum of the cz-ε186/θ/Dy³⁺<br />

complex (the cross-peak of Met178 in the paramagnetic state appears with very weak intensity at<br />

2( 1 H) = 5.33 ppm). For these residues, both auto and both exchange peaks become visible in the<br />

exchange spectrum, forming a rectangle that allows straightforward identification of dd-pp peak<br />

pairs, yielding 13 C PCS of 3.28, 1.31, 0.49 and –0.65 ppm. A fifth residue only yields a pd<br />

exchange peak with a 13 C PCS value of –3.39 ppm.<br />
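Reading PCS from the peak rectangle amounts to simple differences of peak positions, which the following sketch makes explicit. It assumes that each peak is given as a (¹³C shift, ¹H shift) pair in ppm and that a pd exchange peak appears at the paramagnetic ¹³C shift and the diamagnetic ¹H shift, as implied by the description of the experiment. The function name and the numerical shifts in the usage lines are hypothetical.<br />

```python
def pcs_from_exsy(dd_peak, pp_peak=None, pd_peak=None):
    """Return (13C PCS, 1H PCS) in ppm from Cz-EXSY peak positions.

    Peaks are (13C shift, 1H shift) tuples. If only a pd exchange peak is
    available, the 1H PCS cannot be measured and None is returned for it.
    """
    c_dia, h_dia = dd_peak
    if pp_peak is not None:
        # Full rectangle: the pp auto-peak gives both PCS values.
        return pp_peak[0] - c_dia, pp_peak[1] - h_dia
    if pd_peak is not None:
        # pd peak only: 13C evolved in the paramagnetic state.
        return pd_peak[0] - c_dia, None
    return None, None


# Hypothetical peak positions, for illustration only.
full = pcs_from_exsy((15.0, 1.0), pp_peak=(16.5, 2.2))
pd_only = pcs_from_exsy((15.0, 1.0), pd_peak=(12.0, 1.0))
```

The second call corresponds to the situation of the fifth Met residue above, where only a pd exchange peak is observed and only the ¹³C PCS is obtained.<br />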

Figure 2.4 Assignment of Met CH3 from PCS. (a) 2D methyl Cz-EXSY spectrum of cz- 186/<br />

(containing 13 C/ 15 N-labeled cz- 186) loaded with a 1:1 mixture of La 3+ and Dy 3+ (red), overlaid<br />

with the 13 C-HSQC spectrum (blue). The diamagnetic auto-peaks (dd) are labeled with the<br />

assignment and connected to the paramagnetic auto-peaks (pp) and the exchange peaks (pd and<br />

dp) with dashed rectangles. The dp and pp peaks of Met178 are outside the selected spectral region



at 2 = 5.53 ppm. Met107 only shows a pd exchange peak (vertical dashed line), and neither pp-<br />

peak nor exchange peaks are observed for Met18. The spectrum was recorded with a mixing time<br />

of 480 ms. (b) Comparison of predicted (top) and measured (bottom) PCS of Met CH3 groups. 13 C<br />

PCS and 1 H PCS are plotted with filled and open bars, respectively, and sorted according to the<br />

predicted 13 C PCS. The distances rC-Ln are given in Å in the center. 3<br />

The measured PCS can be compared with values predicted from the known structure of<br />

ε186 (Park, 2006) and the previously determined Δχ tensor of Dy³⁺ (Figure 2.4b) (Schmitz et al.,<br />

2006). Only five Met residues belong to the structured part of the protein, with predicted ¹³C PCS of<br />

3.74 (Met178), 1.30 (Met137), –0.56 (Met87), –1.68 (Met18) and –2.87 ppm (Met107). Met185 is<br />

located in the flexible cyclizing loop of cz-ε186 and can be immediately assigned to the very<br />

intense and narrow resonance with a ¹³C PCS of 0.49 ppm, in agreement with the PCS of 0.53 ppm<br />

observed for the amide proton of this residue. Met18 is the residue closest to the metal ion (rC-Dy =<br />

12.0 Å) and can be assigned to the methyl group that does not show any exchange peak. As Met18<br />

lines the active site, this assignment is independently confirmed by its sensitivity to titration with<br />

nucleotides (unpublished results). Met107 is the second closest residue (rC-Dy = 14.2 Å) and<br />

displays a pd but no dp exchange peak; the assignment of all other Met residues follows in a<br />

straightforward manner from the PCS data.<br />
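The manual matching of Met residues can be reproduced with a brute-force search over all pairings, using the predicted and measured ¹³C PCS values quoted above. This is an illustration only and not how Possum works: the summed absolute deviation is an assumed score, the 0.49 ppm value is left out because it belongs to Met185 in the flexible loop, and exhaustive enumeration is feasible only because five residues are involved.<br />

```python
from itertools import permutations

# Predicted 13C PCS (ppm) of the Met residues in the structured part.
predicted = {"Met178": 3.74, "Met137": 1.30, "Met87": -0.56,
             "Met18": -1.68, "Met107": -2.87}
# Measured 13C PCS (ppm), excluding the 0.49 ppm value assigned to Met185.
measured = [3.28, 1.31, -0.65, -3.39]


def best_assignment(predicted, measured):
    """Try every pairing and keep the one with the smallest total deviation."""
    best_cost, best_map = float("inf"), None
    for residues in permutations(predicted, len(measured)):
        cost = sum(abs(predicted[r] - m) for r, m in zip(residues, measured))
        if cost < best_cost:
            best_cost, best_map = cost, dict(zip(measured, residues))
    return best_cost, best_map


cost, assignment = best_assignment(predicted, measured)
# Residues left without a measured PCS (no exchange peak observed).
unmatched = sorted(set(predicted) - set(assignment.values()))
```

The residue left unmatched by this search is Met18, in line with its assignment to the methyl group that shows no exchange peak.<br />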

The data show that it is possible to assign a limited number of methyl groups using PCS<br />

only. The situation is more complex for the methyl groups of the other amino acids since, with the<br />

exception of Ile CH3 groups, the amino acid type cannot be identified from ¹³C-HSQC spectra<br />

alone. This information would have to be provided either by the use of residue-specific labeling or<br />

additional NMR experiments (in the cz-ε186/θ/La³⁺ complex, the amino acid type can readily be<br />

identified from a 3D (H)CCH-TOCSY spectrum). In addition, important information is provided by<br />

(i) the relative size of ¹³C and ¹H PCS and (ii) whether the paramagnetic ¹H resonance can be<br />

observed (rC-Dy > 15 Å) or only pd exchange peaks (rC-Dy > 10 Å, Figure S2.4 and Figure S2.5).<br />

3 Experimental PCS have an error below 0.1 ppm. Errors in the calculated PCS depend on the<br />

quality of the 3D structures used. Residues 87, 107, 137 and 178 belong to the structured part. The<br />

error in their calculated PCS can be considered to be below 10% of their absolute value.


2.4.4 Assignments of Val, Leu, and Ile methyl groups<br />


Val, Leu, and Ile are amino acids with two methyl groups that can easily be linked by<br />

correlations observed in TOCSY spectra; combining the PCS data for both methyl groups greatly<br />

facilitates the resonance assignment of these residues. This is illustrated in Figure 2.5 with the cz-<br />

186/ complex containing 1:1 mixtures of Yb 3+ /La 3+ (a, b) and Dy 3+ /La 3+ (c, d), respectively.<br />

Whereas a 2D (H)C(C)H-TOCSY experiment (Figure S2.1, Supporting Information) recorded with<br />

short mixing time (12 ms) strongly favors intra-residual methyl-methyl correlations (Figure 2.5a)<br />

(Eaton et al., 1990), the 2D methyl CZ-EXSY spectrum yields predominantly exchange peaks and<br />

only weak 1- 2 correlations arising from 13 C- 13 C NOE (Fischer et al., 1996). We have also applied<br />

the (H)C(C)H-TOCSY experiment to a sample of the pure cz- 186/ /La 3+ complex containing<br />

selectively 13 C/ 15 N-Leu labeled cz- 186, where all 1- 2 methyl pairs could be identified (Figure<br />

S2.4).



Figure 2.5 PCS measurements in isopropyl groups of Val and Leu and use of PCS for<br />

stereospecific resonance assignments. (a) Selected spectral region from a 2D (H)C(C)H TOCSY<br />

spectrum (red) of cz- 186/ loaded with a 1:1 mixture of La 3+ and Yb 3+ , showing the methyl cross-<br />

peaks of Val96. The spectrum is overlaid with the 13 C-HSQC spectrum (blue). Intraresidual<br />

correlations between the cross-peaks of the 1CH3 and 2CH3 groups are identified by dotted lines.<br />

The TOCSY spectrum was recorded with 12 ms mixing time. (b) Same spectral region as in (a)<br />

taken from the 2D methyl Cz-EXSY spectrum of the same sample recorded with a mixing time of<br />

480 ms. (c) Selected strips from the 3D methyl CZ-EXSY spectrum of cz- 186/ loaded with a 1:1<br />

mixture of La 3+ and Dy 3+ (right panels) aligned with corresponding spectral regions from the 2D<br />

methyl CZ-EXSY spectrum (left panels). The strips display the methyl group correlations of Val10<br />

and Leu113. The arrows point from the chemical shifts of the diamagnetic auto-peaks (dd) to the<br />

chemical shifts of the exchange peaks (pd), indicating the 13 C PCS. Horizontal lines identify the



positions of the dd- and pd-peaks in the 1( 13 C) dimension. The line at 26.35 ppm identifies the 13 C-<br />

13 C NOEs with the CH group of Leu113. (d) Assignment of methyl resonances from the<br />

comparison of predicted with experimental PCS. For each Val residue, the distances from the<br />

lanthanide in Å are indicated for both methyl carbons in the center of the plot. 13 C PCS and 1 H<br />

PCS are displayed as filled and open bars, respectively. The residues are sorted according to the<br />

predicted 13 C-PCS and the PCS are plotted in the sequence C 1 /H 1 /C 2 /H 2 .<br />

Figure 2.5c compares the measurement of 13 C PCS for two residues from strips of 2D and<br />

3D methyl CZ-EXSY spectra. The two experiments are complementary, showing better frequency<br />

resolution in the 2D spectrum and generally less cross-peak overlap in the 3D spectrum. The<br />

example of Leu113 shows that one- and two-bond 13 C- 13 C NOE correlations are visible, but<br />

generally of much smaller intensity than the exchange peaks. Through-bond correlations can again<br />

be identified from TOCSY spectra. From the combined use of the 2D and 3D Cz-EXSY spectra, all<br />

13 C-HSQC cross-peaks observable for any of the methyl groups of the cz- 186/ /Dy 3+ and cz-<br />

186/ /Yb 3+ complexes could readily be correlated with the corresponding 13 C-HSQC cross-peaks<br />

of the cz- 186/ /La 3+ complex, yielding the PCS. Compared to the 13 C-HSQC spectrum, the methyl<br />

Cz-EXSY spectra yielded the 13 C chemical shifts in the paramagnetic state for a further 47 methyl<br />

groups of the cz- 186/ /Dy 3+ complex with rC-Dy distances as short as 10 Å, leaving only 11 methyl<br />

groups completely unobservable due to excessive PRE. For the cz- 186/ /Yb 3+ complex, the<br />

methyl Cz-EXSY spectra yielded the 13 C chemical shifts for 7 additional methyl groups with rC-Yb<br />

distances as short as 6 Å, leaving only 1 methyl group unobservable.<br />

Figure 2.5d compares the predicted and measured PCS for Val methyl groups in the cz-<br />

186/ /Dy 3+ complex. Each residue is characterized by up to 4 PCS values, resulting in the<br />

straightforward assignment of 10 out of 12 residues. Only Val39 did not yield unambiguous PCS<br />

data due to resonance overlap, and Val65 is too close to the Dy 3+ ion. The assignment of Val65<br />

could be made in the 186/ /Yb 3+ complex, where this residue yields large PCS (Supporting<br />

Information). The methyl cross-peaks of Leu residues can be assigned in an analogous way (see<br />

below).<br />

Importantly, this approach automatically yields the stereospecific assignment of Val and<br />

Leu methyl peaks, as long as different PCS are observed for the two prochiral methyl groups. Since<br />

the methyl carbons in an isopropyl group are separated by 2.5 Å, this is almost always the case



(Figure 2.5d). A rare exception is Val50, where the predicted ¹³C and ¹H PCS of the γ1 and γ2<br />

methyl groups are indistinguishable in both the Dy³⁺ and Yb³⁺ complexes.<br />

The methyl groups of Ile residues are particularly easy to assign by PCS, since the spectral<br />

ranges of the ¹³C-NMR signals of γ2 and δ1 methyl groups are clearly separated, while intra-<br />

residual methyl-methyl connectivity can still be obtained from TOCSY spectra.<br />

2.4.5 Automatic assignments without EXSY data<br />

Cz-EXSY spectra provide an exceptionally simple way of measuring PCS. For situations<br />

where the metal exchange is too slow for exchange spectra and spectral crowding prevents the<br />

straightforward pairing between diamagnetic and paramagnetic ¹³C-HSQC peaks (Figure 2.3), we<br />

have devised the program Possum, which determines the correct peak pairings, their resonance<br />

assignments, and their PCS, using the 3D structure of the protein and the Δχ tensor (which can readily<br />

be obtained from, e.g., ¹⁵N-HSQC spectra (Pintacuda et al., 2004, Schmitz et al., 2006)).<br />

The performance of the program was initially tested with simulated data, replacing the<br />

experimental paramagnetic shifts of Table S2.1 by shifts back-calculated from the crystal structure<br />

of ε186 (Hamdan et al., 2002b) and using the crystal structure, the experimental diamagnetic<br />

chemical shifts, and the Δχ tensors of Dy³⁺ and Yb³⁺ as input. In all calculations, it was assumed<br />

that the residue types of all methyl resonances were known and the methyl connectivity information<br />

of Val, Leu, and Ile residues was available for the diamagnetic state. Except for extreme cases of<br />

spectral overlap, the program yielded 100% correct assignments. In a second step, structural<br />

uncertainties were simulated by randomly displacing the methyl groups, following a Maxwell-<br />

Boltzmann distribution with maxima at 0.35 and 0.7 Å (resulting in maximal atom displacements of<br />

0.75 and 1.5 Å, respectively, always using the same direction of displacement). Even in the case<br />

with the maximum structural noise, using only paramagnetic data from the Yb 3+ complex and<br />

neither methyl specificity nor methyl connectivity information in the paramagnetic state, Possum<br />

yielded >75% correct assignments of the diamagnetic methyl resonances (Table S2.2 and Table<br />

S2.4). The score increased to >90% when paramagnetic data from the Dy 3+ complex, methyl<br />

specificity information in all complexes, and methyl connectivity information in the Yb 3+ complex<br />

(but not the Dy 3+ complex) were included (Table S2.2 and Table S2.3).<br />
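The structural-noise test can be sketched as a random displacement of each methyl position. The sketch relies on the fact that the length of a vector with three independent Gaussian components of standard deviation σ follows a Maxwell-Boltzmann distribution with its maximum at √2·σ. Rejecting draws beyond a maximal displacement is an assumption about how the stated limits were enforced, and the fixed displacement direction mentioned above is not reproduced here. The function name is hypothetical.<br />

```python
import math
import random


def displace(coord, mode, max_disp, rng):
    """Randomly displace a 3D coordinate (all lengths in Angstrom).

    mode is the most probable displacement length; draws longer than
    max_disp are rejected and repeated (assumed truncation scheme).
    """
    sigma = mode / math.sqrt(2.0)  # Maxwell-Boltzmann mode = sqrt(2) * sigma
    while True:
        shift = [rng.gauss(0.0, sigma) for _ in range(3)]
        if math.sqrt(sum(s * s for s in shift)) <= max_disp:
            return tuple(c + s for c, s in zip(coord, shift))


rng = random.Random(1)
# Lower noise level quoted in the text: maximum at 0.35 A, at most 0.75 A.
noisy = [displace((10.0, 5.0, -3.0), 0.35, 0.75, rng) for _ in range(200)]
```

The higher noise level of the text would correspond to `mode=0.7` and `max_disp=1.5`.<br />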

The program was subsequently applied to the experimental data of the methyl groups of cz-<br />

186/ loaded with La 3+ , Yb 3+ and Dy 3+ . Table 2.1 summarizes the results. Using both<br />

paramagnetic lanthanides, the assignment is complete and correct for all diamagnetic 13 C-HSQC



cross-peaks that have observable paramagnetic partners. The only exceptions are swapped<br />

assignments for Met18 and Met107 and for Val65 and Val82. The first arises from a side-chain<br />

conformation that differs between the solution and the single crystal, and the second from differences<br />

between the predicted and experimental PCS observed for the peptide segment near Val65 (John et al.,<br />

2007b). The assignments of the methyl groups of the Yb³⁺ complex are similarly reliable, whereas<br />

the methyl signals of the Dy³⁺ complex are harder to assign (in the absence of methyl connectivity<br />

information). Using only data from the Yb³⁺ complex and omitting any methyl specificity<br />

information or connectivity information in the paramagnetic state still results in >70% correct<br />

assignments of the diamagnetic methyl resonances (Table S2.2 and Table S2.4; see footnote 4).<br />

Table 2.1 Automatic assignment of methyl groups by the program Possum (a)<br />

Residue type | Occurrence (b) | La³⁺ observable (c) | Yb³⁺ assigned (d) | Dy³⁺ assigned (d) | La³⁺ assigned (e)<br />

Met | 6 (1) | 5 | 3/5 | 4/4 | 3/5<br />

Thr | 14 (4) | 8 | 7/7 | 7/7 | 7/7<br />

Ala | 19 (2) | 17 | 13/13 | 11/13 | 14/14<br />

Ile | 14 (2) | 24 | 24/24 | 21/23 | 24/24<br />

Val | 12 (0) | 24 | 20/20 | 17/20 | 18/22<br />

Leu | 17 (0) | 34 | 34/34 | 19/25 | 34/34<br />

a Obtained using the data reported in Table S2.1, the crystal structure of ε186 (Hamdan et al.,<br />

2002b) and Δχ tensors determined from ¹⁵N-HSQC spectra as described in the experimental<br />

section. The paramagnetic data measured with Yb³⁺ and Dy³⁺ were combined to derive the<br />

assignments.<br />

b Total number of residues in cz-ε186. The number in brackets refers to residues not observed in<br />

the crystal structure; these were excluded from the calculation.<br />

c Number of methyl groups with coordinates reported in the crystal structure for which cross-peaks<br />

were observed in the cz-ε186/θ/Ln³⁺ sample. Their unassigned chemical shifts were available for<br />

the program.<br />

4 It is the responsibility of the user to inspect and optimize the assignment provided by Possum; in<br />

particular to untangle areas of the spectrum with overlapping peaks.



d Fraction of correct assignments for the paramagnetic cz-ε186/θ/Yb³⁺ and cz-ε186/θ/Dy³⁺<br />

complexes, as indicated. The number in the denominator is the number of methyl groups for which<br />

cross-peaks were observed in the presence of Yb³⁺ or Dy³⁺.<br />

e Fraction of correct assignments for the diamagnetic cz-ε186/θ/La³⁺ complex. The number in the<br />

denominator is the number of methyl groups for which cross-peaks were observed in at least one of<br />

the paramagnetic complexes.<br />

2.4.6 PCS and flexibility<br />

Structural differences between the crystal structure of ε186 determined under cryogenic conditions (Hamdan et al., 2002b) and the solution structure of the cz-ε186/θ complex become apparent as differences between measured and predicted PCS. In a few cases, the structural differences interfere with the resonance assignment. Figure 2.6a illustrates the situation for the cz-ε186/θ/Dy³⁺ complex, where the measured PCS of Leu161 are smaller than predicted and would more closely match the values predicted for Leu131. This can be explained by a small displacement of the peptide segment comprising residues 151-161 that decreases the PCS of both methyls of Leu161. Smaller PCS than expected were also observed for the backbone amides of this segment.⁴⁵ The correct assignment would be obtained by focusing on the difference in ¹³C PCS between both methyl groups rather than their magnitude (Figure 2.6a) or by using the data of the cz-ε186/θ/Yb³⁺ complex, which are less strongly distance dependent in the 11 Å distance range (Figure 2.6b).
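The distance and angular dependence discussed above can be illustrated with the standard point-dipole expression for the PCS. The following is a minimal sketch, not code from Possum: the function name `pcs_ppm`, the tensor value of 35 × 10⁻³² m³ and the coordinates are hypothetical, and the nucleus is assumed to be given in the principal axis frame of the Δχ tensor with the metal ion at the origin.

```python
import math

def pcs_ppm(xyz, dchi_ax, dchi_rh=0.0):
    """Pseudocontact shift (ppm) of a nucleus at xyz (in Å).

    Assumptions (not taken from the thesis): the metal sits at the origin,
    xyz is expressed in the principal axis frame of the Δχ tensor, and the
    axial/rhombic components are given in units of 1e-32 m^3.
    """
    x, y, z = xyz
    r2 = x * x + y * y + z * z
    r = math.sqrt(r2)
    # (3cos^2θ - 1) = (3z^2 - r^2)/r^2 and sin^2θ cos2φ = (x^2 - y^2)/r^2
    angular = dchi_ax * (3 * z * z - r2) + 1.5 * dchi_rh * (x * x - y * y)
    # The factor 1e4 converts (1e-32 m^3) / Å^3 into ppm
    return 1e4 * angular / (12 * math.pi * r ** 5)

# Hypothetical axial tensor; nucleus on the z axis at 10 Å and at 20 Å.
near = pcs_ppm((0.0, 0.0, 10.0), dchi_ax=35.0)
far = pcs_ppm((0.0, 0.0, 20.0), dchi_ax=35.0)
print(round(near, 2), round(far, 2), round(near / far, 1))
```

Because the PCS scales with r⁻³, doubling the distance along a fixed direction reduces the shift eightfold, which is why a small displacement of a nearby segment changes the predicted values noticeably.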


2.4 Results. 47<br />

Figure 2.6 Residues showing deviations between predicted and experimental PCS. (a) Comparison of calculated and experimental PCS of Leu131, Leu95, Leu161, Leu11 and Ile154 in the cz-ε186/θ/Dy³⁺ complex. The data are plotted in the sequence Cδ1/Hδ1/Cδ2/Hδ2 and Cγ2/Hγ2/Cδ1/Hδ1 for the Leu residues and Ile154, respectively. (b) Same as (a), but for the cz-ε186/θ/Yb³⁺ complex. (c) Predicted ¹³C PCS of the prochiral methyl groups of Val82 in cz-ε186/θ/Dy³⁺ versus sidechain dihedral angle. The values predicted from the crystal structure of ε186⁴⁰ are marked. (d) Same as (c), but for the δCH₃ groups of Leu95.

In the cases of Val82 and Leu95 in the Dy³⁺ complex, the comparison of experimental and predicted ¹³C-PCS data yields the wrong stereospecific assignment. The χ1 and χ2 angles of these residues are -47° and 172°, respectively, in the crystal structure (Hamdan et al., 2002b). Adjusting these angles to -60° and 180°, respectively, inverts the relative size of the ¹³C PCS predicted for the two methyl groups, leads to much better agreement between predicted and experimental PCS, and results in the correct stereospecific assignments (Figure 2.6c and d). This observation is most simply explained by a small difference between the crystal and solution structures. Note that the correct assignment could have been obtained for Leu95 in the Yb³⁺ complex (Figure 2.6b). None of the other Val and Leu residues swapped their stereospecific assignment when we changed their χ1 angles (χ2 angles in the case of Leu) by ±10°.
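The way a small change in the predicted values can flip a stereospecific assignment can be sketched as a comparison of the two possible pairings of experimental and predicted PCS. This is an illustrative sketch only: the function `stereo_assignment` and all numbers are hypothetical and are not taken from the data for Val82 or Leu95.

```python
def stereo_assignment(exp_pair, pred_pair):
    """Return 'direct' or 'swapped' for the two methyls of one residue.

    exp_pair:  experimental 13C PCS (ppm) of the two methyl cross-peaks.
    pred_pair: PCS (ppm) predicted from the structure for methyl 1 and 2.
    The pairing with the smaller sum of squared deviations is chosen.
    """
    (e1, e2), (p1, p2) = exp_pair, pred_pair
    direct = (e1 - p1) ** 2 + (e2 - p2) ** 2
    swapped = (e1 - p2) ** 2 + (e2 - p1) ** 2
    return "direct" if direct <= swapped else "swapped"

# Hypothetical numbers: adjusting a side-chain angle inverts the relative
# size of the two predicted PCS and therefore the preferred pairing.
experimental = (-2.6, -2.0)
predicted_crystal = (-1.9, -2.7)
predicted_adjusted = (-2.5, -2.1)
print(stereo_assignment(experimental, predicted_crystal))
print(stereo_assignment(experimental, predicted_adjusted))
```

When the two predicted values are close, as for prochiral methyls at similar distances from the metal, the preferred pairing is sensitive to small structural differences, in line with the observation above.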

In the case of Leu11, different NMR criteria suggest that its side chain undergoes dynamic conformational averaging. (i) The ¹³C-PCS values predicted from the structure are -7.0 and -11.9 ppm, whereas the experimental value found for both methyl groups is -8.4 ppm. (ii) In the crystal structure, the two methyl groups are 9.2 and 11.1 Å from the metal ion, but the ¹³C-NMR line widths observed for the methyl groups in the cz-ε186/θ/Dy³⁺ complex are indistinguishable. (iii) Both methyl resonances overlap with each other in the ¹³C-HSQC spectrum, indicating similar chemical environments, and their line shapes are narrower than those of most other methyl groups. Remarkably, however, Leu11 is located in the hydrophobic core of the protein and is very well defined in the crystal structure,⁴⁰ although the side chain forms no steric contacts and could access different rotameric states without introducing van der Waals violations with neighboring atoms. Conceivably, the low temperature used in the X-ray experiment may have frozen out a single conformation, whereas a much larger conformational space is accessible at room temperature.

Ile154 presents an example where partial motional averaging may be indicated by a smaller difference observed between the PCS of the δ1 and γ2 carbon atoms than predicted. The side chain heavy atoms of this residue show enhanced B-factors in the crystal structure, in agreement with its location at the protein surface.

2.5 Discussion<br />

The present work shows that methyl resonances of ¹³C-labeled proteins can be assigned solely from PCS with reference to the 3D structure of the protein, yielding both sequence- and stereo-specific resonance assignments without having to establish connectivities to backbone resonances. This presents a significant advance over our previous strategy for the assignment of ¹⁵N-HSQC spectra, which relied on PCS, PRE, CCR, and RDCs measured on selectively labeled samples (Pintacuda et al., 2004).
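The idea of assigning cross-peaks purely by agreement between experimental and back-calculated PCS can be sketched as a minimum-cost matching. The sketch below is an assumption-laden toy, not the Possum algorithm: it searches all permutations exhaustively, which is only feasible for a handful of peaks, and the names `assign_by_pcs`, `methyl_A` to `methyl_C` and all PCS values are hypothetical.

```python
from itertools import permutations

def assign_by_pcs(exp_pcs, pred_pcs):
    """Brute-force minimum-cost matching of cross-peaks to methyl groups.

    exp_pcs:  list of experimental PCS values (ppm), one per cross-peak.
    pred_pcs: dict mapping methyl name -> PCS predicted from the structure.
    Assumes the same number of peaks and methyls. Returns a dict mapping
    peak index -> methyl name, and the summed squared deviation.
    """
    best_cost, best_order = float("inf"), None
    for order in permutations(pred_pcs):
        cost = sum((e - pred_pcs[n]) ** 2 for e, n in zip(exp_pcs, order))
        if cost < best_cost:
            best_cost, best_order = cost, order
    return dict(enumerate(best_order)), best_cost

# Hypothetical data for three unassigned peaks and three methyl groups.
peaks = [3.1, -0.6, 1.2]
predicted = {"methyl_A": 1.0, "methyl_B": 3.3, "methyl_C": -0.5}
assignment, cost = assign_by_pcs(peaks, predicted)
print(assignment, round(cost, 2))
```

For realistic numbers of methyl groups the factorial growth of this search is prohibitive, which is why a dedicated assignment solver is needed in practice.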

Clearly, any resonance assignment based on comparison of experimental and back-calculated PCS critically depends on the accuracy of the 3D structure of the protein and is expected to fail for flexible protein segments. Yet, this problem is much less severe than in the case of RDCs (Sibille et al., 2002), since PCS are far less affected by local mobility as long as the spins are not very close to the paramagnetic center. The robustness of PCS with regard to structural variations is particularly beneficial for the assignment of Met εCH₃ groups, which are notoriously difficult to assign by conventional methods. The potential of PCS for their assignment has been noted previously (Bose-Basu et al., 2004).

The assignment strategy presented here requires the determination of the Δχ tensor, which can readily be achieved from ¹⁵N-¹H correlation spectra by the Platypus algorithm (Pintacuda et al., 2004). Obtaining resonance assignments of methyl groups in this way is attractive because ¹⁵N-¹H correlation spectra of backbone amides and ¹³C-¹H correlation spectra of methyl groups can be recorded even for high-molecular weight systems (Fiaux et al., 2002, Sprangers et al., 2007). Alternatively, the Δχ-tensor parameters can be determined from assigned diamagnetic NMR resonances and a set of PCS identified by comparison with the paramagnetic NMR spectrum, either manually or automatically using the Echidna algorithm (Schmitz et al., 2006). Initial sequence-specific resonance assignments can, if necessary, be achieved by site-directed mutagenesis (Siivari et al., 1995, Bose-Basu et al., 2004), for example by mutation of Ile to Val (Wu et al., 2007).

Assignments by PCS are not limited to metal-binding proteins as different techniques have recently become available that achieve site-specific attachment of lanthanide-tags to proteins devoid of natural metal binding sites (Ma et al., 2000, Dvoretsky et al., 2002, Wöhnert et al., 2003, Ikegami et al., 2004, Prudêncio et al., 2004, Leonov et al., 2005, Haberz et al., 2006, Rodriguez-Castañeda et al., 2006, Su et al., 2006). The use of different tags or attachment at different sites readily generates very different Δχ tensors (Rodriguez-Castañeda et al., 2006) that can highlight inconsistencies between experimental and back-calculated PCS.

If the exchange between paramagnetic and diamagnetic metal ions is too slow to measure exchange spectra, the program Possum can be used to assign the methyl groups in the diamagnetic and paramagnetic state. As expected, the robustness of Possum with regard to small differences between the atomic coordinates of the protein and its actual structure in solution increases with the amount of additional data available. In this respect, data from two paramagnetic metal ions are particularly beneficial, as is information about intraresidual methyl-methyl connectivities or stereospecific identities of methyl groups in Val, Leu, and Ile residues.
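Why data from a second paramagnetic ion help can be sketched by summing the PCS deviations over all metals for each candidate assignment. This is a hypothetical toy example rather than the scoring used by Possum: the helper names, the methyl labels and every PCS value are invented for illustration.

```python
from itertools import permutations

def combined_cost(peaks, order, predicted):
    """Sum of squared PCS deviations over all peaks and metals.

    peaks:     list of dicts, metal -> experimental PCS (ppm) of one peak.
    order:     tuple of methyl names; order[i] is the candidate for peaks[i].
    predicted: dict methyl name -> dict metal -> predicted PCS (ppm).
    """
    return sum(
        (pcs - predicted[name][metal]) ** 2
        for peak, name in zip(peaks, order)
        for metal, pcs in peak.items()
    )

def best_assignment(peaks, predicted, metals):
    """Exhaustive search that uses only the listed metals."""
    trimmed = [{m: p[m] for m in metals} for p in peaks]
    return min(permutations(predicted),
               key=lambda order: combined_cost(trimmed, order, predicted))

# Hypothetical case: both methyls have the same predicted PCS with Dy,
# so Dy data alone cannot distinguish them, whereas Yb data can.
predicted = {
    "methyl_A": {"Dy": 2.0, "Yb": 0.9},
    "methyl_B": {"Dy": 2.0, "Yb": -0.4},
}
peaks = [{"Dy": 2.1, "Yb": -0.5}, {"Dy": 1.9, "Yb": 1.0}]
print(best_assignment(peaks, predicted, ["Dy", "Yb"]))
```

With the Dy values alone both pairings score identically in this example; adding the Yb values separates them clearly.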

The robustness of assignments made by Possum can further be enhanced by the increased spectral resolution afforded by 3D NMR spectra, which would greatly facilitate the identification of the corresponding NMR resonances in the diamagnetic and paramagnetic state based on the criterion that all correlated spins are close in space and therefore experience similar PCS. For example, 3D (H)CCH-TOCSY or NOESY-¹³C-HSQC spectra would resolve several cross-peaks for each methyl group, which can simultaneously be compared with the 3D structure of the protein and the predicted PCS to obtain resonance assignments. For methyl groups in the vicinity of the paramagnetic ion, the observation of correlations can be aided by protonless experiments (Bermel et al., 2006).

Conceivably, assignments by PCS can also be achieved for perdeuterated proteins of increased molecular weight containing selectively protonated methyl groups (Rosen et al., 1996). The best spectral resolution in the methyl region of the ¹³C-HSQC spectrum would be obtained for CD₂H groups (Kainosho et al., 2006). Notably, however, the Cz-EXSY experiments described here allowed us to measure all PCS data in the uniformly ¹³C/¹⁵N-labeled and fully protonated sample, i.e. the improved spectral resolution of selectively labeled samples was not necessary for our system.

In conclusion, resonance assignments of the ¹³C-HSQC cross-peaks of methyl groups by PCS induced by a site-specifically attached lanthanide ion present a versatile and convenient technique which can open many opportunities for NMR studies of proteins of known three-dimensional structure. It is anticipated that resonance assignments by this technique will be particularly useful in ligand screening applications.

2.6 Acknowledgement<br />

The authors thank Don A. Grundel for source codes of the MAP solver and for useful discussions. M.J. thanks the Humboldt Foundation for a Feodor-Lynen Fellowship. Financial support from the Australian Research Council for project grants, a Federation Fellowship for G.O. and the 800 MHz NMR spectrometer at the ANU is gratefully acknowledged. This work was supported by an award under the Merit Allocation Scheme of the National Facility of the Australian Partnership for Advanced Computing.

2.7 Supporting Information Available



Pulse scheme of a (H)C(C)H-TOCSY experiment for correlations between isopropyl methyl groups, ¹³C-HSQC spectra of uniformly, fractionally, and selectively isotope labeled cz-ε186/θ, diagrams comparing experimental and predicted PCS, a table with the chemical shifts of the methyl groups of cz-ε186 observed in the presence of La³⁺, Yb³⁺, or Dy³⁺, and tables reporting the number of methyl groups assigned by Possum. This material is available free of charge via the Internet at http://pubs.acs.org.

2.8 References<br />

Allegrozzi M, Bertini I, Janik MBL, Lee YM, Lin GH and Luchinat C (2000) Lanthanide-induced pseudocontact shifts for solution structure refinements of macromolecules in shells up to 40 Å from the metal ion. J Am Chem Soc 122:4154-4161

Balas E and Saltzman MJ (1991) An algorithm for the 3-index assignment problem. Oper Res 39:150-161

Balayssac S, Jiménez B and Piccioli M (2006) Assignment strategy for fast relaxing signals: complete aminoacid identification in thulium substituted Calbindin D9K. J Biomol NMR 34:63-73

Bax A, Delaglio F, Grzesiek S and Vuister GW (1994) Resonance assignment of methionine methyl groups and χ3 angular information from long-range proton-carbon and carbon-carbon J correlation in a calmodulin-peptide complex. J Biomol NMR 4:787-797

Bax A, Ikura M, Kay LE and Zhu G (1991) Removal of F1 baseline distortion and optimization of folding in multidimensional NMR spectra. J Magn Reson 91:174-178

Bermel W, Bertini I, Felli IC, Piccioli M and Pierattelli R (2006) ¹³C-detected protonless NMR spectroscopy of proteins in solution. Prog NMR Spectrosc 48:25-45

Bose-Basu B, DeRose EF, Kirby TW, Mueller GA, Beard WA, Wilson SH and London RE (2004) Dynamic characterization of a DNA repair enzyme: NMR studies of [methyl-¹³C]methionine-labeled DNA polymerase β. Biochemistry 43:8911-8922

DeRose EF, Darden T, Harvey S, Gabel S, Perrino FW, Schaaper RM and London RE (2003) Elucidation of the ε-θ subunit interface of Escherichia coli DNA polymerase III by NMR spectroscopy. Biochemistry 42:3635-3644

Dvoretsky A, Gaponenko V and Rosevear PR (2002) Derivation of structural restraints using a thiol-reactive chelator. FEBS Lett 528:189-192



Eaton HL, Fesik SW, Glaser SJ and Drobny GP (1990) Time dependence of ¹³C-¹³C magnetization transfer in isotropic mixing experiments involving amino acid spin systems. J Magn Reson 90:452-463

Farrow NA, Zhang O, Forman-Kay JD and Kay LE (1994) A heteronuclear correlation experiment for simultaneous determination of ¹⁵N longitudinal decay and chemical exchange rates of systems in slow equilibrium. J Biomol NMR 4:727-734

Fiaux J, Bertelsen EB, Horwich AL and Wüthrich K (2002) NMR analysis of a 900K GroEL-GroES complex. Nature 418:207-211

Fischer MWF, Zeng L and Zuiderweg ERP (1996) Use of ¹³C-¹³C NOE for the assignment of NMR lines of larger labeled proteins at larger magnetic fields. J Am Chem Soc 118:12457-12458

Grishaev A and Llinás M (2002) CLOUDS, a protocol for deriving a molecular proton density via NMR. Proc Natl Acad Sci U S A 99:6707-6712

Gross JD, Gelev VM and Wagner G (2003) A sensitive and robust method for obtaining intermolecular NOEs between side chains in large protein complexes. J Biomol NMR 25:235-242

Haberz P, Rodriguez-Castañeda F, Junker J, Becker S, Leonov A and Griesinger C (2006) Two new chiral EDTA-based metal chelates for weak alignment of proteins in solution. Org Lett 8:1275-1278

Hajduk PJ, Augeri DJ, Mack J, Mendoza R, Yang J, Betz SF and Fesik SW (2000) NMR-based screening of proteins containing ¹³C-labeled methyl groups. J Am Chem Soc 122:7898-7904

Hamdan S, Bulloch EM, Thompson PR, Beck JL, Yang JY, Crowther JA, Lilley PE, Carr PD, Ollis DL, Brown SE and Dixon NE (2002a) Hydrolysis of the 5'-p-nitrophenyl ester of TMP by the proofreading exonuclease (ε) subunit of Escherichia coli DNA polymerase III. Biochemistry 41:5266-5275

Hamdan S, Carr PD, Brown SE, Ollis DL and Dixon NE (2002b) Structural basis for proofreading during replication of the Escherichia coli chromosome. Structure 10:535-546

Ikegami T, Verdier L, Sakhaii P, Grimme S, Pescatore B, Saxena K, Fiebig KM and Griesinger C (2004) Novel techniques for weak alignment of proteins in solution using chemical tags coordinating lanthanide ions. J Biomol NMR 29:339-349

Janin J, Miller S and Chothia C (1988) Surface, subunit interfaces and interior of oligomeric proteins. J Mol Biol 204:155-164

John M, Headlam MJ, Dixon NE and Otting G (2007a) Assignment of paramagnetic ¹⁵N-HSQC spectra by heteronuclear exchange spectroscopy. J Biomol NMR 37:43-51



John M, Park AY, Dixon NE and Otting G (2007b) NMR detection of protein ¹⁵N spins near paramagnetic lanthanide ions. J Am Chem Soc 129:462-463

John M, Park AY, Pintacuda G, Dixon NE and Otting G (2005) Weak alignment of paramagnetic proteins warrants correction for residual CSA effects in measurements of pseudocontact shifts. J Am Chem Soc 127:17190-17191

Jung YS and Zweckstetter M (2004) Backbone assignment of proteins with known structure using residual dipolar couplings. J Biomol NMR 30:25-35

Kaikkonen A and Otting G (2001) Residual dipolar ¹H-¹H couplings of methyl groups in weakly aligned proteins. J Am Chem Soc 123:1770-1771

Kainosho M, Torizawa T, Iwashita Y, Terauchi T, Ono AM and Güntert P (2006) Optimal isotope labelling for NMR protein structure determinations. Nature 440:52-57

Karimi-Nejad Y, Schmidt JM, Rüterjans H, Schwalbe H and Griesinger C (1994) Conformations of valine side chains in ribonuclease T1 determined by NMR studies of homonuclear and heteronuclear ³J coupling constants. Biochemistry 33:5481-5492

Karp RM (1972) Reducibility Among Combinatorial Problems. Complexity of Computer Computations. New York: Plenum, R. E. Miller and J. W. Thatcher.

Korzhnev DM, Kloiber K, Kanelis V, Tugarinov V and Kay LE (2004) Probing slow dynamics in high molecular weight proteins by methyl-TROSY NMR spectroscopy: Application to a 723-residue enzyme. J Am Chem Soc 126:3964-3973

Leonov A, Voigt B, Rodriguez-Castañeda F, Sakhaii P and Griesinger C (2005) Convenient synthesis of multifunctional EDTA-based chiral metal chelates substituted with an S-mesylcysteine. Chem Eur J 11:3342-3348

Liu W, Zheng Y, Cistola DP and Yang D (2003) Measurement of methyl ¹³C-¹H cross-correlation in uniformly ¹³C-, ¹⁵N-, labeled proteins. J Biomol NMR 27:351-364

Ma C and Opella SJ (2000) Lanthanide ions bind specifically to an added "EF-hand" and orient a membrane protein in micelles for solution NMR spectroscopy. J Magn Reson 146:381-384

Montelione GT, Lyons BA, Emerson SD and Tashiro M (1992) An efficient triple resonance experiment using carbon-13 isotropic mixing for determining sequence-specific resonance assignments of isotopically-enriched proteins. J Am Chem Soc 114:10974-10975

Muhandiram DR, Yamazaki T, Sykes BD and Kay LE (1995) Measurement of ²H T1 and T1ρ relaxation times in uniformly ¹³C-labeled and fractionally ²H-labeled proteins in solution. J Am Chem Soc 117:11536-11544

Neri D, Szyperski T, Otting G, Senn H and Wüthrich K (1989) Stereospecific nuclear magnetic resonance assignments of the methyl groups of valine and leucine in the DNA-binding domain of the 434 repressor by biosynthetically directed fractional ¹³C labeling. Biochemistry 28:7510-7516

Nicholson LK, Kay LE, Baldisseri DM, Arango J, Young PE, Bax A and Torchia DA (1992) Dynamics of methyl groups in proteins as studied by proton-detected ¹³C NMR spectroscopy. Application to the leucine residues of staphylococcal nuclease. Biochemistry 31:5253-5263

Ostler G, Soteriou A, Moody CM, Khan JA, Birdsall B, Carr MD, Young DW and Feeney J (1993) Stereospecific assignments of the leucine methyl resonances in the ¹H NMR spectrum of Lactobacillus casei dihydrofolate reductase. FEBS Lett 318:177-180

Park AY (2006) Ph.D. Thesis. Australian National University, Australia.

Pintacuda G, John M, Su XC and Otting G (2007) NMR structure determination of protein-ligand complexes by lanthanide labeling. Acc Chem Res 40:206-212

Pintacuda G, Keniry MA, Huber T, Park AY, Dixon NE and Otting G (2004) Fast structure-based assignment of ¹⁵N HSQC spectra of selectively ¹⁵N-labeled paramagnetic proteins. J Am Chem Soc 126:2963-2970

Prudêncio M, Rohovec J, Peters JA, Tocheva E, Boulanger MJ, Murphy MEP, Hupkes HJ, Kosters W, Impagliazzo A and Ubbink M (2004) A caged lanthanide complex as a paramagnetic shift agent for protein NMR. Chem Eur J 10:3252-3260

Rodriguez-Castañeda F, Haberz P, Leonov A and Griesinger C (2006) Paramagnetic tagging of diamagnetic proteins for solution NMR. Magn Reson Chem 44:S10-S16

Rosen MK, Gardner KH, Willis RC, Parris WE, Pawson T and Kay LE (1996) Selective methyl group protonation of perdeuterated proteins. J Mol Biol 263:627-636

Sattler M, Schwalbe H and Griesinger C (1992) Stereospecific assignment of leucine methyl groups with ¹³C in natural abundance or with random ¹³C labeling. J Am Chem Soc 114:1126-1127

Schell E (1955) Distribution of a product over several properties. E 2nd Sym. Linear Program 615–642

Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient χ-tensor determination and NH assignment of paramagnetic proteins. J Biomol NMR 35:79-87

Senn H, Werner B, Messerle BA, Weber C, Traber R and Wüthrich K (1989) Stereospecific assignment of the methyl ¹H NMR lines of valine and leucine in polypeptides by nonrandom ¹³C labelling. FEBS Lett 249:113-118



Senn H and Wüthrich K (1985) Amino-acid-sequence, heme-iron coordination geometry and functional-properties of mitochondrial and bacterial c-type cytochromes. Quart Rev Biophys 18:111-134

Sibille N, Bersch B, Covès J, Blackledge M and Brutscher B (2002) Side chain orientation from methyl ¹H-¹H residual dipolar couplings measured in highly deuterated proteins. J Am Chem Soc 124:14616-14625

Siivari K, Zhang M, Palmer AG and Vogel HJ (1995) NMR studies of the methionine methyl groups in calmodulin. FEBS Lett 366:104-108

Sprangers R and Kay LE (2007) Quantitative dynamics and binding studies of the 20S proteasome by NMR. Nature 445:618-622

Su XC, Huber T, Dixon NE and Otting G (2006) Site-specific labelling of proteins with a rigid lanthanide-binding tag. Chembiochem 7:1599-1604

Tang C, Iwahara J and Clore GM (2005) Accurate determination of leucine and valine side-chain conformations using U-[¹⁵N/¹³C/²H]/[¹H-(methine/methyl)-Leu/Val] isotope labeling, NOE pattern recognition, and methine Cγ-Hγ/Cβ-Hβ residual dipolar couplings: application to the 34-kDa enzyme IIA Chitobiose. J Biomol NMR 33:105-121

Tugarinov V and Kay LE (2003a) Ile, Leu, and Val methyl assignments of the 723-residue malate synthase G using a new labeling strategy and novel NMR methods. J Am Chem Soc 125:13868-13878

Tugarinov V and Kay LE (2003b) Side chain assignments of Ile δ1 methyl groups in high molecular weight proteins: An application to a 46 ns tumbling molecule. J Am Chem Soc 125:5701-5706

Tugarinov V and Kay LE (2004) Stereospecific NMR assignments of prochiral methyls, rotameric states and dynamics of valine residues in malate synthase G. J Am Chem Soc 126:9827-9836

Tugarinov V and Kay LE (2005a) Methyl groups as probes of structure and dynamics in NMR studies of high-molecular-weight proteins. Chembiochem 6:1567-+

Tugarinov V, Ollerenshaw JE and Kay LE (2005b) Probing side-chain dynamics in high molecular weight proteins by deuterium NMR spin relaxation: An application to an 82-kDa enzyme. J Am Chem Soc 127:8214-8225

Wand AJ, Urbauer JL, McEvoy RP and Bieber RJ (1996) Internal dynamics of human ubiquitin revealed by ¹³C-relaxation studies of randomly fractionally labeled protein. Biochemistry 35:6116-6125



Williams NK, Prosselkov P, Liepinsh E, Line I, Sharipo A, Littler DR, Curmi PMG, Otting G and Dixon NE (2002) In vivo protein cyclization promoted by a circularly permuted Synechocystis sp. PCC6803 DnaB mini-intein. J Biol Chem 277:7790-7798

Wöhnert J, Franz KJ, Nitz M, Imperiali B and Schwalbe H (2003) Protein alignment by a coexpressed lanthanide-binding tag for the measurement of residual dipolar couplings. J Am Chem Soc 125:13338-13339

Wu PSC, Ozawa K, Lim SP, Vasudevan SG, Dixon NE and Otting G (2007) Cell-free transcription/translation from PCR-amplified DNA for high-throughput NMR studies. Angew Chem, Int Ed 46:3356-3358

Zuiderweg ERP, Boelens R and Kaptein R (1985) Stereospecific assignments of ¹H-NMR methyl lines and conformation of valyl residues in the lac repressor headpiece. Biopolymers 24:601-611

Zwahlen C, Gardner KH, Sarma SP, Horita DA, Byrd RA and Kay LE (1998) An NMR experiment for measuring methyl-methyl NOEs in ¹³C-labeled proteins with high resolution. J Am Chem Soc 120:7617-7625

2.9 Supporting information<br />

Figure S2.1 Pulse scheme of the 2D (H)C(C)H-TOCSY experiment used in this study. Parameters are as for the pulse schemes of Figure 2.1. Efficient magnetization transfer between the methyl groups of isopropyl groups was obtained by applying DIPSI3 mixing for 12 ms with a radiofrequency amplitude of 8.6 kHz. The Bruker pulse programs of this pulse sequence and of the pulse sequences of Figure 2.1 can be downloaded from http://rsc.anu.edu.au/~go/.



Figure S2.2 Assigned constant-time (28 ms) ¹³C-HSQC spectrum of the cz-ε186/θ/La³⁺ complex (¹³C/¹⁵N labeled cz-ε186) at pH 7.2 and 25 °C. Only the region containing the methyl cross-peaks is shown. Cross-peaks from methyl groups of Val, Leu, Ile, Ala and Thr appear as positive peaks (blue), whereas cross-peaks from Met CH₃ and all CH₂ groups appear as negative peaks (red).



Figure S2.3 Assigned constant-time ¹³C-HSQC spectrum of the cz-ε186/θ/La³⁺ complex, where cz-ε186 was biosynthetically fractionally ¹³C-labeled using 20% uniformly ¹³C-labeled glucose. Parameters and plot region as in Figure S2.2. Cross-peaks from Val γ1, Leu δ1, and Ala methyl groups are positive (blue). Cross-peaks from Val γ2, Leu δ2, Thr γ2 and Met methyl groups are negative (red). Cross-peaks from Ile δ1 and γ2 methyl groups are mostly invisible due to scrambling of ¹³C during Ile biosynthesis.



Figure S2.4 Assigned constant-time ¹³C-HSQC spectrum of the cz-ε186/θ/La³⁺ complex containing ¹³C/¹⁵N-Leu labeled cz-ε186 (blue) superimposed onto a 2D (H)C(C)H-TOCSY spectrum of the same sample (red). The assignments of the ¹³C-HSQC cross-peaks are indicated. The three mobile residues Leu11, Leu43 and Leu145 also show one-bond correlations between δCH₃ and γCH groups.



Figure S2.5 Comparisons of calculated and experimental PCS in the cz-ε186/θ/Dy³⁺ complex for methyl groups of (a) Met, (b) Ala, (c) Thr, (d) Val, (e) Leu, and (f) Ile. The distances rC-Ln are indicated in Å at the top of each plot. For residues with two methyl groups, the distance value shown at the top refers to the Cγ1 (Val), Cδ1 (Leu), or Cδ1 (Ile) atom.





Figure S2.6 Comparisons of calculated and experimental ¹³C and ¹H PCS as in Figure S2.5 but for the cz-ε186/θ/Yb³⁺ complex.





Table S2.1 ¹³C and ¹H chemical shifts (ppm) of methyl groups of cz-ε186 in the cz-ε186/θ/Ln³⁺ complexes used in this study a

Group   rC-Ln (Å)   cz-ε186/θ/La³⁺ (¹³C, ¹H)   cz-ε186/θ/Dy³⁺ (¹³C, ¹H)   cz-ε186/θ/Yb³⁺ (¹³C, ¹H)

Methionine
M18 12.0 17.78 2.06 18.04 2.34
M87 22.5 15.97 1.53 15.32 0.93 16.05 1.63
M107 14.2 16.23 2.04 12.84 16.77 2.64
M137 20.7 16.69 2.05 18.00 3.38 16.41 1.78
M178 15.4 15.39 2.22 18.67 5.53 14.69 1.53
M185 16.83 2.07 17.32 2.55 16.75 1.98

Alanine
A4 19.14 1.35
A23 20.2 17.15 0.62 17.35
A35 11.5 24.14 1.57 17.70 25.47 2.79
A62 10.9 18.72 1.55 30.88 16.97 -0.24
A69 16.2 23.77 1.61 26.74 23.15
A80 22.3 18.68 1.46 19.04 1.84 18.66 1.39
A83 20.1 19.35 1.29 18.97 1.00
A93 19.2 20.74 1.21 19.78 0.22 20.90 1.39
A100 13.3 19.85 1.41 16.19 21.22 1.77
A101 15.1 18.20 1.45 15.48 -1.16 18.58 1.82
A132 17.6 17.89 1.60 17.16
A134 15.8 18.44 1.79 20.16 3.58 17.96 1.30
A147 14.8 18.49 1.28 20.54 18.01 0.86
A150 17.2 17.57 1.42 20.22 3.91 17.12 0.99
A164 5.2 18.93 1.21
A168 7.5 17.24 1.42 18.52
A172 12.5 18.54 1.38 24.10 17.77 0.73
A177 18.5 16.75 0.92 20.11 4.39 16.17 0.34
A186 19.07 1.39 19.45 1.77 19.02 1.32


Threonine
T3 γ2
T6 γ2 21.50 1.20 21.53 1.25
T13 γ2 8.0
T15 γ2 9.0 19.45 -0.13 34.77 17.36
T16 γ2 13.8 23.64 1.28 31.65 22.35 0.04
T44 γ2 21.0 22.65 1.34 22.64 1.35 22.67 1.39
T78 γ2 21.1 22.40 1.42 23.44 2.48 22.19 1.21
T121 γ2 17.4 21.07 0.66 19.34 -1.07 21.31 0.83
T123 γ2 25.0 21.48 1.09 20.80 0.50 21.66 1.26
T128 γ2 16.9 21.93 1.10
T160 γ2 12.9
T179 γ2 19.6 20.56 0.66 22.15 2.28 20.25 0.35
T183 γ2 21.44 1.21 22.16 1.91 21.33 1.08
T187 γ2 21.57 1.19 21.88 1.51 21.50 1.13

Valine
V10 γ1 10.4 22.52 0.91 28.21 21.70 -0.07
V10 γ2 12.8 19.56 0.66 22.13 19.19 0.31
V36 γ1 11.2 21.16 0.75
V36 γ2 12.3 21.09 0.67 21.97
V38 γ1 18.5 22.09 0.83 23.20 2.03 21.92 0.67
V38 γ2 16.2 20.58 0.82 22.08 2.42 20.43 0.64
V39 γ1 24.1 21.08 0.89 21.14 1.00
V39 γ2 22.0 22.03 0.95 22.06 1.01
V50 γ1 14.9 22.32 0.58 19.95 22.70 0.94
V50 γ2 13.2 20.00 0.68 17.80 20.26 0.97
V58 γ1 12.7 21.64 1.29 30.67 20.19 -0.24
V58 γ2 13.5 22.31 1.18 30.02 21.00 -0.16
V65 γ1 6.4 20.72 0.90 22.51
V65 γ2 8.8 21.14 0.93 21.74 1.65
V82 γ1 17.7 20.81 0.82 20.09 0.08
V82 γ2 16.6 20.37 0.72 19.79 0.10
V96 γ1 13.3 19.60 0.23 21.73 19.08 -0.32
V96 γ2 15.0 19.04 0.29 19.83 1.04 18.87 0.13
V127 γ1 15.8 21.62 0.78 19.64 -1.32 21.84 1.05
V127 γ2 17.2 20.59 0.72 19.02 -0.93 20.89 0.95
V133 γ1 17.9 21.22 0.97 22.43 2.24 20.92 0.67
V133 γ2 18.4 21.99 1.16 22.69 1.76 21.77 0.95
V174 γ1 13.3 22.88 0.95 31.40 21.37 -0.49
V174 γ2 12.5 24.96 0.97 35.55 23.21 -0.83

Leucine
L11 δ1 11.1 27.14 1.00 18.89 28.30 2.40
L11 δ2 9.2 26.90 0.99 18.65 28.30 2.39
L43 δ1 16.1 25.49 0.96 27.78 3.35 25.24 0.69
L43 δ2 15.6 25.52 0.94 27.63 3.15 25.26 0.72
L52 δ1 13.6 26.18 0.65 25.88 0.34
L52 δ2 15.0 24.99 0.56 25.93 24.81 0.39
L57 δ1 21.4 25.94 0.72 28.07 2.74 25.61 0.39
L57 δ2 20.4 21.82 0.85 24.23 3.21 21.40 0.44
L73 δ1 13.8 25.49 0.96 24.36 -0.30
L73 δ2 13.4 21.30 0.97 25.98 20.34 -0.02
L74 δ1 22.4 26.03 0.99 27.27 2.17 25.78 0.75
L74 δ2 21.8 22.44 0.85 23.95 2.39 22.12 0.55
L95 δ1 15.2 25.21 0.75 22.60 25.64 1.20
L95 δ2 14.6 23.20 0.78 20.84 -1.73 23.72 1.21
L113 δ1 20.2 24.42 0.34 25.63 1.59 24.22 0.15
L113 δ2 22.3 21.37 0.64 22.17 1.45 21.27 0.53
L114 δ1 21.6 26.00 1.22 26.00 1.24
L114 δ2 22.0 22.35 1.01 22.33 0.95
L131 δ1 13.3 22.74 0.79 21.14 22.69 0.70
L131 δ2 12.5 25.78 0.71 21.89 26.09 1.07
L145 δ1 8.9 23.96 0.70 16.25
L145 δ2 6.5 24.01 0.62 14.34
L148 δ1 13.2 26.21 0.92 31.96 25.03 -0.32
L148 δ2 15.4 23.33 1.14 27.21 22.59 0.42
L161 δ1 11.3 24.89 0.72 23.66 25.39 1.18
L161 δ2 10.6 21.78 0.83 19.26 22.39 1.42
L165 δ1 10.1 24.05 0.89 17.02 25.60 2.34
L165 δ2 11.0 26.53 1.03 19.74 27.83 2.26
L166 δ1 8.8 22.89 0.74
L166 δ2 7.7 24.99 0.86 26.66
L171 δ1 10.1 21.06 0.82 36.23 18.23 -2.00
L171 δ2 8.9 26.65 0.99 23.97
L176 δ1 17.6 24.89 0.72 27.48 3.25 24.51 0.36
L176 δ2 18.5 21.98 0.70 24.08 2.64 21.66 0.40

Isoleucine<br />

I5 2 17.94 0.95 17.86 0.87 17.96 0.97<br />

I5 1 12.76 0.82 12.66 0.71<br />

I9 2 13.9 18.78 0.77 16.02 19.39 1.35<br />

I9 1 16.6 10.64 0.31 9.09 -1.20 11.00 0.64<br />

I21 2 23.8 17.24 0.85 17.29 0.91 17.24 0.87<br />

I21 1 24.3 12.79 0.85 12.86 0.93<br />

I30 2 10.4 17.96 0.82 23.37 17.09 -0.06<br />

I30 1 12.0 13.35 0.64 15.00 13.15 0.55<br />

I31 2 11.9 18.45 0.75 28.85 16.71 -0.92<br />

I31 1 9.8 14.65 0.34 33.74 11.31 -2.95<br />

I33 2 10.6 17.24 0.71 8.89 18.77 2.26<br />

I33 1 12.2 12.24 0.08 9.89 12.65 0.45<br />

I68 2 11.4 19.25 0.79 28.19 17.44 -0.91<br />

I68 1 8.4 13.87 0.56 9.07<br />

I90 2 15.6 18.42 0.41 15.93 -2.28 18.93 0.92<br />

I90 1 16.7 12.90 0.53 10.88 -1.52



I97 2 9.1 18.43 1.01 8.34 19.69 2.37<br />

I97 1 11.4 14.99 0.88 9.50 15.78 1.68<br />

I104 2 16.0 17.84 0.90 15.86 -2.05 18.13 1.19<br />

I104 1 14.4 10.34 0.71 7.50 10.70 1.14<br />

I118 2 22.6 15.96 0.66 15.47 0.17 16.02 0.75<br />

I118 1 23.0 13.16 0.86 12.89 0.58 13.23 0.91<br />

I154 2 13.7 16.96 0.76 23.43 15.97 -0.23<br />

I154 1 16.3 11.78 0.73 17.48 6.40 10.94 -0.16<br />

I170 2 9.5 18.41 0.71 40.65 15.06 -2.55<br />

I170 1 8.6 12.88 0.68 32.52 10.26 -1.98<br />

I193 2 17.46 0.81 17.55 0.91 17.42 0.78<br />

I193 1 13.04 0.83 13.15 0.94 13.00 0.80<br />

a Conditions: 25 °C, pH 7.2. The chemical shifts in the ε186/θ/La³⁺ complex were measured<br />

from ¹³C-HSQC spectra of the sample containing ¹³C/¹⁵N-labeled ε186 in the presence of 1<br />

equivalent La³⁺. Whenever possible, chemical shifts of the ε186/θ/Dy³⁺ and ε186/θ/Yb³⁺<br />

complexes were measured from ¹³C-HSQC spectra of samples prepared with 1:1 mixtures of La³⁺<br />

and Dy³⁺, or La³⁺ and Yb³⁺, respectively. ¹³C chemical shifts of methyl groups for which no ¹H<br />

chemical shift is reported were measured from the pd exchange peaks in 2D or 3D methyl Cz-EXSY<br />

spectra, whichever gave better resolution. When neither ¹³C nor ¹H chemical shifts are indicated,<br />

the expected cross-peak could not be identified either because of spectral overlap (e.g. in the case<br />

of vanishing PCS) or strong PRE.



Table S2.2 Number of correctly assigned methyl groups of Met, Thr, and Ala residues of ε186<br />

using the program Possum a<br />

a Calculations were performed using the experimental data of Table S2.1 and simulated data,<br />

where the paramagnetic chemical shifts of Table S2.1 were replaced by chemical shifts back-<br />

calculated from the crystal structure of ε186 and the tensors used in the present study. Two<br />

additional sets of simulated data were generated by addition of structural noise to the PDB<br />

coordinates of ε186. The structural noise followed a Gaussian distribution of 0.25 and 0.5 Å<br />

standard deviation, resulting in a Maxwell-Boltzmann distribution of atomic displacements with<br />

maxima at 0.35 and 0.7 Å, respectively. The columns marked “Dy max”, “Yb max”, and “La max”<br />

report the number of methyl groups for which data in the paramagnetic state were available to the<br />

program. (Additional peaks observed in the diamagnetic state remained unassigned.) The results<br />

are reported for calculations where the diamagnetic chemical shifts were supplemented only with<br />

data from Dy³⁺ (light yellow), Yb³⁺ (light blue) or both (grey). The rows marked with the % symbol



display the percentage of correctly assigned methyl groups for all three residues. The program<br />

Possum is available from http://compbio.chemistry.uq.edu.au/bmmg/christophe.
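
The quoted maxima can be checked with a short calculation. This is a sketch that assumes independent Gaussian noise of standard deviation σ was added to each Cartesian coordinate (an assumption about how the noise was applied). The magnitude r of the resulting atomic displacement then follows a Maxwell-Boltzmann distribution:

```latex
p(r) = \sqrt{\frac{2}{\pi}}\,\frac{r^{2}}{\sigma^{3}}\,
       \exp\!\left(-\frac{r^{2}}{2\sigma^{2}}\right),
\qquad
\frac{\mathrm{d}p}{\mathrm{d}r} = 0
\;\Rightarrow\;
r_{\max} = \sqrt{2}\,\sigma
```

With σ = 0.25 Å this gives r_max ≈ 0.35 Å, and with σ = 0.5 Å it gives r_max ≈ 0.71 Å, in line with the values stated in the footnote.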



Table S2.3 Number of correctly assigned methyl groups of Val, Leu, and Ile residues of ε186<br />

using the program Possum with methyl connectivity information in the Yb³⁺ complex a



a Calculations were performed using the experimental data of Table S2.1 and simulated data as<br />

described in the footnote of Table S2.2. As each Val, Leu and Ile residue contains two methyl<br />

groups, methyl specificity and methyl connectivity information can be used as additional<br />

information to support the resonance assignment. (Methyl specificity information refers to<br />

stereospecific assignments of the methyl groups of Val and Leu and the a priori distinction of γ2<br />

and δ1 methyl groups of Ile. Methyl connectivity information refers to the knowledge of which peaks<br />

arise from the same residue.) The results of four different combinations are shown, with and<br />

without methyl specificity information in the paramagnetic complexes, and with and without methyl<br />

specificity information in the diamagnetic complex. It was assumed that no methyl connectivity<br />

information can be established for the Dy³⁺ complex because of strong PRE. The data are<br />

presented in the same format as in Table S2.2. Assignments were counted as correct whenever a<br />

methyl cross-peak was assigned to the correct residue, disregarding the stereospecific correctness<br />

of the assignment. Note that the maximum number of assignable methyl groups reported in the<br />

column marked “La max” can vary when both Dy³⁺ and Yb³⁺ data are used, because Possum has<br />

the freedom not to assign every HSQC cross-peak observed for the Dy³⁺ complex to a peak<br />

observed for the Yb³⁺ complex. This results in a small variation of the number of residues for which<br />

the program has paramagnetic information available and can attempt an assignment of the<br />

diamagnetic data.



Table S2.4 Number of correctly assigned methyl groups of valine, leucine, and isoleucine residues<br />

of ε186 using the program Possum without methyl connectivity information in the Yb³⁺ complex a<br />

a The data are presented as in Table S2.3.


Chapter 3<br />

Numbat: new user-friendly method built for automatic Δχ-tensor determination<br />



3.1 Abstract<br />

Pseudocontact shift (PCS) effects induced by a paramagnetic lanthanide bound to a protein<br />

have become increasingly popular in NMR spectroscopy as they yield a complementary set of<br />

orientational and long-range structural restraints. PCS are a manifestation of the χ-tensor<br />

anisotropy, the Δχ-tensor, which in turn can be determined from the PCS. Once the Δχ-tensor has<br />

been determined, PCS become powerful long-range restraints for the study of protein structure and<br />

protein-ligand complexes. Here we present the newly developed package Numbat (New User-<br />

friendly Method Built for Automatic Δχ-Tensor determination). With a Graphical User Interface<br />

(GUI) that allows a high degree of interactivity, Numbat is specifically designed for the<br />

computation of the complete set of Δχ-tensor parameters (including shape, location and orientation<br />

with respect to the protein) from a set of experimentally measured PCS and the protein structure<br />

coordinates. Use of the program is illustrated by building a model of the complex between the E.<br />

coli DNA polymerase III subunits ε186 and θ using PCS.<br />

3.2 Keywords<br />

paramagnetic NMR · pseudocontact shift · magnetic susceptibility tensor · software ·<br />

program · unique tensor representation<br />

3.3 Abbreviations<br />

α Subunit α of the E. coli polymerase III<br />

ε186 N-terminal 185 residues of the E. coli polymerase III subunit ε<br />

θ Subunit θ of the E. coli polymerase III<br />

CSA Chemical shielding anisotropy<br />

GUI Graphical user interface<br />

HOT The bacteriophage P1-encoded homolog of θ<br />

PCS Pseudocontact shift<br />

RACS Residual anisotropic chemical shift<br />

RDC Residual dipolar coupling<br />

UTR Unique Δχ-tensor representation


3.4 Introduction<br />


Paramagnetic lanthanide ions bound to the natural metal-binding site of a metalloprotein or<br />

introduced via a lanthanide tag provide a number of paramagnetic effects that can be distance<br />

dependent (i.e. paramagnetic relaxation enhancement), orientation dependent (i.e. residual dipolar<br />

couplings, RDC), or a combination of both, like cross-correlated relaxation effects and<br />

pseudocontact shifts (PCS; Bertini et al., 2002, Pintacuda et al., 2004). PCS present particularly<br />

valuable structural restraints, as they are easy to measure and provide long-range information that<br />

would be difficult to obtain by other techniques. PCS originate from unpaired electron spins which<br />

lead to an anisotropic magnetic susceptibility tensor (χ-tensor). PCS restraints induced by<br />

lanthanide ions have been used to investigate structural and dynamical properties of proteins<br />

(Allegrozzi et al., 2000, Bertini et al., 2001, Bertini et al., 2004, Gaponenko et al., 2004, Jensen et<br />

al., 2006, Eichmüller et al., 2007, Wang et al., 2007) and protein-ligand complexes (John et al.,<br />

2006, Pintacuda et al., 2007).<br />

In order to apply PCS restraints, eight variables have to be determined. These comprise the<br />

lanthanide position (three Cartesian coordinates), three angles (e.g. Euler angles) that relate the<br />

molecular frame to the χ-tensor frame, and the axial and rhombic anisotropy parameters of the χ-<br />

tensor. (Since PCS depend only on the χ-tensor anisotropy Δχ rather than the absolute magnitude of<br />

the χ-tensor, it is sufficient to determine the anisotropy parameters represented by the Δχ-tensor.)<br />

Several integrated software tools are available for the determination and study of the alignment<br />

tensor using RDCs (Dosset et al., 2000, Zweckstetter et al., 2000, Valafar et al., 2004, Wei et al.,<br />

2006). For the situation where the 3D structure of the protein is known a priori, corresponding<br />

tools for the determination of the Δχ-tensor from PCS have been developed but are more limited in<br />

scope. The program Fantasia (Banci et al., 1996) and its extension Fantasian (Banci et al., 1997)<br />

can fit the magnitude and Euler angles of the Δχ-tensor using a set of experimental PCS but<br />

requires prior knowledge of the metal coordinates. The program Platypus (Pintacuda et al., 2004)<br />

can simultaneously fit the Δχ-tensor and assign the signals of ¹⁵N-HSQC spectra of samples<br />

containing diamagnetic and paramagnetic lanthanides, but assumes that the ¹⁵N-HSQC peaks are<br />

sufficiently well resolved such that the paramagnetic peaks can be unambiguously associated with<br />

their diamagnetic partners. The program Echidna (Schmitz et al., 2006) uses assigned diamagnetic<br />

¹⁵N-HSQC cross-peaks of a uniformly ¹⁵N-labelled protein to determine the magnitude and Euler<br />

angles of the Δχ-tensor and, simultaneously, the assignment of the paramagnetic ¹⁵N-HSQC<br />

cross-peaks. It also requires prior knowledge of the approximate metal ion position. In principle, the<br />

structure refinement packages Xplor-NIH (Schwieters et al., 2003, Schwieters et al., 2006) with the



module PARArestraint for Xplor-NIH (Banci et al., 2004), GROMACS (Van der Spoel et al.,<br />

2005) with an implementation of orientation restraints (Hess et al., 2003) or DYANA (Güntert et<br />

al., 1997) with the module PSEUDYANA (Banci et al., 1998) could be used for Δχ-tensor<br />

determination from PCS but the protocols would be cumbersome. Considering that simultaneous<br />

determination of the Δχ-tensor and metal ion position relative to a known protein structure is a<br />

commonly required task, we set out to design a tool to achieve this in an easier and user-friendly<br />

way.<br />

While the metal coordinates of metalloproteins can be accurately determined by<br />

crystallography, the metal position must be fitted when no crystal structure is available, e.g., when<br />

the lanthanide is introduced via a lanthanide tag. None of the reported tools addresses this issue.<br />

Here we present the newly developed program Numbat (New User-friendly Method Built for<br />

Automatic Δχ-Tensor determination), which can simultaneously fit the Δχ-tensor and lanthanide<br />

coordinates using experimental PCS values and the coordinates of the protein. Furthermore, the<br />

program encompasses a number of useful tools for multiple data sets recorded with different<br />

paramagnetic lanthanides, for rigid-body docking using PCS, and for analysis and visualization of<br />

the results. Following a description of the algorithm on which the program builds and a<br />

presentation of the graphical user interface, we illustrate the use of Numbat for building the model<br />

of a complex in a rigid-body docking approach using PCS.<br />

3.5 Algorithm<br />

The Δχ-tensor can be determined and refined by the comparison between experimentally<br />

determined PCS values and PCS values back-calculated from the atomic coordinates of the<br />

molecular structure (Sherry et al., 1977, Lee et al., 1983, Emerson et al., 1990, Veitch et al., 1990,<br />

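Banci et al., 1992, Capozzi et al., 1993). The pseudocontact shift of a nuclear spin i, $\mathrm{PCS}_i^{\mathrm{calc}}$, is<br />

given by (Bertini et al., 2002):<br />

$$\mathrm{PCS}_i^{\mathrm{calc}} = \frac{1}{12\pi r_i^{5}}\left[\Delta\chi_{\mathrm{ax}}\left(2\tilde{z}_i^{2} - \tilde{x}_i^{2} - \tilde{y}_i^{2}\right) + \frac{3}{2}\,\Delta\chi_{\mathrm{rh}}\left(\tilde{x}_i^{2} - \tilde{y}_i^{2}\right)\right] \qquad (3.1)$$<br />

where $\tilde{x}_i$, $\tilde{y}_i$, $\tilde{z}_i$ are the Cartesian coordinates of the nuclear spin i in the Δχ-tensor frame, $r_i$ is<br />

the distance between the spin i and the paramagnetic centre, and Δχax and Δχrh are the axial and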



rhombic components of the Δχ-tensor. The orientation of the Δχ-tensor frame with respect to the<br />

protein frame can be specified, e.g., by three Euler angles α, β and γ.⁵<br />
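
As an illustration of the PCS expression in terms of Δχax and Δχrh, the following minimal Python sketch evaluates the standard form of the equation for a spin whose coordinates are already expressed in the Δχ-tensor frame with the metal ion at the origin. The function name `pcs_ppm` and the unit choice (coordinates in Å, tensor components in 10⁻³² m³, hence the factor 10⁴ to obtain ppm) are assumptions made for this example and are not taken from Numbat.

```python
import math

def pcs_ppm(x, y, z, dchi_ax, dchi_rh):
    """PCS in ppm for a spin at (x, y, z) in the tensor frame.

    Assumed units: coordinates in Angstrom relative to the metal ion,
    tensor components in 1e-32 m^3. The factor 1e4 converts to ppm.
    """
    r5 = (x * x + y * y + z * z) ** 2.5
    axial = dchi_ax * (2.0 * z * z - x * x - y * y)
    rhombic = 1.5 * dchi_rh * (x * x - y * y)
    return 1e4 * (axial + rhombic) / (12.0 * math.pi * r5)

# A purely axial tensor: the shift on the z axis is -2 times the shift
# at the same distance in the xy plane, and it falls off as 1/r^3.
on_axis = pcs_ppm(0.0, 0.0, 10.0, 30.0, 0.0)
in_plane = pcs_ppm(10.0, 0.0, 0.0, 30.0, 0.0)
twice_as_far = pcs_ppm(0.0, 0.0, 20.0, 30.0, 0.0)
```

The example values illustrate the characteristic sign change between the axial and equatorial regions of the tensor and the long range of the effect.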

To quantify the difference between experimental and back-calculated PCS values we define<br />

a quadratic cost c:<br />

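$$c = \sum_i \max\left(\left|\mathrm{PCS}_i^{\mathrm{calc}} - \mathrm{PCS}_i^{\mathrm{exp}}\right| - \mathrm{tol}_i,\; 0\right)^{2} \qquad (3.2)$$<br />

where $\mathrm{PCS}_i^{\mathrm{exp}}$ is the experimental PCS for the spin i, and $\mathrm{tol}_i$ is its associated tolerance. The<br />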

tolerance values can be used to reflect different uncertainties in the measurement of different PCS.<br />

When the lanthanide position is known, only five Δχ-tensor parameters have to be optimized. In<br />

this case, the least-squares fitting problem is linear, as can be seen from an alternative formulation of<br />

the PCS (Bertini et al., 2002):<br />

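$$\mathrm{PCS}_i^{\mathrm{calc}} = \frac{1}{4\pi r_i^{5}}\left[x_i^{2}\,\Delta\chi_{xx} + y_i^{2}\,\Delta\chi_{yy} + z_i^{2}\,\Delta\chi_{zz} + 2x_i y_i\,\Delta\chi_{xy} + 2x_i z_i\,\Delta\chi_{xz} + 2y_i z_i\,\Delta\chi_{yz}\right] \qquad (3.3)$$<br />

where $x_i$, $y_i$, $z_i$ are the Cartesian coordinates of the spin i in an arbitrary frame f and $\Delta\chi_{xx}$,<br />

$\Delta\chi_{yy}$, $\Delta\chi_{zz}$, $\Delta\chi_{xy}$, $\Delta\chi_{xz}$, $\Delta\chi_{yz}$ are the Δχ-tensor components in this frame. The Singular Value<br />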

Decomposition (SVD) algorithm, which is commonly used to determine an alignment tensor from a<br />

set of experimental RDC (Valafar et al., 2004, Wei et al., 2006), would be a good candidate to<br />

minimize the cost c. Least-squares fitting or the Simplex algorithm (Nelder et al., 1965) has<br />

been applied in previous work (Emerson et al., 1990, Capozzi et al., 1993). However, the most<br />

general problem one has to solve is non-linear, since the metal ion position may be unknown. For<br />

the non-linear least-squares fitting procedure in Numbat, we consequently chose the Levenberg-<br />

Marquardt algorithm (Marquardt, 1963) as implemented in the GNU Scientific Library (Galassi et<br />

al., 2006).<br />
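
The linear case described above (known metal position, five free tensor components) can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the Numbat implementation, which uses the Levenberg-Marquardt routine of the GSL: the fit here solves the normal equations by Gaussian elimination, the traceless condition Δχzz = −(Δχxx + Δχyy) is substituted to leave five unknowns, the cost is assumed to be a flat-bottom quadratic in the tolerance, and all function names and units (Å, 10⁻³² m³, ppm) are hypothetical choices for the example.

```python
import math

def design_row(x, y, z):
    """Coefficients of (xx, yy, xy, xz, yz) for one spin, metal at the origin."""
    k = 1e4 / (4.0 * math.pi * (x * x + y * y + z * z) ** 2.5)
    return [k * (x * x - z * z), k * (y * y - z * z),
            k * 2 * x * y, k * 2 * x * z, k * 2 * y * z]

def predict(tensor, coords):
    """Back-calculate PCS (ppm) for all spins from five tensor components."""
    return [sum(a * t for a, t in zip(design_row(*c), tensor)) for c in coords]

def cost(pcs_calc, pcs_exp, tol):
    """Quadratic cost; deviations within the tolerance do not contribute."""
    return sum(max(abs(c - e) - t, 0.0) ** 2
               for c, e, t in zip(pcs_calc, pcs_exp, tol))

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x

def fit_tensor(coords, pcs_exp):
    """Linear least-squares fit of the five tensor components."""
    rows = [design_row(*c) for c in coords]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] * p for r, p in zip(rows, pcs_exp)) for i in range(5)]
    return solve(ata, atb)

# Synthetic check: recover a known tensor from noise-free PCS values.
coords = [(10, 0, 0), (0, 12, 0), (0, 0, 9), (7, 7, 0),
          (6, 0, 8), (0, 5, 11), (8, -6, 5), (-4, 9, 7)]
true_tensor = [-5.0, -12.0, 3.0, -2.0, 4.0]
pcs_exp = predict(true_tensor, coords)
fitted = fit_tensor(coords, pcs_exp)
```

Once the metal coordinates become free variables, the coefficients in `design_row` depend on the unknowns and the problem is no longer linear, which is why an iterative minimiser is needed in the general case.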

5 The parameters fitted by Numbat are the three Cartesian coordinates of the metal ion, Δχax, Δχrh, α, β, and γ.



3.6 Program Features<br />

3.6.1 GUI<br />

The graphical user interface (GUI) of Numbat was built with the GTK+ library (Krause,<br />

2007) that is part of standard installations of recent Linux systems. Figure 3.1 shows two<br />

screenshots of the main interface of Numbat illustrating the intuitive and flexible user interface.<br />

Figure 3.1 Screenshots of Numbat main windows. (a) Graphical User Interface for the<br />

selection Structure and Data. Four PCS data sets can be loaded simultaneously under<br />

the tabs PCS1 to PCS4. The list of all atoms is displayed in the main frame and can be<br />

filtered with the Display tab to show only the atom or residue types of interest. The<br />

experimental PCS and the tolerance can be directly modified, and only atoms that are<br />

selected (see the column labelled “use?”) are taken into account in the calculations.<br />

The distance between the respective atom and the metal ion, the calculated PCS and the<br />

deviation between experimental and predicted PCS are calculated and displayed after



each fitting procedure. (b) Graphical User Interface for Tensor Calculation. A Δχ-<br />

tensor can be fitted for each of the data sets PCS1 to PCS4. An additional tab<br />

(Multiple PCS) is for fitting different data sets that share the same metal-ion centre.<br />

The frame PDB selector allows the choice of the model(s) to be used from a family of<br />

conformers loaded. The Tensor search restraints frame allows the individual selection<br />

of each of the eight variables to be free, fixed or constrained between two values. The<br />

computed Δχ-tensor values are displayed with error estimates from the GSL<br />

implementation of the Levenberg-Marquardt algorithm and the corresponding unique<br />

tensor representation (“UTR”) is reported.<br />

3.6.2 Input files<br />

Numbat reads atomic coordinates from protein data bank (PDB; (Berman et al., 2000)) files.<br />

In the case of NMR structures, the entire ensemble of conformers is loaded and any subset can be<br />

selected for subsequent calculations. When optimizing the Δχ-tensor, PCS are back-calculated for<br />

each selected structure and averaged for the computation of the cost function c (equation (3.2)).<br />

PCS data can be read either in the Xplor-NIH format or in a format specific to Numbat. For test<br />

purposes, Numbat also allows the generation of PCS data (optionally with addition of Gaussian<br />

noise) for a user-specified Δχ-tensor.<br />

3.6.3 Methyl group definition<br />

The ¹H chemical shift of a rotating methyl group can be described as the average of the<br />

chemical shifts of the three ¹H spins. The selection “methyl association” in the GUI allows<br />

definition of pseudoatom names for any methyl group for which the experimental PCS value is to<br />

be treated as the average of the PCS of the three ¹H nuclei. The pseudoatom names can be used to<br />

identify the experimental PCS values of methyl groups in the input file. Alternatively, the PCS<br />

values of methyl groups can be interactively entered via the user-interface.<br />
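
The averaging over the three methyl protons can be sketched as follows. This is an illustrative example only: the function names, the proton coordinates and the units (Å, 10⁻³² m³, ppm) are assumptions, and the way Numbat stores pseudoatoms internally is not described here.

```python
import math

def pcs_ppm(x, y, z, dchi_ax, dchi_rh):
    """PCS (ppm) of one spin in the tensor frame (Angstrom, 1e-32 m^3)."""
    r5 = (x * x + y * y + z * z) ** 2.5
    axial = dchi_ax * (2.0 * z * z - x * x - y * y)
    rhombic = 1.5 * dchi_rh * (x * x - y * y)
    return 1e4 * (axial + rhombic) / (12.0 * math.pi * r5)

def methyl_pcs(protons, dchi_ax, dchi_rh):
    """PCS of a methyl pseudoatom: mean of the PCS of its three protons."""
    values = [pcs_ppm(x, y, z, dchi_ax, dchi_rh) for x, y, z in protons]
    return sum(values) / len(values)

# Hypothetical proton positions of one methyl group in the tensor frame.
protons = [(8.0, 1.0, 5.0), (8.9, 1.5, 5.6), (8.9, 0.5, 5.6)]
individual = [pcs_ppm(x, y, z, 30.0, 10.0) for x, y, z in protons]
averaged = methyl_pcs(protons, 30.0, 10.0)
```

Averaging the three back-calculated PCS values is not the same as evaluating the PCS at the mean proton position, because the PCS is a non-linear function of the coordinates; the difference grows as the methyl group approaches the metal ion.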

3.6.4 Optimization of the tensor parameters<br />

In order to give the user a maximum of flexibility, any subset of the eight Δχ-tensor<br />

variables can be optimized with the remaining ones fixed to user-specified values. Such a situation<br />

occurs, for example, when a protein-ligand complex is studied where the protein is tagged with a<br />

lanthanide. First, the Δχ-tensor can be determined using the PCS measured for the protein. Fitting<br />

of the position and orientation of the Δχ-tensor with respect to the ligand can subsequently be



performed with a minimal number of adjustable parameters by keeping the axial and rhombic<br />

components of the Δχ-tensor fixed at the values determined for the protein. The Δχ-tensors<br />

determined for the protein and the ligand can finally be superimposed to derive a model of the<br />

protein-ligand complex (Pintacuda et al., 2007).<br />

Numbat also offers the option of restricting the Δχ-tensor variables within user-defined<br />

boundaries. This is useful if the magnitude, position and/or orientation of the Δχ-tensor is<br />

approximately known from previous studies (Su et al., 2008). Depending on the quality and<br />

quantity of PCS measurements available, the Δχ-tensor variables (especially the lanthanide<br />

coordinates) may only reach a local minimum during the optimization procedure. Therefore the<br />

starting values of all Δχ-tensor variables used to initialize the minimiser can be changed<br />

interactively within Numbat.<br />

3.6.5 Residual Anisotropic Chemical Shifts (RACS)<br />

Paramagnetic lanthanides bound to the protein weakly align the molecule in the magnetic<br />

field resulting in an incomplete averaging of the anisotropic chemical shifts. This can affect the<br />

PCS by a shift of up to 0.2 ppm for backbone ¹⁵N and ¹³C′ spins at a magnetic field of 18.8 T (John<br />

et al., 2005). The RACS correction term $\Delta\delta^{\mathrm{RACS}}$ for ¹Hᴺ, backbone ¹⁵N and ¹³C′ spins can be<br />

calculated given the Δχ-tensor and the chemical shielding anisotropic tensor (CSA-tensor) using<br />

(John et al., 2005):<br />

$$\Delta\delta^{\mathrm{RACS}} = \frac{B_0^{2}}{15\,\mu_0 k T}\sum_{i,j \in \{x,y,z\}} \sigma_{ii}^{\mathrm{CSA}}\,\cos^{2}\theta_{ij}\,\Delta\chi_{jj} \qquad (3.4)$$<br />

where $B_0$ is the magnetic field, $\mu_0$ the induction constant, k the Boltzmann constant, T the<br />

temperature, $\sigma_{ii}^{\mathrm{CSA}}$ the principal components of the CSA-tensor, $\cos\theta_{ij}$ the nine direction cosines<br />

between pairs of the principal axes of the Δχ-tensor and the CSA-tensor, and $\Delta\chi_{jj}$ the principal<br />

components of the Δχ-tensor. Numbat optionally uses the RACS correction term when generating<br />

PCS data and fitting Δχ-tensors. The orientations of the principal component axes of the nuclear<br />

CSA-tensors and the $\sigma_{ii}^{\mathrm{CSA}}$ values for ¹Hᴺ, backbone ¹⁵N and ¹³C′ are taken from Cornilescu et al.<br />

(2000).<br />

3.6.6 Multiple PCS data sets



A new PCS data set can be obtained by replacing one paramagnetic lanthanide with another<br />

paramagnetic lanthanide. Multiple PCS data sets obtained in this way share a conserved lanthanide<br />

position, but different orientations and magnitudes of the Δχ-tensors must be fitted to each<br />

individual PCS data set. Numbat can perform a simultaneous fit of the Δχ-tensors and the shared<br />

lanthanide position. This feature is of particular interest when only a limited number of PCS can be<br />

measured for each lanthanide ion, as fewer variables in the Δχ-tensor fit will facilitate the<br />

determination of accurate Δχ-tensor parameters. For example, a limited set of unambiguously<br />

measured PCS can be used to determine initial Δχ-tensor parameters from which the PCS of<br />

unassigned paramagnetic cross-peaks can be back-calculated, leading to assignments of additional<br />

paramagnetic cross-peaks and improved Δχ-tensor parameters. Similarly, applications to small<br />

ligand molecules with a small number of NMR signals are aided by limiting the number of<br />

adjustable variables to a minimum.<br />

3.6.7 PCS modification<br />

Once an initial Δχ-tensor has been fitted, Numbat computes and displays PCS values for all<br />

atoms. Doubtful assignments can easily be detected at this stage by inspection of the deviation<br />

between experimental and calculated values. Numbat allows interactive modification of $\mathrm{PCS}_i^{\mathrm{exp}}$ and<br />

$\mathrm{tol}_i$ as well as the input of additional PCS data.<br />

3.6.8 PCS selection<br />

The experimental PCS values to be used for the Δχ-tensor fit can be selected according to<br />

three criteria: A list of (i) residue types or (ii) atom types can be provided by the user. This is<br />

convenient in the case of selectively isotope-labelled proteins and allows a quick assessment of the<br />

amount of information necessary in order to retrieve a robust Δχ-tensor. (iii) Each individual PCS<br />

can be selected or deselected interactively via the GUI. This is particularly convenient if,<br />

after initial optimisation of the Δχ-tensor, some of the back-calculated PCS consistently show large<br />

deviations with respect to the experimental values, which may be due to erroneous assignments or<br />

discrepancies between the atomic coordinates of the PDB file and the actual structure of the<br />

protein, as is often the case for flexible polypeptide segments. Deselecting the corresponding atoms<br />

is likely to improve the Δχ-tensor fit in the next iteration.<br />
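
The inspection step described above amounts to comparing experimental and back-calculated PCS values against a threshold. A minimal sketch follows; the function name `flag_outliers`, the atom labels and the PCS values are invented for illustration, and the default threshold of 0.15 ppm is simply the value used in the case study of this chapter.

```python
def flag_outliers(pcs_exp, pcs_calc, threshold=0.15):
    """Return the atoms whose |exp - calc| deviation exceeds threshold (ppm)."""
    return [name for name, exp in pcs_exp.items()
            if name in pcs_calc and abs(exp - pcs_calc[name]) > threshold]

# Hypothetical experimental and back-calculated PCS values in ppm.
pcs_exp = {"A10:H": 0.52, "L43:H": -1.10, "V58:H": 0.30}
pcs_calc = {"A10:H": 0.50, "L43:H": -0.80, "V58:H": 0.47}
outliers = flag_outliers(pcs_exp, pcs_calc)
```

Atoms returned by such a check are candidates for deselection before the next fitting round; whether they reflect wrong assignments or local structural differences has to be judged case by case.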

3.6.9 Conventions



Different conventions have been used in the literature to report Δχ-tensor parameters,<br />

including different definitions of Euler angles, choice of principal and secondary axis of the Δχ-<br />

tensor, and units of Δχ-tensor magnitudes. Numbat can report the Δχ-tensor parameters in many<br />

different conventions but uses as a default the following conventions: (i) The axes of the Δχ-tensor<br />

frame are labelled such that |Δχzz| ≥ |Δχyy| ≥ |Δχxx| in analogy to alignment tensor conventions<br />

(Clore et al., 1998). This ensures that axial and rhombic components are always of the same sign.<br />

(ii) The Euler angles α, β and γ are expressed in the “ZYZ” convention, i.e., the first rotation of<br />

angle α is around the z axis of the protein frame, the second rotation of angle β is around the new y’<br />

axis and the last rotation of angle γ is around the new z’’ axis (Figure 3.2). While for an<br />

asymmetric object the Euler angles are uniquely defined if the angles α, β and γ are taken in the<br />

intervals [0, 2π[, [0, π[, [0, 2π[, respectively, ambiguities arise for symmetric objects. Therefore, we<br />

chose the interval [0, π[ for all three angles, eliminating the potential ambiguities arising from the<br />

four symmetry-related Δχ-tensors that generate the same PCS values. In the case of β = 0, an<br />

infinite number of combinations of α and γ would produce the same overall rotation. In this case,<br />

we set γ = 0. These two rules ensure that any Δχ-tensor is unambiguously reported as a single set of<br />

parameters which is referred to in the GUI as UTR (Unique Δχ-Tensor Representation).<br />
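
The two conventions can be illustrated with a short Python sketch. It is a sketch under assumptions rather than Numbat's code: the axial and rhombic components are taken as Δχax = Δχzz − (Δχxx + Δχyy)/2 and Δχrh = Δχxx − Δχyy, the ZYZ rotation is written as the matrix product Rz(α)·Ry(β)·Rz(γ) (one common way to express intrinsic z-y′-z″ rotations), and all function names are hypothetical.

```python
import math

def label_axes(principal_values):
    """Label axes so that |zz| >= |yy| >= |xx|; return (dchi_ax, dchi_rh)."""
    xx, yy, zz = sorted(principal_values, key=abs)
    return zz - 0.5 * (xx + yy), xx - yy

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def zyz(alpha, beta, gamma):
    """Rotation matrix for successive rotations about z, y' and z''."""
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))

# Traceless principal values in arbitrary order: both components come out
# with the same sign once the axes are labelled by magnitude.
ax, rh = label_axes([10.0, -2.0, -8.0])

# For beta = 0 only the sum alpha + gamma matters, so gamma can be set to 0.
degenerate = zyz(0.3, 0.0, 0.5)
canonical = zyz(0.8, 0.0, 0.0)
```

The second part shows why a rule is needed for β = 0: the two rotation matrices are identical, so without fixing γ the same tensor orientation could be reported with infinitely many angle pairs.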

Figure 3.2 Euler angle definitions used by Numbat. The relative orientation of the Δχ-<br />

tensor frame with respect to the protein frame is defined by Euler rotations of angle α, β<br />

and γ in the ZYZ convention. (a) A right-handed rotation of angle α around the z axis is<br />

applied to the protein frame xyz to give the frame x’y’z’. (b) A second rotation of angle<br />

β around the new axis y’ is applied to the frame x’y’z’ to give x’’y’’z’’. (c) The last<br />

rotation of angle γ around the z’’ axis gives the Δχ-tensor frame.<br />

3.6.10 Error analysis<br />

The Levenberg-Marquardt algorithm is used to minimize the cost c (equation (3.2)), but the<br />

quality of the fit cannot be assessed without further error analysis. Therefore, in addition to the<br />

uncertainty values provided by the GSL implementation of the minimiser, Numbat embeds a Monte



Carlo protocol with random Gaussian noise added either to the atomic coordinates of the molecule<br />

or to the experimental PCS values. The robustness of the Δχ-tensor fit with respect to the PCS data<br />

set can also be tested by random subset selection of the PCS values used. Resulting Δχ-tensor<br />

orientations are displayed in a Sanson-Flamsteed projection (Bugayevskiy et al., 1995) using the<br />

plotting utility gnuplot.<br />
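
The Sanson-Flamsteed (sinusoidal) projection maps a direction with longitude λ and latitude φ to the plane point (λ cos φ, φ). The sketch below applies it to an axis direction given as a Cartesian vector; how Numbat chooses the reference frame and handles the equivalence of an axis and its inverse is not covered, and the function name is hypothetical.

```python
import math

def sanson_flamsteed(vx, vy, vz):
    """Project a direction vector onto the sinusoidal map plane (radians)."""
    norm = math.sqrt(vx * vx + vy * vy + vz * vz)
    # Clamp to guard against rounding slightly outside [-1, 1].
    lat = math.asin(max(-1.0, min(1.0, vz / norm)))
    lon = math.atan2(vy, vx)
    return lon * math.cos(lat), lat

# The x axis maps to the centre of the plot, the z axis to its top edge.
centre = sanson_flamsteed(1.0, 0.0, 0.0)
pole = sanson_flamsteed(0.0, 0.0, 2.0)
east = sanson_flamsteed(0.0, 1.0, 0.0)
```

Because the projection preserves area, a tight cluster of points from repeated Monte Carlo fits indicates a well-defined axis orientation regardless of where on the sphere the axis points.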

3.6.11 Visualization<br />

Graphical visualization of the Δχ-tensor frame and isosurfaces of PCS values in the<br />

structure of the molecule presents a convenient way to assess the similarity of the principal axes of<br />

multiple Δχ-tensors and the similarity of their respective isosurfaces. To this end Numbat interfaces<br />

with the molecular viewers MOLMOL (Koradi et al., 1996) and PyMOL (DeLano, 2002) by<br />

generating suitable macro files and displaying the Δχ-tensor frame and corresponding PCS<br />

isosurfaces in superimposition with the protein studied, as illustrated in Figure 3.3. The files of the<br />

macros, PCS potential and PDB file containing the coordinates of the protein together with<br />

coordinates of the metal ion and Δχ-tensor axes can also be saved for later use.<br />

Figure 3.3 Visualisation of the Δχ-tensor in MOLMOL and PyMOL, and display of<br />

its orientational uncertainty in a Sanson-Flamsteed projection plot. Numbat can<br />

directly call MOLMOL and PyMOL to display the axes of the fitted Δχ-tensor and<br />

PCS isosurfaces at user-defined contour levels. The orientational uncertainty of the<br />

Δχ-tensor frame can be evaluated by a Monte-Carlo protocol with random additions<br />

of noise to the structure coordinates and/or PCS data, with optional random



selection of subsets of data. Numbat calls gnuplot to display the results in a Sanson-<br />

Flamsteed projection plot.<br />

3.6.12 Output<br />

The list of PCS can be saved in Xplor-NIH format and in a Numbat-specific format. The<br />

weak molecular alignment in the magnetic field resulting from a non-vanishing Δχ-tensor can be<br />

described by an alignment tensor with principal axes parallel to those of the Δχ-tensor and axial and<br />

rhombic components that are directly proportional to Δχax and Δχrh, respectively (Tolman et al.,<br />

1995). Numbat calculates the RDC between two spins A and B for the situation of a completely<br />

rigid molecule, using (Bertini et al., 2002)<br />

(3.5)<br />

where γA and γB are the magnetogyric ratios of spins A and B, respectively, ħ the Planck<br />

constant divided by 2π, S the order parameter, $r_{AB}$ the internuclear distance, and $\tilde{x}_{AB}$, $\tilde{y}_{AB}$, $\tilde{z}_{AB}$ the<br />

coordinates of the vector AB expressed in the Δχ-tensor frame. The RDC values are reported in<br />

Xplor-NIH (Schwieters et al., 2003, Schwieters et al., 2006) and Pales (Zweckstetter et al., 2000)<br />

format.<br />

Finally, Numbat can generate PDB files where the Δχ-tensor is reported in a format ready<br />

for use with MOLMOL or PyMOL for rigid-body docking alignment, or for further refinement by<br />

Xplor-NIH.<br />

3.7 Study case<br />

The proteins ε and θ are subunits of the complex of proteins constituting E. coli DNA<br />

polymerase III. The complex between the N-terminal domain of ε (ε186) and θ has been<br />

extensively studied using PCS data (Pintacuda et al., 2006, Pintacuda et al., 2007). In light of the<br />

recent crystal structure of the complex between ε186 and the θ homolog HOT (Kirby et al., 2006),<br />

we illustrate in the following the features of Numbat by revisiting the NMR structure of the complex between ε186 and θ, which was derived from PCS induced by Dy³⁺ and Er³⁺ ions bound to the natural metal-binding site of ε186 (Pintacuda et al., 2006).



The coordinates of the A chain in the PDB deposition 2IDO (Kirby et al., 2006) were used as the structural model for ε186. The structural model of θ was conformer 10 of the NMR structure of θ in complex with ε186 (PDB accession code 2AXD; Keniry et al., 2006). This conformer was chosen because it has the lowest backbone RMSD to the HOT protein (2.1 Å) for residues 9-66 (the structurally defined region for which meaningful PCS could be measured). The experimentally determined PCS values of ε186 have been reported previously (Schmitz et al., 2006) and the PCS values of θ are provided in the Supporting Information. All Δχ-tensor optimizations were performed using Numbat including the RACS correction term and a tolerance value tolᵢ of zero for all spins.

3.7.1 Subunit ε186<br />

Table 3.1 presents the results of the Δχ-tensor fit to the PCS measured for ε186. Initially, individual eight-variable Δχ-tensor optimizations were performed using the PCS data of each lanthanide (Table 3.1, columns 1 and 2). Next, the Numbat GUI was updated to display the deviations between the experimental and back-calculated PCS for the Δχ-tensors found. Several atoms showed deviations > 0.15 ppm between the experimental and back-calculated PCS: 15 out of 199 atoms for Dy³⁺ and 8 out of 255 atoms for Er³⁺ (without the RACS correction, deviations > 0.15 ppm were observed for 36 and 7 atoms, respectively). Assuming that these outliers were due to problematic measurements or inaccuracies of the 3D structure, these PCS were removed interactively using the GUI. Re-calculation of the Δχ-tensor was found not to change the fitted Δχ-tensor parameters significantly for any of the lanthanide ions (results not shown). This can be explained by the high quality and large number of experimental PCS data available for each lanthanide (backbone ¹³C′, ¹⁵N and ¹Hᴺ spins), resulting in robust fits of the Δχ-tensors.
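The outlier screening described in this paragraph was done interactively in the GUI; the same selection can be sketched in a few lines of Python. This is only an illustration: the function name `flag_outliers`, the dictionary layout and the PCS values are hypothetical, and only the 0.15 ppm threshold is taken from the text.

```python
def flag_outliers(experimental, back_calculated, threshold_ppm=0.15):
    """Return the spins whose |experimental - back-calculated| PCS exceeds the threshold."""
    return sorted(
        spin
        for spin, pcs_exp in experimental.items()
        if abs(pcs_exp - back_calculated[spin]) > threshold_ppm
    )


# Made-up PCS values (ppm) keyed by (residue number, atom name)
experimental = {(12, "HN"): 0.42, (13, "HN"): -0.31, (14, "HN"): 0.05}
back_calculated = {(12, "HN"): 0.40, (13, "HN"): -0.10, (14, "HN"): 0.12}

outliers = flag_outliers(experimental, back_calculated)
# PCS retained for a second fit once the flagged spins are removed
retained = {spin: pcs for spin, pcs in experimental.items() if spin not in outliers}
```

Refitting the tensor to `retained` and comparing the parameters with the first fit mirrors the check reported above.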

Table 3.1 Δχ-tensors determined by Numbat in the frames of the ε186 and θ molecule

| | ε186ᵃ Individualᶜ | | ε186ᵃ Combinedᵈ | | θᵇ Individualᶜ | | θᵇ Combinedᵈ | | θᵇ Fixedᵉ | |
|---|---|---|---|---|---|---|---|---|---|---|
| | Dy³⁺ | Er³⁺ | Dy³⁺ | Er³⁺ | Dy³⁺ | Er³⁺ | Dy³⁺ | Er³⁺ | Dy³⁺ | Er³⁺ |
| Δχaxᶠ | 42.3 | -10.6 | 42.3 | -10.7 | 40.1 | -13.0 | 40.2 | -10.0 | 42.3 | -10.7 |
| Δχrhᶠ | 5.3 | -5.1 | 5.3 | -5.1 | 14.8 | -6.5 | 14.9 | -4.8 | 5.3 | -5.1 |
| αᵍ | 169.5 | 144.2 | 169.5 | 143.9 | 27.7 | 23.7 | 27.7 | 19.9 | 42.2 | 34.9 |
| βᵍ | 30.2 | 29.1 | 30.2 | 29.2 | 114.6 | 108.8 | 113.9 | 118.2 | 119.2 | 121.5 |
| γᵍ | 134.6 | 126.9 | 134.7 | 126.8 | 28.4 | 170.6 | 27.3 | 177.7 | 44.7 | 177.4 |
| mxʰ | 29.4 | 29.3 | 29.4 | 29.4 | 6.2 | 9.5 | 6.4 | 6.4 | 4.3 | 4.3 |
| myʰ | 31.9 | 32.0 | 31.9 | 31.9 | -7.5 | -7.2 | -7.5 | -7.5 | -5.5 | -5.5 |
| mzʰ | 26.7 | 26.7 | 26.7 | 26.7 | -18.9 | -19.0 | -18.8 | -18.8 | -19.8 | -19.8 |

ᵃ Δχ-tensor parameters determined relative to chain A in the PDB coordinate set 2IDO

ᵇ Δχ-tensor parameters determined relative to model 10 in the PDB data set 2AXD

ᶜ Δχ-tensors determined from PCS induced by Dy³⁺ or PCS induced by Er³⁺ (individual optimization)

ᵈ Δχ-tensors determined by using the PCS data of Dy³⁺ and Er³⁺ simultaneously and optimizing for a single metal ion position (combined optimization)

ᵉ Δχ-tensors determined by using the PCS data of Dy³⁺ and Er³⁺ simultaneously, optimizing for a single metal ion position and fixing the Δχax and Δχrh at the values determined from the PCS data of ε186 (fixed optimization)

ᶠ In units of 10⁻³² m³

ᵍ Euler rotations in the ZYZ convention (degrees)

ʰ Metal ion coordinate (Å) relative to chain A in the PDB coordinate set 2IDO

Since the coordinates of the Dy³⁺ and Er³⁺ ions found in the individual fits were very similar (Table 3.1, columns 1 and 2), we subsequently assumed that the Δχ-tensors induced by each lanthanide are centered at the same position relative to ε186. The results obtained by simultaneously fitting the distinct Δχ-tensors while restraining their metal coordinate to a common centre (Table 3.1, columns 3 and 4) show little difference from the Δχ-tensor parameters found when performing the individual optimizations.

For comprehensive error analysis, we introduced a random error into the structure coordinates of ε186, where the atomic coordinates were varied according to a Gaussian distribution with a standard deviation σ of 0.5 Å, resulting in a mean atom displacement of 0.8 Å. The resulting uncertainty in Δχ-tensor parameters was approximately equivalent to the uncertainty introduced by a random variation added to the measured PCS data sampled from a Gaussian distribution with a standard deviation σ of 0.15 ppm. The Δχ-tensor parameters of ε186 were well defined, as the values of all eight Δχ-tensor variables determined by 1000 randomized pseudo-replicates of the structure were in good agreement with the Δχ-tensors fitted to the original structure (Table 3.2, column 1). To eliminate the possibility that the quality of the Δχ-tensor fit was significantly affected by the number of PCS measured, the error analysis for the Δχ-tensors fitted to ε186 was recalculated with random selection of only 20% of the measured PCS. The results (Table 3.2, column 2) show that the Δχ-tensor parameters of ε186 were still well defined. Figure 1.14.a illustrates how well the Δχ-tensor axes are defined, even when randomly disregarding 50% of the data.
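The two randomisation steps of this error analysis can be sketched as follows. This is a minimal Python illustration rather than the Numbat code: the helper names `perturb_coordinates` and `random_subset` and the stand-in coordinate list are hypothetical. The sketch also checks the quoted numbers: adding independent Gaussian noise with σ = 0.5 Å to each of x, y and z gives a mean displacement of σ·√(8/π), which is about 0.8 Å.

```python
import math
import random


def perturb_coordinates(coords, sigma, rng):
    """Add independent Gaussian noise (standard deviation sigma, in Å) to every x, y and z."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in atom) for atom in coords]


def random_subset(pcs_values, fraction, rng):
    """Pick, without replacement, a random subset holding the given fraction of the PCS."""
    n_keep = max(1, round(fraction * len(pcs_values)))
    return rng.sample(pcs_values, n_keep)


rng = random.Random(0)                 # fixed seed so that the sketch is reproducible
atoms = [(0.0, 0.0, 0.0)] * 20000      # hypothetical stand-in for a set of atom positions
moved = perturb_coordinates(atoms, 0.5, rng)

# Numerical and analytical mean displacement for sigma = 0.5 Å
mean_shift = sum(math.dist(a, b) for a, b in zip(atoms, moved)) / len(atoms)
expected_shift = 0.5 * math.sqrt(8.0 / math.pi)

# 20% of 199 PCS entries, here represented only by their indices
subset = random_subset(list(range(199)), 0.20, rng)
```

In a full pseudo-replicate analysis, each perturbed structure or PCS subset would be passed to the tensor fit, and the mean and standard deviation of the fitted parameters would be collected over all replicates.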

Table 3.2 Error analysis for the Dy³⁺ Δχ-tensors fitted to PCS of ε186 and θᵃ

| | ε186, structure variation | ε186, subset of PCS | θ, structure variation | θ, subset of PCS |
|---|---|---|---|---|
| Δχaxᵇ | 42.0 (0.8) | 42.4 (1.1) | 41.9 (4.3) | 40.3 (3.1) |
| Δχrhᵇ | 5.3 (0.5) | 5.4 (0.8) | 15.0 (4.5) | 15.3 (2.8) |
| αᶜ | 169.5 (0.7) | 169.7 (0.9) | 29.3 (6.1) | 27.6 (3.2) |
| βᶜ | 30.2 (0.3) | 30.2 (0.5) | 114.5 (4.3) | 114.4 (3.3) |
| γᶜ | 134.0 (2.6) | 134.7 (4.0) | 29.2 (10.6) | 28.9 (7.9) |
| mxᵈ | 29.4 (0.1) | 29.4 (0.2) | 6.1 (1.3) | 6.2 (0.9) |
| myᵈ | 31.9 (0.1) | 31.9 (0.2) | -7.4 (1.0) | -7.6 (0.7) |
| mzᵈ | 26.7 (0.1) | 26.7 (0.1) | -19.1 (0.8) | -18.9 (0.4) |

ᵃ The average values of the Δχ-tensors and their standard deviations (in brackets) are reported. Average values and standard deviations were calculated from 1000 sets of randomised atom coordinates (where the extent of randomisation followed a Gaussian distribution with a standard deviation σ of 0.5 Å) or from randomly picked subsets of the PCS data (20% in the case of ε186 and 80% in the case of θ where much fewer PCS were available)

ᵇ In units of 10⁻³² m³

ᶜ Euler rotations in the ZYZ convention (degrees)

ᵈ Metal ion coordinate (Å) in the protein frame (A chain of the PDB coordinates 2IDO and model 10 in the PDB data set 2AXD, respectively)

3.7.2 Subunit θ<br />

The results of the Δχ-tensor determination in the molecular frame of θ are presented in Table 3.1. There was only a small number of spins for which the back-calculated PCS deviated from the experimental PCS by more than 0.15 ppm (4 out of 50 in the case of Dy³⁺, 0 out of 41 for Er³⁺). As for ε186, removal of these PCS from the optimization did not significantly change the parameters of the fitted Δχ-tensors. While the Δχax and Δχrh values of Er³⁺ determined from the PCS observed for θ and ε186 were very similar, the Δχrh value of the Dy³⁺ tensor found for θ was almost three times larger than that found for ε186.⁶ We subsequently performed an error analysis for θ as for the ε186 subunit, introducing either random variations into the atomic positions of θ according to a Gaussian distribution with a standard deviation σ of 0.5 Å or using a random selection of only 80% of the measured PCS. In either case, the Δχ-tensor parameters of θ proved to be less well defined than those of ε186 (Table 3.2 and Figure 1.14.b). As θ samples a relatively small and remote volume of the Δχ-tensors due to its spatial separation from the metal ion, one would expect a less accurate determination of the Δχ-tensors from the θ data. The effect could be exacerbated by inaccuracies of the NMR structure.

In order to compensate for the smaller number of experimentally determined PCS available for θ (only ¹Hᴺ spins) and the poorer quality of the Δχ-tensors fitted, we performed another fit with Δχax and Δχrh fixed to the values determined for ε186 (Table 3.1, columns 9 and 10). Analysis of the experimental versus back-calculated PCS, both for the eight- and six-variable fits of the Δχ-tensor to θ, showed that the PCS deviations were similar in magnitude and trends. Therefore, constraining Δχax and Δχrh did not significantly deteriorate the quality of the fit, despite considerable changes of the Δχ-tensor parameters (Table 3.1). The variability of the Δχ-tensor parameters over all 12 deposited θ conformers in 2AXD using the fixed optimisation scheme is provided in the Supporting Information.

3.7.3 Modelling the complex between ε186 and θ<br />

Numbat facilitates the modelling of protein-protein complexes by listing coordinates of the Δχ-tensor axes together with the protein coordinates in files in PDB format. Superimposition of the Δχ-tensors fitted to ε186 and θ for each lanthanide ion yields the three-dimensional structure of the ε186/θ complex by straightforward rigid-body docking. Standard PyMOL or MOLMOL commands can be used to align the Δχ-tensors. Numbat reports the coordinate system of the Δχ-tensor in such a way that all four degenerate solutions arising from the symmetry of the Δχ-tensor about the x, y and z axes (Figure 3.4) can easily be visualised. Identification of the correct solution requires additional information, such as proper steric interactions, chemical shift perturbation data or knowledge of the biological function of the complex. The most objective way, however, is by simultaneous evaluation of the Δχ-tensors of different lanthanides (Pintacuda et al., 2006).

⁶ The discrepancy of the rhombic component would not necessarily affect the rigid-body docking of the complex, as only the orientation of the Δχ-tensors and the coordinates of the paramagnetic center are used.

In the case of the complex between ε186 and θ, the Δχ-tensor frames of Dy³⁺ and Er³⁺ share a common origin for both proteins. Seven coordinates are necessary to define two Δχ-tensor frames sharing the same origin. Because of the second Δχ-tensor, the degeneracy of Figure 3.4 is broken. There are exactly 16 possibilities to align two pairs of Δχ-tensors. The lowest RMSD value resulting from all 16 possible 7-coordinate alignments between the two combined Δχ-tensors identified a single relative orientation of the two proteins as the best solution. The position of θ relative to ε186 derived from PCS data in this way was also the correct solution. It agreed with a model of the complex obtained by superimposition of θ onto HOT in the ε186/HOT complex, with a backbone RMSD of 4.4 Å. Similarly for the Δχ-tensor of θ calculated with fixed Δχax and Δχrh values, a backbone RMSD of 4.3 Å was calculated relative to HOT. When PCS data from only Dy³⁺ or Er³⁺ were used, the backbone RMSD values were, respectively, 4.2 Å and 4.4 Å for the best fit to the ε186/HOT complex. The model of the ε186/θ complex derived from the fixed, Dy³⁺ and Er³⁺ data sets is displayed in Figure 3.5.
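The fourfold degeneracy of a single Δχ-tensor frame, and the count of 16 alignments for two tensors, can be checked numerically. The Python sketch below assumes the standard PCS expression written in Cartesian coordinates of the tensor frame, so that the PCS depends only on x², y² and z²; the function names, the spin position and the way the 16 combinations are enumerated (one frame choice per lanthanide) are assumptions made for illustration, not Numbat's procedure.

```python
import math
from itertools import product


def pcs_ppm(xyz, dchi_ax, dchi_rh):
    """PCS (ppm) of a spin at xyz (metres) in the tensor frame; tensor components in m^3."""
    x, y, z = xyz
    r2 = x * x + y * y + z * z
    bracket = dchi_ax * (2 * z * z - x * x - y * y) + 1.5 * dchi_rh * (x * x - y * y)
    return 1e6 * bracket / (12.0 * math.pi * r2 ** 2.5)


# Identity and the 180° rotations about x, y and z, written as axis sign changes
SYMMETRY_OPS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def rotate(op, xyz):
    """Apply one of the four symmetry operations to a position."""
    return tuple(sign * coord for sign, coord in zip(op, xyz))


spin = (4.0e-10, -7.0e-10, 9.0e-10)  # arbitrary position, about 12 Å from the metal
values = [pcs_ppm(rotate(op, spin), 42.3e-32, 5.3e-32) for op in SYMMETRY_OPS]

# One frame choice for each of the two lanthanide tensors: 4 x 4 combinations
pairings = list(product(SYMMETRY_OPS, repeat=2))
```

All four entries of `values` are identical, which is why PCS data from a single lanthanide cannot distinguish the four frames, while `pairings` contains the 16 candidate alignments that are ranked by RMSD in the text.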


Figure 3.4 The four degenerate solutions arising from the symmetry of the Δχ-tensor around the x, y and z axes. All four possibilities result in the same calculated PCS, hence in the same isosurfaces.

Figure 3.5 The complex between ε186 and θ determined by superimposition of Δχ-tensors. The ε186/HOT complex (PDB accession code 2IDO) is shown for reference, with ε186 coloured in silver and HOT (residues 9-66) in orange. The isosurfaces correspond to the PCS induced by the Dy³⁺ ion (from individual optimisation) contoured at ±1.5 ppm and ±0.5 ppm. Blue and red isosurfaces represent regions with positive and negative PCS, respectively. Residues 9-66 of θ are shown as a thin ribbon in the position defined by the fixed Dy³⁺ and Er³⁺ data (brown).

3.8 Conclusion

The program Numbat is the first software package for fitting Δχ-tensors from PCS data with a user-friendly graphical user interface (GUI). Numbat calculations are fast, as the program was written in C using open-source Linux routines. While the main task of Numbat is the fit of the eight Δχ-tensor variables, the intuitive GUI combined with convenient data handling, including Monte-Carlo error analysis and links to the molecular viewers MOLMOL and PyMOL, offers high flexibility of use. The study case of the complex formed between the subunits ε186 and θ of E. coli DNA polymerase III illustrates the simplicity of use of Numbat.

The program is freely available under the GNU General Public License (GPL) upon request<br />

(see also http://compbio.chemistry.uq.edu.au/bmmg/christophe/numbat.html).<br />

3.9 Acknowledgment<br />

Financial support from the Australian Research Council for project grants to G.O. and T.H. is gratefully acknowledged.

3.10 References<br />

Allegrozzi M, Bertini I, Janik MBL, Lee YM, Lin GH and Luchinat C (2000) Lanthanide-induced pseudocontact shifts for solution structure refinements of macromolecules in shells up to 40 Å from the metal ion. J Am Chem Soc 122:4154-4161

Banci L, Bertini I, Bren KL, Cremonini MA, Gray HB, Luchinat C and Turano P (1996) The use of pseudocontact shifts to refine solution structures of paramagnetic metalloproteins: Met80Ala cyano-cytochrome c as an example. J Biol Inorg Chem 1:117-126

Banci L, Bertini I, Cavallaro G, Giachetti A, Luchinat C and Parigi G (2004) Paramagnetism-based restraints for Xplor-NIH. J Biomol NMR 28:249-261

Banci L, Bertini I, Cremonini MA, Savellini GG, Luchinat C, Wüthrich K and Güntert P (1998) PSEUDYANA for NMR structure calculation of paramagnetic metalloproteins using torsion angle molecular dynamics. J Biomol NMR 12:553-557

Banci L, Bertini I, Savellini GG, Romagnoli A, Turano P, Cremonini MA, Luchinat C and Gray HB (1997) Pseudocontact shifts as constraints for energy minimization and molecular dynamics calculations on solution structures of paramagnetic metalloproteins. Proteins 29:68-76

Banci L, Dugad LB, La Mar GN, Keating KA, Luchinat C and Pierattelli R (1992) ¹H nuclear magnetic resonance investigation of cobalt(II) substituted carbonic anhydrase. Biophys J 63:530-543

Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN and Bourne PE (2000) The protein data bank. Nucleic Acids Res 28:235-242

Bertini I, Del Bianco C, Gelis I, Katsaros N, Luchinat C, Parigi G, Peana M, Provenzani A and Zoroddu MA (2004) Experimentally exploring the conformational space sampled by domain reorientation in calmodulin. Proc Natl Acad Sci U S A 101:6841-6846

Bertini I, Donaire A, Jiménez B, Luchinat C, Parigi G, Piccioli M and Poggi L (2001) Paramagnetism-based versus classical constraints: An analysis of the solution structure of Ca Ln calbindin D9k. J Biomol NMR 21:85-98

Bertini I, Luchinat C and Parigi G (2002) Magnetic susceptibility in paramagnetic NMR. Prog NMR Spectrosc 40:249-273

Bugayevskiy LM and Snyder JP (1995). Map projections: A reference manual. Taylor & Francis, London.

Capozzi F, Cremonini MA, Luchinat C and Sola M (1993) Assignment of pseudo-contact-shifted ¹H NMR resonances in the EF site of Yb³⁺-substituted rabbit parvalbumin through a combination of 2D techniques and magnetic susceptibility tensor determination. Magn Reson Chem 31:S118-S127

Clore GM, Gronenborn AM and Bax A (1998) A robust method for determining the magnitude of the fully asymmetric alignment tensor of oriented macromolecules in the absence of structural information. J Magn Reson 133:216-221

Cornilescu G and Bax A (2000) Measurement of proton, nitrogen, and carbonyl chemical shielding anisotropies in a protein dissolved in a dilute liquid crystalline phase. J Am Chem Soc 122:10143-10154

DeLano WL (2002) The PyMOL molecular graphics system. Palo Alto, CA, USA.

Dosset P, Hus JC, Blackledge M and Marion D (2000) Efficient analysis of macromolecular rotational diffusion from heteronuclear relaxation data. J Biomol NMR 16:23-28

Eichmüller C and Skrynnikov NR (2007) Observation of μs time-scale protein dynamics in the presence of Ln³⁺ ions: Application to the N-terminal domain of cardiac troponin C. J Biomol NMR 37:79-95

Emerson SD and La Mar GN (1990) NMR determination of the orientation of the magnetic-susceptibility tensor in cyanometmyoglobin: A new probe of steric tilt of bound ligand. Biochemistry 29:1556-1566

Galassi M, Davies J, Theiler B, Gough G, Jungman M, Booth M and Rossi F (2006). GNU scientific library reference manual. Network Theory Ltd, Bristol.

Gaponenko V, Sarma SP, Altieri AS, Horita DA, Li J and Byrd RA (2004) Improving the accuracy of NMR structures of large proteins using pseudocontact shifts as long-range restraints. J Biomol NMR 28:205-212

Güntert P, Mumenthaler C and Wüthrich K (1997) Torsion angle dynamics for NMR structure calculation with the new program DYANA. J Mol Biol 273:283-298

Hess B and Scheek RM (2003) Orientation restraints in molecular dynamics simulations using time and ensemble averaging. J Magn Reson 164:19-27

Jensen MR, Hansen DF, Ayna U, Dagil R, Hass MAS, Christensen HEM and Led JJ (2006) On the use of pseudocontact shifts in the structure determination of metalloproteins. Magn Reson Chem 44:294-301

John M, Park AY, Pintacuda G, Dixon NE and Otting G (2005) Weak alignment of paramagnetic proteins warrants correction for residual CSA effects in measurements of pseudocontact shifts. J Am Chem Soc 127:17190-17191

John M, Pintacuda G, Park AY, Dixon NE and Otting G (2006) Structure determination of protein-ligand complexes by transferred paramagnetic shifts. J Am Chem Soc 128:12910-12916

Keniry MA, Park AY, Owen EA, Hamdan SM, Pintacuda G, Otting G and Dixon NE (2006) Structure of the θ subunit of Escherichia coli DNA polymerase III in complex with the ε subunit. J Bacteriol 188:4464-4473

Kirby TW, Harvey S, DeRose EF, Chalov S, Chikova AK, Perrino FW, Schaaper RM, London RE and Pedersen LC (2006) Structure of the Escherichia coli DNA polymerase III ε-HOT proofreading complex. J Biol Chem 281:38466-38471

Koradi R, Billeter M and Wüthrich K (1996) MOLMOL: A program for display and analysis of macromolecular structures. J Mol Graphics 14:51-55

Krause A (2007). Foundations of GTK+ development. Apress, Berkeley, CA, USA.

Lee L and Sykes BD (1983) Use of lanthanide-induced nuclear magnetic-resonance shifts for determination of protein structure in solution: EF calcium binding site of carp parvalbumin. Biochemistry 22:4366-4373

Marquardt DW (1963) An algorithm for least-squares estimation of nonlinear parameters. J Soc Ind Appl Math 11:431-441

Nelder JA and Mead R (1965) A simplex method for function minimization. Comput J 7:308-313

Pintacuda G, John M, Su XC and Otting G (2007) NMR structure determination of protein-ligand complexes by lanthanide labeling. Acc Chem Res 40:206-212

Pintacuda G, Keniry MA, Huber T, Park AY, Dixon NE and Otting G (2004) Fast structure-based assignment of ¹⁵N HSQC spectra of selectively ¹⁵N-labeled paramagnetic proteins. J Am Chem Soc 126:2963-2970

Pintacuda G, Park AY, Keniry MA, Dixon NE and Otting G (2006) Lanthanide labeling offers fast NMR approach to 3D structure determinations of protein-protein complexes. J Am Chem Soc 128:3696-3702

Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient χ-tensor determination and NH assignment of paramagnetic proteins. J Biomol NMR 35:79-87

Schwieters CD, Kuszewski JJ and Clore GM (2006) Using Xplor-NIH for NMR molecular structure determination. Prog NMR Spectrosc 48:47-62

Schwieters CD, Kuszewski JJ, Tjandra N and Clore GM (2003) The Xplor-NIH NMR molecular structure determination package. J Magn Reson 160:65-73

Sherry AD and Pascual E (1977) Proton and carbon lanthanide-induced shifts in aqueous alanine. Evidence for structural changes along lanthanide series. J Am Chem Soc 99:5871-5876

Su XC, McAndrew K, Huber T and Otting G (2008) Lanthanide-binding peptides for NMR measurements of residual dipolar couplings and paramagnetic effects from multiple angles. J Am Chem Soc 130:1681-1687

Tolman JR, Flanagan JM, Kennedy MA and Prestegard JH (1995) Nuclear magnetic dipole interactions in field-oriented proteins: Information for structure determination in solution. Proc Natl Acad Sci U S A 92:9279-9283

Valafar H and Prestegard JH (2004) REDCAT: A residual dipolar coupling analysis tool. J Magn Reson 167:228-241

Van der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE and Berendsen HJC (2005) GROMACS: Fast, flexible, and free. J Comput Chem 26:1701-1718

Veitch NC, Whitford D and Williams RJP (1990) An analysis of pseudocontact shifts and their relationship to structural features of the redox states of cytochrome b5. FEBS Lett 269:297-304

Wang X, Srisailam S, Yee AA, Lemak A, Arrowsmith C, Prestegard JH and Tian F (2007) Domain-domain motions in proteins from time-modulated pseudocontact shifts. J Biomol NMR 39:53-61

Wei Y and Werner MH (2006) iDC: A comprehensive toolkit for the analysis of residual dipolar couplings for macromolecular structure determination. J Biomol NMR 35:17-25

Zweckstetter M and Bax A (2000) Prediction of sterically induced alignment in a dilute liquid crystalline phase: Aid to protein structure determination by NMR. J Am Chem Soc 122:3791-3792

3.11 Supporting information<br />

Table S3.1 Experimentally determined ¹Hᴺ PCS for θ in complex with ε186 at pH 7.0 and 25 °Cᵃ

| Residue | PCS Dy³⁺ (ppm) | PCS Er³⁺ (ppm) |
|---|---|---|
| ASP 9 | -1.28 | 0.31 |
| GLN 10 | -1.19 | 0.3 |
| THR 11 | -1.11 | 0.25 |
| GLU 12 | -1.24 | 0.27 |
| MET 13 | -1.84 | 0.38 |
| ASP 14 | -2 | 0.32 |
| LYS 15 | -1.5 | |
| VAL 16 | -1.97 | 0.13 |
| VAL 18 | -1.91 | 0.14 |
| ASP 19 | -1.32 | -0.03 |
| LEU 20 | -1.29 | -0.03 |
| ALA 21 | -1.37 | -0.14 |
| ALA 22 | -0.57 | -0.18 |
| ALA 23 | 0.15 | -0.38 |
| GLY 24 | 0.37 | -0.46 |
| VAL 25 | 0.29 | -0.26 |
| ALA 26 | 0.54 | -0.3 |
| PHE 27 | 1.21 | |
| LYS 28 | 0.85 | -0.31 |
| GLU 29 | 0.68 | -0.19 |
| ARG 30 | 0.81 | -0.23 |
| ASN 32 | 0.72 | |
| MET 33 | 0.6 | |
| VAL 35 | 0.02 | |
| ILE 36 | -0.29 | 0.08 |
| ALA 37 | -0.12 | 0 |
| GLU 38 | -0.22 | 0.04 |
| ALA 39 | -0.4 | |
| VAL 40 | -0.55 | |
| GLU 41 | -0.51 | -0.06 |
| ARG 42 | -0.56 | 0.09 |
| GLU 43 | -0.84 | 0.12 |
| GLU 46 | -0.66 | 0.11 |
| LEU 48 | -0.71 | -0.01 |
| ARG 49 | -0.58 | 0.08 |
| SER 50 | -0.45 | 0.05 |
| TRP 51 | -0.5 | |
| PHE 52 | -0.58 | |
| ARG 53 | -0.38 | |
| GLU 54 | -0.26 | -0.01 |
| ARG 55 | -0.23 | -0.07 |
| LEU 56 | -0.15 | -0.09 |
| ILE 57 | 0.02 | -0.09 |
| ALA 58 | 0.14 | -0.11 |
| HIS 59 | 0.3 | -0.18 |
| ARG 60 | 0.39 | -0.17 |
| LEU 61 | 0.42 | -0.15 |
| SER 63 | 0.8 | -0.27 |
| VAL 64 | 0.71 | -0.19 |
| ASN 65 | 0.8 | -0.21 |
| LEU 66 | | -0.26 |

ᵃ Experimental conditions as described in Pintacuda et al. (2006) J. Am. Chem. Soc. 128, 3696-3702


Table S3.2 Comparison of θ Δχ-tensor parameters when using only conformer 10ᵃ or all conformersᵇ of the NMR structure of θ.

| | Fixedᵃ Dy³⁺ | Fixedᵃ Er³⁺ | Fixed(Family)ᵇ Dy³⁺ | Fixed(Family)ᵇ Er³⁺ |
|---|---|---|---|---|
| Δχaxᶜ | 42.3 | -10.7 | 42.3 | -10.7 |
| Δχrhᶜ | 5.3 | -5.1 | 5.3 | -5.1 |
| αᵈ | 42.2 | 34.9 | 40.5 | 34.5 |
| βᵈ | 119.2 | 121.5 | 118.9 | 121.2 |
| γᵈ | 44.7 | 177.4 | 38.1 | 174.8 |
| mxᵉ | 4.3 | 4.3 | 4.7 | 4.7 |
| myᵉ | -5.5 | -5.5 | -5.8 | -5.8 |
| mzᵉ | -19.8 | -19.8 | -19.7 | -19.7 |

ᵃ The Δχ-tensor determined using the fixed optimisation scheme relative to θ conformer 10

ᵇ The Δχ-tensor determined using the fixed optimisation scheme relative to simultaneously all 12 deposited θ conformers

ᶜ In units of 10⁻³² m³

ᵈ Euler rotations in the ZYZ convention (degrees)

ᵉ Metal ion coordinate (Å) in the protein frame (PDB data set 2AXD)


Chapter 4

4. Protein Structure Determination from Pseudocontact Shifts using ROSETTA

Christophe Schmitzᵃ, Robert Vernonᵇ, Gottfried Ottingᶜ, David Bakerᵇ and Thomas Huberᵃ

ᵃ School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4072, Australia

ᵇ Department of Biochemistry, University of Washington, Seattle, WA 98195

ᶜ Research School of Chemistry, Australian National University, Canberra, ACT 0200, Australia

Manuscript submitted to the Proceedings of the National Academy of Sciences of the United States<br />

of America.


102 Chapter 4. Protein Structure Determination from Pseudocontact Shifts using ROSETTA.<br />

4.1 Abstract<br />

Pseudocontact shifts (PCS) arise from paramagnetic metal ions bound to proteins and are<br />

manifested as large changes in chemical shifts detected in nuclear magnetic resonance (NMR) spectra. PCS data constitute long-range restraints on the positions of nuclear spins relative to the coordinate system of the magnetic susceptibility anisotropy tensor (Δχ-tensor) of the metal ion. Protein structure determination using PCS data only, however, is hampered by the difficulty of determining the Δχ-tensor and metal position without knowledge of the protein structure. We have circumvented this problem in the program PCS-ROSETTA by using the structure prediction program ROSETTA to generate the models required for fitting of the Δχ-tensor parameters. PCS

restraints implemented in the fragment assembly step of PCS-ROSETTA proved highly efficient in<br />

biasing the sampling of the conformational space towards the correct target structure. The results<br />

show that using a combination of chemical shift and PCS data, ROSETTA can determine structures<br />

accurately for proteins of up to 150 residues. Lanthanides can be incorporated into proteins quite<br />

generally through metal binding tags, and the combination of these data with the PCS-ROSETTA<br />

method provides a powerful new approach to protein structure determination.<br />

4.2 Introduction<br />

The three-dimensional (3D) structure of proteins is a prerequisite for understanding protein function, protein-ligand interactions and rational drug design. Protein structures can be readily determined by NMR spectroscopy. The most difficult part of an NMR structure determination typically is the assignment of sidechain chemical shifts and NOESY peaks. This bottleneck can potentially be avoided if methods for computing high accuracy structures from backbone-only NMR experiments can be developed.

PCSs are a potentially rich source of structural information that are manifested as large changes in chemical shifts in the NMR spectrum caused by a non-vanishing magnetic susceptibility anisotropy tensor (Δχ-tensor) of a paramagnetic metal ion. The PCS (in ppm) of a nuclear spin i depends on the polar coordinates rᵢ, θᵢ, and φᵢ of the nuclear spin with respect to the Δχ-tensor frame of the metal ion and the axial and rhombic components of the Δχ-tensor:

$$\Delta\delta_i^{\mathrm{PCS}} = \frac{1}{12\pi r_i^3}\left[\Delta\chi_{ax}\left(3\cos^2\theta_i - 1\right) + \frac{3}{2}\Delta\chi_{rh}\sin^2\theta_i\cos 2\varphi_i\right] \qquad (4.1)$$

The Δχ-tensor defines a coordinate system in the molecule that is centered on the metal ion and is fully described by eight parameters (Δχax, Δχrh, three Euler angles relating the orientation of the Δχ-tensor to the protein frame, and the coordinates of the metal ion). Therefore, the Δχ-tensor can be determined using PCS data from at least eight nuclear spins, provided the coordinates of the spins are known.
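To make the role of the eight parameters concrete, the following Python sketch back-calculates the PCS of one spin from Δχax, Δχrh, three Euler angles and the metal position, using the standard polar form of the PCS expression. It is a minimal sketch under stated assumptions and not the PCS-ROSETTA code: the function names are hypothetical, the Euler angles are taken in a ZYZ convention with R = Rz(α)·Ry(β)·Rz(γ), and the columns of R are assumed to be the tensor axes expressed in the protein frame (other conventions differ by a transpose or by the order of the angles).

```python
import math


def zyz_rotation(alpha, beta, gamma):
    """Rotation matrix Rz(alpha) * Ry(beta) * Rz(gamma); angles in degrees."""
    a, b, g = (math.radians(v) for v in (alpha, beta, gamma))
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cg, sg = math.cos(g), math.sin(g)
    return [
        [ca * cb * cg - sa * sg, -ca * cb * sg - sa * cg, ca * sb],
        [sa * cb * cg + ca * sg, -sa * cb * sg + ca * cg, sa * sb],
        [-sb * cg, sb * sg, cb],
    ]


def pcs_ppm(spin, metal, euler_deg, dchi_ax, dchi_rh):
    """PCS (ppm) of one spin; coordinates in Å, tensor components in 10^-32 m^3."""
    rot = zyz_rotation(*euler_deg)
    d = [s - m for s, m in zip(spin, metal)]
    # Assumed convention: tensor-frame coordinates are rot^T applied to (spin - metal)
    x, y, z = (sum(rot[i][j] * d[i] for i in range(3)) for j in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / r)))
    phi = math.atan2(y, x)
    angular = dchi_ax * (3 * math.cos(theta) ** 2 - 1) + 1.5 * dchi_rh * (
        math.sin(theta) ** 2 * math.cos(2 * phi)
    )
    # 1e-32 m^3 / (1e-10 m)^3 = 1e-2, and the factor 1e6 converts to ppm
    return 1e4 * angular / (12.0 * math.pi * r ** 3)


# A spin 10 Å from the metal along the tensor z axis and along the tensor x axis
on_z = pcs_ppm((0.0, 0.0, 10.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 42.3, 5.3)
on_x = pcs_ppm((10.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 42.3, 5.3)
```

Fitting the tensor amounts to adjusting these eight arguments until the back-calculated values match the measured PCS, which is why at least eight spins with known coordinates are needed.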

As PCSs can be measured for nuclear spins 40 Å away from the metal, they present long-<br />

range structure restraints exquisitely suited to characterize the global structural arrangement of a<br />

protein. PCSs have thus been used very successfully to refine protein structures (Bertini et al.,<br />

2001, Gaponenko et al., 2004, Arnesano et al., 2005), dock protein molecules of known 3D<br />

structures (Ubbink et al., 1998, Pintacuda et al., 2006) and determine the structure of small<br />

molecules bound to a protein of known 3D structure (John et al., 2006, Pintacuda et al., 2007,<br />

Zhuang et al., 2008). The need for atom coordinates to determine the Δχ-tensor parameters,<br />

however, makes it more difficult to use PCSs in de novo determinations of protein 3D structures.<br />

All presently available protein structure determination software that uses PCS data to supplement conventional NMR restraints requires estimates of Δχax and Δχrh as input parameters (Banci et al., 1998, Banci et al., 2004). These are often difficult to estimate accurately, as they depend on the

chemical environment of the metal ion and the mobility of the paramagnetic center with respect to<br />

the protein.<br />

The ROSETTA structure prediction methodology (Simons et al., 1997) is well suited for<br />

taking advantage of the rich source of information inherent in PCSs. ROSETTA de novo structure<br />

prediction has two stages: first, a low resolution phase in which conformational space is searched<br />

broadly using a coarse grained energy function, and second, a high resolution phase in which<br />

models generated in the first phase are refined in a physically realistic all atom force field. The<br />

bottleneck in structure prediction using ROSETTA is conformational sampling; close to native<br />

structures almost always have lower energies than non native structures. For small proteins ( < 100<br />

residues), ROSETTA has produced models with atomic level accuracy in blind prediction<br />

challenges (Raman et al., 2009). For larger proteins, however, structures close enough to the native<br />

structure to fall into the deep native energy minimum are generated seldom or not at all. This<br />

sampling problem can be overcome if even very limited experimental data is available to guide the<br />

initial low resolution search. For example, CS-ROSETTA uses NMR chemical shifts to guide


104 Chapter 4. Protein Structure Determination from Pseudocontact Shifts using ROSETTA.<br />

fragment selection and constrain backbone torsion angles, greatly improving the final yield of<br />

correctly folded protein models (Shen et al., 2008). As ROSETTA in favorable cases is capable of<br />

generating protein structures very close to experimentally determined structures from sequence<br />

information alone (Bradley et al., 2005), it is of great interest to combine ROSETTA with readily<br />

accessible experimental data to determine protein structures.<br />

In this paper we describe the incorporation of PCS data into ROSETTA. We show that this<br />

new PCS-ROSETTA method can generate accurate structures for proteins of up to 150 amino acids<br />

in length even from quite limited data sets.<br />

4.3 Results<br />

4.3.1 Test set<br />

We tested the new PCS-ROSETTA method (see Methods) on a benchmark of nine proteins<br />

for which chemical shifts and PCSs have been published. ArgN repressor was determined twice<br />

with PCS data measured from paramagnetic metal ions at two different sites. The proteins were<br />

between 56 and 186 amino acid residues in size, had different folds and had between 82 and 1169<br />

PCSs measured from one to eleven different metal ions located at a single metal binding site [Table<br />

4.1 and supporting information (SI) Table S4.1]. Fragments for each protein were selected with CS-<br />

ROSETTA using available chemical shift data and were used for all calculations. Structures of<br />

proteins with significant sequence similarity to the target proteins were explicitly excluded from the<br />

CS-ROSETTA database.<br />

Table 4.1 Protein structures used to evaluate the performance of PCS-ROSETTA<br />

| Target | PDB ID | Nres^a | NM^b | Npcs^c | PCS-ROSETTA run^d: rmsd^f | PCS-ROSETTA run^d: conv^g | PCS-ROSETTA run^d: Q^h | CS-ROSETTA run^e: rmsd^f | CS-ROSETTA run^e: conv^g | Refcs^i | Refpcs^j |
|---|---|---|---|---|---|---|---|---|---|---|---|
| protein G (A) | 3GB1 | 56 | 3 | 158 | 0.61 | 0.92 | 0.06 | 0.80 | 0.88 | (Wilton et al., 2008) | (Saio et al., 2009) |
| calbindin (B) | 1KQV | 75 | 11 | 1169 | 1.46 | 2.04 | 0.16 | 4.96 | 4.37 | (Balayssac et al., 2006) | (Bertini et al., 2001) |
| θ subunit (C) | 2AE9 | 76 | 2 | 91 | 1.65 | 4.35 | 0.07 | 8.90 | 8.75 | (Mueller et al., 2005) | (Schmitz et al., 2008) |
| ArgN^k (D) | 1AOY | 78 | 3 | 222 | 0.98 | 2.38 | 0.08 | 6.93 | 5.32 | (Su et al., 2008) | (Su et al., 2008) |
| ArgN^l (E) | 1AOY | 78 | 2 | 82 | 1.03 | 2.25 | 0.09 | 8.01 | 6.64 | (Su et al., 2008) | (Su et al., 2009a) |
| N-calmodulin (F) | 1SW8 | 79 | 2 | 125 | 2.34 | 1.85 | 0.09 | 4.69 | 3.68 | (Bertini et al., 2004) | (Bertini et al., 2004) |
| thioredoxin (G) | 1XOA | 108 | 1 | 90 | 2.58 | 2.64 | 0.23 | 4.98 | 6.06 | (Lemaster et al., 1988, Chandrasekhar et al., 1991) | (Jensen et al., 2006) |
| parvalbumin (H) | 1RJV | 110 | 1 | 106 | 11.26 | 10.42 | 0.20 | 11.80 | 11.20 | (Baig et al., 2004) | (Baig et al., 2004) |
| calmodulin (I) | 2K61 | 146 | 4 | 408 | 2.80 | 2.12 | 0.14 | 6.35 | 5.55 | (Bertini et al., 2009) | (Bertini et al., 2009) |
| ε186^m (J) | 1J54 | 186 | 3 | 738 | 20.57 | 17.54 | 0.36 | 15.46 | 17.23 | (DeRose et al., 2002) | (Schmitz et al., 2006) |

a Number of residues.<br />

b Number of metal ions for which PCS data were measured.<br />

c Total number of PCSs measured.<br />

d The structures used to calculate the rmsds were identified using the combined PCS-score and<br />

ROSETTA full atom energy on the whole protein sequence.<br />

e The structures used to calculate the rmsds were identified by the ROSETTA full-atom energy on<br />

the whole protein sequence.<br />

f Cα rmsd (with respect to the native structure) of the structure of lowest score, in Å. All Cα rmsd<br />

values were calculated using the core residues defined in SI Table S4.2.<br />

g Average Cα rmsd calculated between the lowest score structure and the next four lowest scoring<br />

structures, in Å. The rmsd values were calculated on the whole protein sequence.<br />

h Quality factor $Q = \mathrm{rms}(\mathrm{PCS}_i^{cal} - \mathrm{PCS}_i^{exp}) / \mathrm{rms}(\mathrm{PCS}_i^{exp})$ calculated on the structure of lowest<br />

PCS-ROSETTA score.<br />

i Reference for the experimental chemical shifts.<br />

j Reference for the experimental PCSs.<br />

k PCSs measured with a covalent tag attached to the N-terminal domain of the E. coli arginine<br />

repressor (ArgN).<br />

l PCSs measured with a non-covalent tag bound to ArgN.<br />

m N-terminal 186 residues of the ε subunit of the E. coli polymerase III.<br />

4.3.2 Capacity of the PCS Score to Identify Native-like Structures<br />

The PCS score describes a model’s agreement with observed PCS data by calculating the<br />

expected PCS data given the structure. To calculate this, a three dimensional grid search is used to<br />

determine the metal coordinates and Δχ-tensor components necessary for producing an optimal<br />

match between calculated and observed data (see Materials and Methods). The capacity of the PCS<br />

score to identify native like models was tested on sets of 3000 CS-ROSETTA structures for each of<br />

the nine test proteins. These test structures were produced using a reduced fragment set and



included native fragments to ensure that some of the models were similar to the target structure.<br />

The Cα rmsd of the decoy with the lowest PCS score was always small (below 2.3 Å) with respect<br />

to the target protein (Figure 4.1). In addition, for all target proteins for which PCSs were available<br />

from two or more paramagnetic metal ions, low Cα rmsd values correlated with low PCS scores.<br />

This indicates that the PCS score can be used not only to identify near-native structures, but also to<br />

bias conformational sampling towards the native structure during the fragment assembly.<br />

Comparisons between the ROSETTA low resolution energy function and PCS score are shown in<br />

SI Figure S4.1.<br />

Figure 4.1 Fold identification by pseudocontact shifts. 3000 decoys were generated using CS-<br />

ROSETTA. In order to ensure the presence of decoys with low rmsd values to the target structure,<br />

the starting set of peptide fragments was reduced and included fragments from the known target<br />

structures. PCS scores are plotted versus the Cα rmsd to the target structure. The targets are<br />

labeled A-J as in Table 4.1. The PCS score correlates strongly with the Cα rmsd.<br />

PCSs from eleven different lanthanides were available for calbindin. In order to explore the<br />

value of using PCSs from multiple lanthanides, we rescored the structures using PCSs from both<br />

individual and multiple lanthanides. Linear regressions of PCS score versus rmsd had slopes<br />

ranging from 0.03 to 5.17 (average 2.26) for single data sets. Pairwise combination of PCS sets<br />

resulted in increased regression slopes ranging from 0.15 to 7.42 (average 3.84). Using all PCS sets<br />

resulted in a slope greater than 11, showing that PCSs from multiple metal ions greatly facilitate<br />

identification of native-like protein folds.<br />
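The slopes quoted above come from linear regressions of PCS score against rmsd over a set of decoys. A minimal sketch of such a slope calculation is given below, assuming an ordinary least-squares fit; the function name `regression_slope` and the four data points are hypothetical and serve only to show the arithmetic.

```python
def regression_slope(rmsd_values, pcs_scores):
    """Ordinary least-squares slope of PCS score (y) versus rmsd (x)."""
    n = len(rmsd_values)
    mean_x = sum(rmsd_values) / n
    mean_y = sum(pcs_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in rmsd_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(rmsd_values, pcs_scores))
    return sxy / sxx

# Hypothetical decoys: the PCS score rises by 2 units per Å of rmsd.
print(regression_slope([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))
```

A steeper slope means that the PCS score separates native-like from non-native decoys more sharply, which is the sense in which combining several lanthanide data sets helps.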

4.3.3 Comparison of PCS-ROSETTA with CS-ROSETTA



10000 decoys each were generated with CS-ROSETTA and PCS-ROSETTA. Both<br />

computations used the same fragment set, taking into account secondary structure information from<br />

chemical shift measurements. Figure 4.2 illustrates the ability of the PCS score to bias sampling<br />

towards the native structure. For seven out of the ten structure calculations, the PCSs dramatically<br />

increased the frequency with which decoys with low Cα rmsd to the reference structure were found.<br />

The effect was particularly pronounced for protein targets with larger PCS data sets. For example,<br />

more than a third of the decoys found for calmodulin had a Cα rmsd of less than 4 Å to the target<br />

structure, whereas fewer than 3% met this criterion in the absence of PCS data. Similar results were<br />

obtained for the θ subunit, protein G, and both ArgN repressor calculations. The PCS data did not<br />

significantly improve the results for thioredoxin and parvalbumin, for which only PCS data from a<br />

single paramagnetic metal ion were available. No native-like structures were found for ε186, which<br />

may be attributed to its larger size (186 residues). To evaluate the influence of the PCS score during<br />

the fragment assembly, we performed an additional calculation with the PCS score as the only<br />

energy term (SI Text S4.1).<br />

The low resolution models were subjected to full atom relaxation refinement in the last step<br />

of the calculation, using the full atom ROSETTA force field (without inclusion of the PCS score).<br />

The additional minimization step did not significantly change the overall shape of the distributions,<br />

but tended to improve the Cα rmsd of native-like decoys (SI Figure S4.2) and, most importantly,<br />

allowed recognition of the best models based on their energies.<br />

Rescoring full atom relaxed structures with a weighted combination of the ROSETTA and<br />

PCS scores further improved the recognition of near-native structures as measured by the Cα rmsd<br />

of the lowest energy structure [Table 4.1-f, PCS-ROSETTA run; Figure 4.3], with PCS-ROSETTA<br />

identifying low Cα rmsd (< 3 Å) structures in eight out of ten cases. With the exception of target C,<br />

for all successful targets the five lowest energy structures converge to within 3 Å of each other,<br />

while the two failed targets do not converge to better than 10 Å [Table 4.1-g]. Convergence is a signal<br />

that the protocol has found a topology that reliably satisfies the combined score, which in the case<br />

of PCS-ROSETTA clearly identifies the failed models as unreliable, allowing for their rejection<br />

(Shen et al., 2008). In the case of target C large disordered termini prevent a clear identification of<br />

convergence, but convergence becomes apparent when only the core residues are considered (Table<br />

S4.2-g). Results with CS-ROSETTA and PCS-ROSETTA are compared in SI Figure S4.3.
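The convergence measure used here (the average Cα rmsd between the lowest scoring structure and the next four lowest scoring structures) can be sketched as follows. This is a minimal illustration, assuming that each model is supplied as an (N, 3) array of Cα coordinates and that structures are superimposed with the Kabsch algorithm before the rmsd is taken; the function names `ca_rmsd` and `convergence` are hypothetical and this is not the code used for the calculations reported in this chapter.

```python
import numpy as np

def ca_rmsd(a, b):
    """Cα rmsd between two (N, 3) coordinate sets after optimal superposition."""
    a = np.asarray(a, float) - np.mean(a, axis=0)   # remove translation
    b = np.asarray(b, float) - np.mean(b, axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)               # Kabsch covariance SVD
    # Flip the smallest singular value if the best fit would be a reflection.
    s[-1] *= np.sign(np.linalg.det(u @ vt))
    msd = (np.sum(a * a) + np.sum(b * b) - 2.0 * np.sum(s)) / len(a)
    return float(np.sqrt(max(msd, 0.0)))

def convergence(models, scores, n_next=4):
    """Mean Cα rmsd between the best-scoring model and the next n_next models."""
    order = np.argsort(scores)                      # lowest score first
    best = models[order[0]]
    return float(np.mean([ca_rmsd(best, models[k])
                          for k in order[1:n_next + 1]]))
```

With this definition, a small value indicates that the lowest scoring models share one topology, whereas a large value flags a calculation whose best models disagree with each other.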



Figure 4.2 Improved conformational sampling by PCS-ROSETTA. 10000 independent low<br />

resolution trajectories were carried out with (black) or without (red) PCS information. The plots<br />

show the density of Cα rmsd values to the target structure after the fragment assembly step. The<br />

targets are labeled as in Table 4.1. Corresponding plots of structures calculated with full atom<br />

relaxation for positioning the amino acid side chains are shown in SI Figure S4.2. The library used<br />

for fragment selection explicitly excluded any protein with sequence similarity to the target protein.<br />

The figure shows that PCS scores efficiently guide fragment assembly towards the correct target<br />

structure.<br />

Figure 4.3 Energy landscapes generated by PCS-ROSETTA. Combined ROSETTA energy and PCS<br />

score (using the weighting factor w(c)) are plotted versus the Cα rmsd to the target structure for<br />

structures calculated using PCS-ROSETTA. The lowest energy structures are indicated in red. The<br />

targets are labeled as in Table 4.1. The results show that PCS-ROSETTA is likely to generate and<br />

identify the correct fold.



Agreement of the structures with the experimental data can also be directly assessed by the<br />

quality factor $Q = \mathrm{rms}(\mathrm{PCS}_i^{cal} - \mathrm{PCS}_i^{exp}) / \mathrm{rms}(\mathrm{PCS}_i^{exp})$, where $\mathrm{PCS}_i^{exp}$ is the experimental PCS<br />

value for the nuclear spin i.⁷ A quality factor above 25% indicates failure to find a correct structure<br />

and a quality factor below 20% indicates that the computed structure is in good agreement with the<br />

experimental PCSs (Table 4.1), as in other definitions of quality factors (Cornilescu et al., 1998).<br />

The low quality factor of the θ subunit (7%) establishes the success of the calculation despite the<br />

lack of clear convergence.<br />
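The quality factor is simple to evaluate once calculated and experimental PCSs are available for the same spins. The sketch below is a minimal Python illustration of the definition given above; the function names and the three PCS values are hypothetical.

```python
import math

def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def quality_factor(pcs_calc, pcs_exp):
    """Q = rms(PCS_calc - PCS_exp) / rms(PCS_exp) over matching spins."""
    deviations = [c - e for c, e in zip(pcs_calc, pcs_exp)]
    return rms(deviations) / rms(pcs_exp)

# Hypothetical PCS values in ppm for three spins.
q = quality_factor([0.45, -0.22, 0.12], [0.50, -0.20, 0.10])
print(f"Q = {q:.2f}")
```

Because Q is normalized by the rms of the experimental values, it can be compared between targets whose PCSs differ in overall magnitude.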

4.3.4 Successes and Limits of PCS-ROSETTA Calculations<br />

The results of PCS-ROSETTA calculations are summarized in Table 4.1. The structures of<br />

small proteins (< 80 residues, targets A to F) are easily solved by PCS-ROSETTA: the structures of<br />

lowest PCS-ROSETTA energy are consistently below 2.4 Å in Cα rmsd relative to the native structure and have<br />

quality factors below 16%. For these proteins, the generation of 10000 models was ample (Figure<br />

4.2 A to F). The same number of decoys calculated with CS-ROSETTA did not lead to satisfactory<br />

convergence for targets B-C-D-E-F (Table 4.1-g), though targets C and D partially recover if<br />

flexible termini are removed at the full atom rescoring step (SI Text S4.2). The tag used to<br />

paramagnetically label ArgN (D) produced Δχ-tensor axes of significantly different orientation with<br />

different lanthanides (Su et al., 2008) which may explain why the PCS-ROSETTA calculations<br />

performed particularly well with these data.<br />

PCS-ROSETTA succeeded in calculating the structure of a protein with 146 residues and<br />

PCSs from multiple lanthanides (target I). More than 62% of calculated structures had a Cα rmsd<br />

below 5 Å, while only 6.2% met that criterion for the CS-ROSETTA calculation (Figure 4.2 I). This<br />

indicates that the PCS data score will effectively guide the sampling towards the correct fold also<br />

for larger proteins. While calculations on target J (186 residues) did not converge despite a large<br />

PCS data set, this can be attributed to a sampling problem associated with large proteins of<br />

complex topology (Bradley et al., 2005) which may be overcome with a modified protocol.<br />

Importantly, the success of a calculation can be ascertained from calculating the quality factor Q.<br />

Combined with the convergence criterion (Shen et al., 2008), the quality factor is an effective way<br />

to assess the success of a calculation (SI Figure S4.4). For each of the eight targets for which the<br />

PCS-ROSETTA calculations converged, the structure with the lowest energy is shown<br />

superimposed on the native structure in Figure 4.4.<br />

⁷ Rms stands for root mean square, not to be confused with rmsd (root mean square deviation).<br />

Figure 4.4 Superimpositions of ribbon representations of the backbones of the lowest energy<br />

structures calculated with PCS-ROSETTA (blue) onto the corresponding target structures (red).<br />

The protein targets are (A) protein G, (B) calbindin, (C) the θ subunit of E. coli DNA polymerase<br />

III, (D) the N-terminal domain of the E. coli arginine repressor (ArgN; with covalent lanthanide<br />

tag), (E) ArgN with non-covalent lanthanide tag, (F) the N-terminal domain of calmodulin, (G)<br />

thioredoxin, (H) parvalbumin, (I) calmodulin and (J) the globular domain of the ε subunit of E. coli<br />

DNA polymerase III. Flexible termini were omitted as described in SI Table S4.1. Only the target<br />

structure is shown for parvalbumin (H) and the ε subunit (J), as the calculations could not<br />

reproduce the correct fold for these proteins.<br />

4.4 Discussion<br />

The structural information content of the PCS effect has long been recognized, but initial<br />

attempts to determine the 3D structures of biomolecules by the use of PCSs were hampered by the<br />

difficulty of determining the Δχ-tensor and the structure simultaneously (Barry et al., 1971). Subsequently,<br />

the first 3D structure determinations of proteins relied on nuclear Overhauser effect data (Wüthrich,<br />

1986). Full structure determination of proteins from PCS data alone continues to be regarded as<br />

difficult (Bertini et al., 2002a). Owing to its modeling capabilities, PCS-ROSETTA makes it



possible, for the first time, to determine 3D structures using PCSs as the only restraints while<br />

simultaneously determining all Δχ-tensor parameters and integrating PCSs from different metal<br />

ions. In addition, a PCS quality factor can be calculated that is highly indicative of the correctness<br />

of the final structure. The effect of the PCSs on improving convergence of the calculations towards<br />

the correct target structures is particularly remarkable if one considers that PCS data were mostly<br />

available only for backbone amides.<br />

The success of PCS-ROSETTA is based on the fact that, in contrast to scoring functions<br />

using chemical shift data, the PCS score is much more sensitive to global than local structure.<br />

Therefore, PCS data can guide the search in the low resolution fragment assembly step, greatly<br />

increasing the yield of near-native structures compared to CS-ROSETTA. PCSs thus present an<br />

ideal complement to chemical shift information that is most important in the preceding fragment<br />

selection step. The improved convergence alleviates the need to compute large numbers of decoys.<br />

It would be possible to accelerate the computations further by using the PCS score to select decoys<br />

with low rmsd values to the target structure prior to the computationally expensive refinement of<br />

amino acid side chain conformations.<br />

Many protein specific factors, including fold complexity, number and quality of PCS data,<br />

and metal site play roles in the success of PCS-ROSETTA fragment assembly and their relative<br />

importance is difficult to disentangle. In general, PCS data from two or more lanthanides are<br />

expected to assist identification of decoys with low rmsd to the target structure. While the structure<br />

of calmodulin, a protein with 146 residues, was successfully determined by PCS-ROSETTA, the<br />

structure of ε186 (186 residues) was not found by the program despite the availability of many<br />

PCSs overall (Table 4.1). The scarcity of PCS values for residues near the lanthanide binding site<br />

may have contributed to this effect. As the PCS-ROSETTA protocol did not sample structures<br />

below 10 Å rmsd (Figure 4.3 J) and the PCS scores of native-like ε186 structures only show a<br />

funnel-like energy landscape below 10 Å rmsd (Figure 4.1 J), this could also be a case where<br />

structures explored by the basic ROSETTA sampling protocol do not form enough native features<br />

for the PCS score to discriminate between them. An alternative sampling protocol, such as broken<br />

chain sampling (Bradley et al., 2006) or iterative refinement (Qian et al., 2007), may be the key to<br />

accurately modeling the structure of ε186 using PCS data.<br />

The present calculations were performed with proteins containing single metal binding sites.<br />

Clearly, data from multiple metal ions using different metal binding sites will greatly enhance the<br />

information content of PCS data. In particular, lanthanide ions display very different paramagnetic<br />

properties while their chemical similarity allows all lanthanides to bind at a given lanthanide



binding site. Several metal binding tags have recently been developed to tag proteins site-<br />

specifically with a paramagnetic lanthanide; for a recent review, see (Su et al., 2009b). We note that<br />

PCSs were as useful for targets devoid of natural metal binding sites (targets A, C, D and E) as for<br />

metalloproteins (Figure 4.2).<br />

We propose a new approach to protein structure determination in which PCS data are<br />

collected from natural or engineered metal binding sites, and then used to guide ROSETTA<br />

conformational search along with backbone chemical shift data. The accuracy and reliability of the<br />

lowest energy models is assessed based on the convergence of the calculation and the PCS quality<br />

factor. With multiple independent lanthanide datasets and improved conformational search<br />

methods, the approach should be extendable to proteins larger than 150 amino acids.<br />

4.5 Materials and Methods<br />

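4.5.1 PCS-ROSETTA Score<br />

The PCS (in ppm) induced by a metal ion M on a nuclear spin i can be calculated as (Bertini<br />

et al., 2002b)<br />

$$\mathrm{PCS}_i^{calc} = \frac{1}{4\pi r_i^{5}}\left[x_i^{2}\,\Delta\chi_{xx} + y_i^{2}\,\Delta\chi_{yy} + z_i^{2}\,\Delta\chi_{zz} + 2x_i y_i\,\Delta\chi_{xy} + 2x_i z_i\,\Delta\chi_{xz} + 2y_i z_i\,\Delta\chi_{yz}\right] \qquad (4.2)$$<br />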

where ri is the distance between the spin i and the paramagnetic centre M, xi, yi, zi are the<br />

Cartesian coordinates of the vector between the metal ion and the spin i in an arbitrary frame f and<br />

Δχxx, Δχyy, Δχzz, Δχxy, Δχxz, Δχyz are the Δχ-tensor components in the frame f (as Δχzz = -Δχxx -Δχyy,<br />

there are only five independent parameters). The Δχ-tensor components and the metal coordinates<br />

are initially unknown and must be redetermined each time the PCS score c is evaluated. c is<br />

calculated over all metal ions Mj as<br />

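$$c = \sum_{j}\sqrt{\sum_{i}\left(\mathrm{PCS}_i^{calc}(M_j) - \mathrm{PCS}_i^{exp}(M_j)\right)^{2}} \qquad (4.3)$$<br />

where $\mathrm{PCS}_i^{calc}(M_j)$ and $\mathrm{PCS}_i^{exp}(M_j)$ are the calculated and experimental PCS values of spin<br />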

i induced by the metal ion Mj, respectively. The determination of the Δχ-tensor components and the



metal coordinates presents a non-linear least-squares fitting problem. In order to avoid local minima<br />

and speed up the calculation, we split the problem into its linear and non-linear parts. Equation (4.2)<br />

shows that $\mathrm{PCS}_i^{calc}$ is linear with respect to the five Δχ-tensor components. Using a three-<br />

dimensional grid search over the Cartesian coordinates xM, yM, zM of the paramagnetic centre,<br />

singular value decomposition optimizes the five Δχ-tensor parameters efficiently and without<br />

ambiguity for lowest residual score c at each node of the grid. The grid node with the lowest c score<br />

is then used as the starting point for optimization of the three metal coordinates along with the five<br />

Δχ-tensor components to reach the minimal cost c.<br />
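The split into a linear and a non-linear part can be sketched as follows. For a fixed trial metal position the PCS is linear in the five independent tensor components, so these can be obtained by linear least squares (NumPy's `lstsq`, which is SVD-based), and the metal position is then chosen by scanning grid nodes. This is a minimal illustration and not the PCS-ROSETTA implementation: the function names are hypothetical, the ppm conversion factor is omitted, the residual is taken as the root of the summed squared deviations for a single metal ion (an assumption), and the final eight-parameter refinement from the best node is left out.

```python
import numpy as np

def design_matrix(coords, metal):
    """Coefficients of (dchi_xx, dchi_yy, dchi_xy, dchi_xz, dchi_yz) per spin."""
    d = np.asarray(coords, float) - np.asarray(metal, float)
    x, y, z = d[:, 0], d[:, 1], d[:, 2]
    r5 = np.sum(d * d, axis=1) ** 2.5
    # dchi_zz has been eliminated using dchi_zz = -dchi_xx - dchi_yy.
    cols = np.stack([x * x - z * z, y * y - z * z,
                     2 * x * y, 2 * x * z, 2 * y * z], axis=1)
    return cols / (4.0 * np.pi * r5[:, None])

def fit_tensor(coords, pcs_exp, metal):
    """Linear least-squares fit of the five tensor components at one node."""
    a = design_matrix(coords, metal)
    pcs_exp = np.asarray(pcs_exp, float)
    comps, *_ = np.linalg.lstsq(a, pcs_exp, rcond=None)
    residual = float(np.sqrt(np.sum((a @ comps - pcs_exp) ** 2)))
    return comps, residual

def grid_search(coords, pcs_exp, nodes):
    """Return (residual, metal position) of the best-fitting grid node."""
    return min((fit_tensor(coords, pcs_exp, m)[1], tuple(m)) for m in nodes)
```

Restricting the non-linear search to three coordinates, with the remaining five parameters solved exactly at each node, is what keeps the repeated evaluation of the score affordable during fragment assembly.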

The PCS score was added to the ROSETTA low resolution energy function using a different<br />

weighting factor w(c) for each structure calculation. w(c) was determined by first generating 1000<br />

decoys with ROSETTA and calculating w(c) as<br />

$$w(c) = \frac{a_{high} - a_{low}}{c_{high} - c_{low}} \qquad (4.4)$$<br />

where ahigh and alow are the average of the highest and lowest 10% of the values of the<br />

ROSETTA ab initio score, and chigh and clow are the average of the highest and lowest 10% of the<br />

values of the PCS score c upon rescoring each of the 1000 decoys with the PCS. The weights used<br />

for the ten structure calculations performed in the present work are given in SI Table S4.1.<br />
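The weight calculation can be illustrated with a short Python sketch. It assumes that w(c) is the ratio of the spread of the ROSETTA scores to the spread of the PCS scores, each spread being the difference between the mean of the highest 10% and the mean of the lowest 10% of values; the function names and the synthetic scores are hypothetical.

```python
def tail_means(values, fraction=0.10):
    """Mean of the lowest and of the highest `fraction` of the values."""
    ordered = sorted(values)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / k, sum(ordered[-k:]) / k

def pcs_weight(rosetta_scores, pcs_scores):
    """Weight that puts the PCS score on the scale of the ROSETTA score."""
    a_low, a_high = tail_means(rosetta_scores)
    c_low, c_high = tail_means(pcs_scores)
    return (a_high - a_low) / (c_high - c_low)

# Synthetic scores for 100 decoys: the PCS scores span twice the range.
rosetta = [float(v) for v in range(100)]
pcs = [2.0 * v for v in range(100)]
print(pcs_weight(rosetta, pcs))
```

Using averaged tails rather than the single highest and lowest values makes the weight less sensitive to outlier decoys.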

4.5.2 PCS-ROSETTA Algorithm<br />

PCS-ROSETTA uses the ROSETTA de novo structure prediction methodology to build low<br />

resolution models, followed by all atom refinement using the ROSETTA high resolution Monte<br />

Carlo minimization protocol. The additions to the standard ROSETTA structure prediction methods<br />

are: the use of chemical shifts to guide fragment selection as in CS-ROSETTA, the use of PCS data<br />

to guide the initial low resolution search and the use of PCS data for final model selection. A flow<br />

diagram of the computational protocol of PCS-ROSETTA is shown in SI Figure S4.5.<br />

4.5.3 Input for PCS-ROSETTA<br />

The chemical shifts of all protein targets were taken from the literature or from the<br />

BioMagResBank. CS-ROSETTA was used for fragment selection. CS-ROSETTA reports the<br />

difference between experimental and expected chemical shifts. Chemical shifts with very large<br />

deviations from expectations (often attributable to errors in the deposited data) were removed from<br />

the input. CS-ROSETTA also suggests corrections in the chemical shift referencing. We only



corrected ¹³C chemical shifts, except for thioredoxin, where ¹⁵N chemical shifts were corrected (SI<br />

Table S4.1). CS-ROSETTA aims to generate 200 9-residue fragments and 200 3-residue fragments<br />

centered on each residue of the polypeptide chain for use in the ab initio fragment assembly<br />

protocol of ROSETTA. In cases where CS-ROSETTA failed to generate 200 fragments, we<br />

generated additional fragments using the conventional ROSETTA protocol in order to make 200<br />

fragments available. For each of the target proteins, we removed any protein with recognizable<br />

sequence similarity (BLAST E-value below 0.05) from the CS-ROSETTA protein database. (For<br />

example, the structure of ε186 is present in the CS-ROSETTA database, but was explicitly<br />

excluded when fragments were generated.) In order to accelerate the grid search for the metal<br />

position, PCS-ROSETTA allows a precise description of the space to be searched, including the<br />

center of the grid search (cg), the step size between two nodes (sg), an outer cutoff radius (co) to<br />

limit the search to a maximal distance from cg, and an inner cutoff radius (ci) to avoid a search too<br />

close to cg. A moderately large step size (sg) was chosen to speed up computations during low<br />

resolution sampling (Table S4.1), and reduced to 25% of its value during the final high resolution<br />

scoring step to ensure maximum accuracy. For each target, the grid parameters cg, co and ci were<br />

chosen in accordance with prior knowledge about the approximate metal binding site. For example,<br />

for a covalent tag attached to the protein, we used the known geometric information of the tag to set<br />

cg, co, and ci, whereas for proteins with a natural metal binding site, a highly conserved negatively<br />

charged residue was picked as a reference point for cg. In the absence of prior biochemical<br />

information, the nuclear spin with the largest absolute PCS value was chosen as the center of the<br />

grid. SI Table S4.1 summarizes the grid parameters used for the different protein targets. In order to<br />

assess the impact of the initial grid parameters on the structures calculated, a set of PCS-ROSETTA<br />

calculations was performed for each target, where cg was centered at the nuclear spin of the largest<br />

PCS observed and the cutoff radius co was set to 15 Å. No change in the quality of the results was<br />

observed but in most cases the calculations took longer.<br />
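The role of the four grid parameters can be made concrete with a small sketch that enumerates the nodes of a cubic grid lying inside a spherical shell around cg. This is an illustration under stated assumptions rather than the PCS-ROSETTA code: the function name `grid_nodes`, the cubic lattice and the inclusive treatment of both cutoff radii are choices made for this example.

```python
import math

def grid_nodes(cg, sg, co, ci=0.0):
    """Nodes of a cubic grid with spacing sg at distance ci..co from cg."""
    n = int(math.floor(co / sg))            # grid steps needed to reach co
    nodes = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                dx, dy, dz = i * sg, j * sg, k * sg
                dist = math.sqrt(dx * dx + dy * dy + dz * dz)
                if ci <= dist <= co:        # keep nodes inside the shell
                    nodes.append((cg[0] + dx, cg[1] + dy, cg[2] + dz))
    return nodes

# Hypothetical parameters in Å: coarse search, then a step of 25% of sg.
coarse = grid_nodes((0.0, 0.0, 0.0), sg=2.0, co=10.0, ci=4.0)
fine = grid_nodes((0.0, 0.0, 0.0), sg=0.5, co=10.0, ci=4.0)
print(len(coarse), len(fine))
```

The number of nodes grows with the cube of co/sg, which is consistent with the longer run times observed when a generic 15 Å cutoff replaced the target-specific grid parameters.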

4.5.4 PCS-ROSETTA Protocol for Protein Structure Determination<br />

Chemical shifts of the proteins were prepared in Talos format (Cornilescu et al., 1999) and<br />

used by CS-ROSETTA for fragment selection. Chemical shift corrections, fragment selection, and<br />

determination of the weights w(c) were performed as described above. 10000 protein structures<br />

were computed with PCS-ROSETTA and subjected to the full atom relaxation protocol of<br />

ROSETTA to model the side chain conformations. The final structures were rescored using the<br />

ROSETTA full atom energy function combined with the PCS scores c, using the weighting factors<br />

w(c) (Equation (4.4)) with ahigh and alow calculated against the ROSETTA full atom energy, and



with a total weight multiplied by 2 to give a larger contribution to the PCS score than in the<br />

fragment assembly. The best scoring structures can be assessed by the PCS quality factor $Q =<br />

\mathrm{rms}(\mathrm{PCS}^{cal} - \mathrm{PCS}^{exp}) / \mathrm{rms}(\mathrm{PCS}^{exp})$. Computation of 10000 PCS-ROSETTA structures took on<br />

average 137 CPU days per target and was run on a local cluster. SI Figure S4.6 shows a posteriori<br />

that 1000 structures per target would have been enough for convergence of the protocol.<br />
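The final rescoring step can be sketched as a weighted sum followed by a sort. The sketch assumes that the combined score is the full atom energy plus the PCS score multiplied by twice the weight w(c); the function names, the `boost` parameter and the example values are hypothetical.

```python
def combined_score(energy, pcs_score, weight, boost=2.0):
    """Full atom energy plus the PCS score, weighted by boost * w(c)."""
    return energy + boost * weight * pcs_score

def rank_models(energies, pcs_scores, weight):
    """Indices of the models ordered from lowest to highest combined score."""
    totals = [combined_score(e, c, weight)
              for e, c in zip(energies, pcs_scores)]
    return sorted(range(len(totals)), key=totals.__getitem__)

# Hypothetical relaxed models: model 1 has the lowest energy but fits the
# PCS data poorly, so it drops to last place once the PCS term is added.
print(rank_models([-100.0, -105.0, -90.0], [5.0, 20.0, 1.0], weight=1.0))
```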

4.5.5 Computation of Structures to Evaluate the Effects of PCS Scoring<br />

3000 decoys with a wide range of rmsd values to the target structure were generated by<br />

including the native fragment and limiting the number of alternative fragments in the fragment<br />

generation step of the ROSETTA calculations. 1000 decoys each were calculated using two, five<br />

and ten fragments per residue, respectively. The presence of the native fragments in a small pool of<br />

fragments ensured the generation of structures very similar to the target structure.<br />

4.6 Acknowledgments<br />

C.S. thanks the University of Queensland for a Graduate School Research Travel Grant to<br />

undertake this collaborative research project. T.H. thanks the Australian Research Council for a<br />

Future Fellowship. Financial support from the Australian Research Council for project grants to<br />

G.O. and T.H. is gratefully acknowledged. D.B. thanks the Howard Hughes Medical Institute.<br />

4.7 References<br />

Arnesano F, Banci L and Piccioli M (2005) NMR structures of paramagnetic metalloproteins. Q.<br />

Rev. Biophys. 38:167-219<br />

Baig I, Bertini I, Del Bianco C, Gupta YK, Lee YM, Luchinat C and Quattrone A (2004)<br />

Paramagnetism-based refinement strategy for the solution structure of human α-<br />

parvalbumin. Biochemistry 43:5562-5573<br />

Balayssac S, Jiménez B and Piccioli M (2006) Assignment strategy for fast relaxing signals:<br />

complete aminoacid identification in thulium substituted Calbindin D9K. J. Biomol. NMR<br />

34:63-73



Banci L, Bertini I, Cavallaro G, Giachetti A, Luchinat C and Parigi G (2004) Paramagnetism-based<br />

restraints for Xplor-NIH. J. Biomol. NMR 28:249-261<br />

Banci L, Bertini I, Cremonini MA, Savellini GG, Luchinat C, Wüthrich K and Güntert P (1998)<br />

PSEUDYANA for NMR structure calculation of paramagnetic metalloproteins using<br />

torsion angle molecular dynamics. J. Biomol. NMR 12:553-557<br />

Barry CD, North ACT, Glasel JA, Williams RJP and Xavier AV (1971) Quantitative determination<br />

of mononucleotide conformations in solution using lanthanide ion shift and broadening<br />

NMR probes. Nature 232:236-245<br />

Bertini I, Del Bianco C, Gelis I, Katsaros N, Luchinat C, Parigi G, Peana M, Provenzani A and<br />

Zoroddu MA (2004) Experimentally exploring the conformational space sampled by<br />

domain reorientation in calmodulin. Proc. Natl. Acad. Sci. USA. 101:6841-6846<br />

Bertini I, Donaire A, Jiménez B, Luchinat C, Parigi G, Piccioli M and Poggi L (2001)<br />

Paramagnetism-based versus classical constraints: An analysis of the solution structure of<br />

Ca Ln calbindin D9k. J. Biomol. <strong>NMR</strong> 21:85-98<br />

Bertini I, Kursula P, Luchinat C, Parigi G, Vahokoski J, Wilmanns M and Yuan J (2009) Accurate<br />

solution structures of proteins from X-ray data and a minimal set of <strong>NMR</strong> data:<br />

Calmodulin-peptide complexes as examples. J. Am. Chem. Soc. 131:5134-5144<br />

Bertini I, Longinetti M, Luchinat C, Parigi G and Sgheri L (2002a) Efficiency of paramagnetism-<br />

based constraints to determine the spatial arrangement of α-helical secondary structure<br />

elements. J. Biomol. <strong>NMR</strong> 22:123-136<br />

Bertini I, Luchinat C and Parigi G (2002b) Magnetic susceptibility in paramagnetic <strong>NMR</strong>. Prog.<br />

<strong>NMR</strong> Spectr. 40:249-273<br />

Bradley P and Baker D (2006) Improved beta-protein structure prediction by multilevel<br />

optimization of NonLocal strand pairings and local backbone conformation. Proteins<br />

65:922-929<br />

Bradley P, Misura KMS and Baker D (2005) Toward high-resolution de novo structure prediction<br />

for small proteins. Science 309:1868-1871<br />

Chandrasekhar K, Krause G, Holmgren A and Dyson HJ (1991) Assignment of the 15N <strong>NMR</strong><br />

spectra of reduced and oxidized Escherichia Coli thioredoxin. FEBS Lett. 284:178-183<br />

Cornilescu G, Delaglio F and Bax A (1999) Protein backbone angle restraints from searching a<br />

database for chemical shift and sequence homology. J. Biomol. <strong>NMR</strong> 13:289-302<br />

Cornilescu G, Marquardt JL, Ottiger M and Bax A (1998) Validation of protein structure from<br />

anisotropic carbonyl chemical shifts in a dilute liquid crystalline phase. J. Am. Chem. Soc.<br />

120:6836-6837



DeRose EF, Li DW, Darden T, Harvey S, Perrino FW, Schaaper RM and London RE (2002) Model<br />

for the catalytic domain of the proofreading epsilon subunit of Escherichia coli DNA<br />

polymerase III based on <strong>NMR</strong> structural data. Biochemistry 41:94-110<br />

Gaponenko V, Sarma SP, Altieri AS, Horita DA, Li J and Byrd RA (2004) Improving the accuracy<br />

of <strong>NMR</strong> structures of large proteins using pseudocontact shifts as long-range restraints. J.<br />

Biomol. <strong>NMR</strong> 28:205-212<br />

Jensen MR and Led JJ (2006) Metal-protein interactions: Structure information from Ni2+-induced<br />

pseudocontact shifts in a native nonmetalloprotein. Biochemistry 45:8782-8787<br />

John M, Pintacuda G, Park AY, Dixon NE and Otting G (2006) Structure determination of protein-<br />

ligand complexes by transferred paramagnetic shifts. J. Am. Chem. Soc. 128:12910-12916<br />

Lemaster DM and Richards FM (1988) <strong>NMR</strong> sequential assignment of Escherichia Coli<br />

thioredoxin utilizing random fractional deuteriation. Biochemistry 27:142-150<br />

Mueller GA, Kirby TW, DeRose EF, Li D, Schaaper RM and London RE (2005) Nuclear magnetic<br />

resonance solution structure of the Escherichia coli DNA polymerase III θ subunit. J.<br />

Bacteriol. 187:7081-7089<br />

Pintacuda G, John M, Su XC and Otting G (2007) <strong>NMR</strong> structure determination of protein-ligand<br />

complexes by lanthanide labeling. Acc. Chem. Res. 40:206-212<br />

Pintacuda G, Park AY, Keniry MA, Dixon NE and Otting G (2006) Lanthanide labeling offers fast<br />

<strong>NMR</strong> approach to 3D structure determinations of protein-protein complexes. J. Am. Chem.<br />

Soc. 128:3696-3702<br />

Qian B, Raman S, Das R, Bradley P, McCoy AJ, Read RJ and Baker D (2007) High-resolution<br />

structure prediction and the crystallographic phase problem. Nature 450:259-264<br />

Raman S, Vernon R, Thompson J, Tyka M, Sadreyev R, Pei J, Kim D, Kellogg E, DiMaio F, Lange<br />

O, Kinch L, Sheffler W, Kim BH, Das R, Grishin NV and Baker D (2009) Structure<br />

prediction for CASP8 with all-atom refinement using Rosetta. Proteins online:<br />

Saio T, Ogura K, Yokochi M, Kobashigawa Y and Inagaki F (2009) Two-point anchoring of a<br />

lanthanide-binding peptide to a target protein enhances the paramagnetic anisotropic effect.<br />

J. Biomol. <strong>NMR</strong> 44:157-166<br />

Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient χ-<br />

tensor determination and NH assignment of paramagnetic proteins. J. Biomol. <strong>NMR</strong> 35:79-<br />

87<br />

Schmitz C, Stanton-Cook MJ, Su XC, Otting G and Huber T (2008) Numbat: an interactive<br />

software tool for fitting Δχ-tensors to molecular coordinates using pseudocontact shifts. J.<br />

Biomol. <strong>NMR</strong> 41:179-189



Shen Y, Lange O, Delaglio F, Rossi P, Aramini JM, Liu GH, Eletsky A, Wu Y, Singarapu KK,<br />

Lemak A, Ignatchenko A, Arrowsmith CH, Szyperski T, Montelione GT, Baker D and Bax<br />

A (2008) Consistent blind protein structure generation from <strong>NMR</strong> chemical shift data. Proc.<br />

Natl. Acad. Sci. USA. 105:4685-4690<br />

Simons KT, Kooperberg C, Huang E and Baker D (1997) Assembly of protein tertiary structures<br />

from fragments with similar local sequences using simulated annealing and bayesian<br />

scoring functions. J. Mol. Biol. 268:209-225<br />

Su XC, Liang H, Loscha KV and Otting G (2009a) [Ln(DPA)3]3- is a convenient paramagnetic shift<br />

reagent for protein <strong>NMR</strong> studies. J. Am. Chem. Soc. 131:10352-10353<br />

Su XC, Man B, Beeren S, Liang H, Simonsen S, Schmitz C, Huber T, Messerle BA and Otting G<br />

(2008) A dipicolinic acid tag for rigid lanthanide tagging of proteins and paramagnetic<br />

<strong>NMR</strong> spectroscopy. J. Am. Chem. Soc. 130:10486-10487<br />

Su XC and Otting G (2009b) Paramagnetic labelling of proteins and oligonucleotides. J. Biomol.<br />

<strong>NMR</strong> in press:<br />

Ubbink M, Ejdebäck M, Karlsson BG and Bendall DS (1998) The structure of the complex of<br />

plastocyanin and cytochrome f, determined by paramagnetic <strong>NMR</strong> and restrained rigid-<br />

body molecular dynamics. Structure 6:323-335<br />

Wilton DJ, Tunnicliffe RB, Kamatari YO, Akasaka K and Williamson MP (2008) Pressure-induced<br />

changes in the solution structure of the GB1 domain of protein G. Proteins 71:1432-1440<br />

Wüthrich K (1986). <strong>NMR</strong> of proteins and nucleic acids. Wiley, New York.<br />

Zhuang T, Lee HS, Imperiali B and Prestegard JH (2008) Structure determination of a Galectin-3-<br />

carbohydrate complex using paramagnetism-based <strong>NMR</strong> constraints. Protein Sci. 17:1220-<br />

1231<br />

4.8 Supporting information



Figure S4.1 Fold identification by pseudocontact shift score and ROSETTA energy. 3000 decoys<br />

were generated using CS-ROSETTA. In order to ensure that some decoys with small rmsd to the<br />

target structure were obtained, the starting set of peptide fragments was reduced and included the<br />

fragments from the known target structures. A to J: ROSETTA energies plotted versus the Cα rmsd<br />
to the target structure. A’ to J’: PCS scores plotted versus the Cα rmsd to the target structure. The<br />

targets are labeled A-J as in Table 4.1.



Figure S4.2 Improved fragment assembly by PCS-ROSETTA. Fragments were assembled in 10000<br />

different runs of CS-ROSETTA (red), 10000 different runs of PCS-ROSETTA (black), and 10000



different runs using exclusively the PCS score of PCS-ROSETTA (blue). The plots show the<br />

frequency with which structures of different Cα rmsd values to the target structure were found. The<br />
red and black solid lines reproduce the data of Figure 4.2. The dashed lines show the<br />
corresponding data obtained in independent calculations that included the full atom refinement<br />
step. The same colors were used for calculations with and without the full atom refinement step.<br />
The full atom refinement step does not significantly change the Cα rmsd of the structures produced<br />

in the fragment assembly step with respect to the target structure. The targets are labeled A-J as in<br />

Table 4.1.<br />

Figure S4.3 Energy landscape generated by CS-ROSETTA and PCS-ROSETTA, with full atom<br />

ROSETTA energies and Cα rmsd values being calculated using only the core residues as defined in<br />
Table S4.1. A to J: full atom ROSETTA energies plotted versus the Cα rmsd to the target structure<br />
for structures calculated using CS-ROSETTA. A’ to J’: Combined ROSETTA energy and PCS score<br />
plotted versus the Cα rmsd to the target structure for structures calculated using PCS-ROSETTA.<br />

The lowest energy structures are indicated in red. The targets are labeled A-J as in Table 4.1.



Figure S4.4 Identification of successful calculations with PCS-ROSETTA. The quality factor Q<br />

reports on the agreement between the experimental and calculated PCSs. A value below 20%<br />
usually indicates that the calculated structure satisfies the PCS restraints. Above 25%, the quality of<br />
the structure is poor. The y axis shows the average Cα rmsd calculated between the lowest-scoring<br />
structure and the next four lowest-scoring structures. Rmsd values below 3 Å indicate<br />
convergence of the protocol. The convergence criterion and the quality factor can be combined to<br />
confirm the success of the calculations for targets A, B, C, D, E, F, G and I, and to reject targets H and<br />
J. The targets are labeled A-J as in Table 4.1. The values are those of Table 4.1-h for the x axis,<br />

and Table 4.1-g for the y axis.
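As a minimal sketch of how the two criteria of Figure S4.4 could be combined, the Python fragment below computes a quality factor and applies the 20% and 3 Å thresholds quoted in the caption. The expression used for Q (root of the summed squared deviations divided by the summed squared experimental PCSs) is an assumed, commonly used definition and may differ from the one behind Table 4.1. The function names are hypothetical.

```python
import numpy as np


def pcs_quality_factor(pcs_exp, pcs_calc):
    """Assumed definition: Q = sqrt(sum((exp - calc)^2) / sum(exp^2))."""
    pcs_exp = np.asarray(pcs_exp, dtype=float)
    pcs_calc = np.asarray(pcs_calc, dtype=float)
    return float(np.sqrt(np.sum((pcs_exp - pcs_calc) ** 2) / np.sum(pcs_exp ** 2)))


def accept_calculation(q, convergence_rmsd, q_max=0.20, rmsd_max=3.0):
    """Accept a calculation only if Q is below 20% and the convergence rmsd is below 3 Å."""
    return bool(q < q_max and convergence_rmsd < rmsd_max)
```

With the values reported for parvalbumin (H) in Table S4.2, a convergence of 10.25 Å alone would already lead to rejection under this rule, whatever the value of Q.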



Figure S4.5 Flow diagram of PCS-ROSETTA. (a) Fragments are selected by their chemical shifts<br />

using CS-ROSETTA. (b) The PCS weight is calculated using equation (4.4) on 1000 decoys<br />

generated with CS-ROSETTA. (c) Structures are produced by the classical fragment assembly of<br />

ROSETTA with addition of the PCS-score. (d) Side chains are added to the structures and<br />

subjected to a full atom minimization. (e) Resulting structures are rescored using a combination of<br />

the ROSETTA full atom energy score and the PCS score. (f) Best structures are selected by their<br />

lowest score.



Figure S4.6 Expected Cα rmsd of the lowest energy structure calculated with PCS-ROSETTA. A<br />
given number n of structures (x axis) was randomly chosen 5000 times from the total of 10 000<br />
generated structures, and the Cα rmsd of the lowest-energy structure, averaged over the 5000 trials, is<br />
graphed. The curves show a posteriori that 1000 structures calculated for each target would<br />
have been ample to ensure convergence of the PCS-ROSETTA calculations. The targets are labeled<br />
A-J as in Table 4.1. The curves for the targets parvalbumin (H) and ε186 (J) are not shown.
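The resampling procedure described in the caption of Figure S4.6 can be sketched as follows. This is an illustration under stated assumptions, not the script used for the figure: the function name is hypothetical, and the subsets are assumed to be drawn without replacement.

```python
import numpy as np


def expected_rmsd_of_best(energies, rmsds, n, trials=5000, seed=0):
    """Average rmsd of the lowest-energy structure among n randomly chosen structures."""
    rng = np.random.default_rng(seed)
    energies = np.asarray(energies, dtype=float)
    rmsds = np.asarray(rmsds, dtype=float)
    picked = np.empty(trials)
    for t in range(trials):
        # Draw n distinct structures (assumption: sampling without replacement).
        subset = rng.choice(energies.size, size=n, replace=False)
        # Keep the rmsd of the structure with the lowest energy in this subset.
        best = subset[np.argmin(energies[subset])]
        picked[t] = rmsds[best]
    return float(picked.mean())
```

Evaluating the function for increasing n gives one curve per target. The curve flattens once additional structures no longer change the lowest-energy structure that is selected.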


Table S4.1 PCS data information and grid search parameters used.<br />


Protein name | Residues a | Metal ions used | Atom types | cs corr b | w(c) | cg c | sg d | co d | ci d<br />
protein G (A) | 1-56 | Tb3+ Tm3+ Er3+ | HN | 0.53 | 15.5 | E19 CA | 6 | 17 | 7<br />
calbindin (B) | 2-75 | Ce3+ Dy3+ Er3+ Eu3+ Ho3+ Nd3+ Pr3+ Sm3+ Tb3+ Tm3+ Yb3+ | HN, N, C’ | 2.72 | 48.9 | D54 CA | 6 | 8 | 4<br />
θ subunit (C) | 10-64 | Dy3+ Er3+ | HN | -0.16 | 7.1 | D14 CA | 6 | 25 | 15<br />
ArgN (D) | 8-70 | Tb3+ Tm3+ Yb3+ | HN, N | 2.09 | 13.5 | C68 CB | 6 | 10 | 4<br />
ArgN (E) | 8-70 | Tb3+ Tm3+ | HN | 2.09 | 48.9 | K12 CB | 6 | 15 | 0<br />
N-calmodulin (F) | 3-79 | Tb3+ Tm3+ | HN, CA, CB | 0.00 | 4.7 | D60 CA | 6 | 8 | 4<br />
thioredoxin (G) | 2-108 | Ni2+ | HN | 1.23 | 106.3 | S1 N | 3.8 | 4 | 0<br />
parvalbumin (H) | 2-109 | Dy3+ | HN, N | 2.65 | 2.86 | D93 CA | 6 | 8 | 4<br />
calmodulin (I) | 3-146 | Tb3+ Tm3+ Yb3+ Dy3+ | HN | 0.59 | 5.1 | D60 CA | 6 | 8 | 4<br />
ε186 (J) | 7-180 | Tb3+ Dy3+ Er3+ | HN, N, C’ | 0.53 | 8.2 | D12 CA | 6 | 8 | 4<br />

a Ordered residues<br />

b Uniform offset used for 13 C chemical shifts (in ppm) compared to published values. In the case of<br />

thioredoxin, the offset was applied to 15 N chemical shifts<br />

c Residue and atom name defining the center of the grid search to position the paramagnetic center.<br />

d In Ångstrom



Table S4.2 Protein structures used to evaluate the performance of PCS-ROSETTA.<br />

Targets | PCS-ROSETTA run a: rmsd c | PCS-ROSETTA run a: convergence d | CS-ROSETTA run b: rmsd c | CS-ROSETTA run b: convergence d<br />
protein G (A) | 0.61 | 0.92 | 0.80 | 0.88<br />
calbindin (B) | 1.46 | 2.09 | 4.96 | 4.72<br />
θ subunit (C) | 1.30 | 0.55 | 1.56 | 2.25<br />
ArgN e (D) | 1.00 | 0.77 | 1.31 | 2.21<br />
ArgN f (E) | 0.83 | 0.94 | 1.65 | 5.43<br />
N-calmodulin (F) | 1.74 | 1.49 | 4.69 | 4.49<br />
thioredoxin (G) | 2.58 | 2.44 | 4.61 | 5.55<br />
parvalbumin (H) | 11.26 | 10.25 | 11.80 | 11.30<br />
calmodulin (I) | 2.80 | 2.12 | 6.35 | 2.94<br />
ε186 g (J) | 20.57 | 18.03 | 17.07 | 17.74<br />

a The structures used to calculate the rmsds were identified using the combined PCS-score and<br />

ROSETTA full atom energy across only the core residues defined in SI Table S4.1.<br />

b The structures used to calculate the rmsds were identified by the ROSETTA full-atom energy<br />

across only the core residues defined in SI Table S4.1.<br />

c Cα rmsd (with respect to the native structure) of the structure of lowest score, in Å. All Cα rmsd<br />
values were calculated using the core residues defined in SI Table S4.1.<br />
d Average Cα rmsd calculated between the lowest-scoring structure and the next four lowest-scoring<br />
structures, in Å. The rmsd values were calculated using the core residues defined in SI Table S4.1.<br />

e PCSs measured with a covalent tag attached to the N-terminal domain of the E. coli arginine<br />

repressor (ArgN).<br />

f PCSs measured with a non-covalent tag bound to ArgN.<br />

g N-terminal 186 residues of the ε subunit of the E. coli polymerase III.<br />
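The rmsd and convergence values of Table S4.2 rely on superimposing Cα coordinates restricted to the core residues. The sketch below shows one way such values could be computed. The Kabsch superposition is a standard choice that is assumed here rather than taken from the text, and the function names, the array layout (models × residues × 3) and the `core` index array are hypothetical.

```python
import numpy as np


def ca_rmsd(coords_a, coords_b):
    """Cα rmsd between two (N, 3) coordinate sets after optimal superposition (Kabsch)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    a = a - a.mean(axis=0)                      # remove translation
    b = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)           # SVD of the covariance matrix
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # proper rotation taking a onto b
    diff = a @ rot.T - b
    return float(np.sqrt((diff ** 2).sum() / len(a)))


def convergence(scores, models, core):
    """Average Cα rmsd between the lowest-scoring model and the next four lowest-scoring models."""
    models = np.asarray(models, dtype=float)
    order = np.argsort(scores)
    best = models[order[0]][core]
    return float(np.mean([ca_rmsd(best, models[i][core]) for i in order[1:5]]))
```

Restricting `core` to the ordered residues listed in Table S4.1 keeps disordered termini from inflating either value.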

Text S4.1 Fragment Assembly Using PCSs Only.<br />

In order to gain a better understanding of the merit of PCS data, we generated 10000 decoys<br />

per protein with all ROSETTA force field components turned off except for the PCS score. In<br />

seven of the ten protein structure calculations, the PCS score alone produced decoys with a Cα rmsd<br />

of less than 2.5 Å to the target structure (Figure S4.2, solid blue line). Control calculations without



any scoring function produced not a single useful decoy. This highlights the power of PCS data to<br />

define the overall topology of a protein at the fragment assembly stage. The effect was particularly<br />

pronounced for the target proteins θ and ArgN (Figure S4.2 C and D).<br />

The second set of PCS data of ArgN (Table 4.1; structure E) yielded worse decoys in the<br />
PCS-only computations with PCS-ROSETTA than were obtained with CS-ROSETTA. Remarkably, however,<br />
using the PCS score in combination with the ROSETTA force field yielded much better structures than<br />
either did when used separately (Figure S4.3 E). This shows that the PCS score adds information that is<br />
not captured by the ROSETTA energy score alone.<br />

Text S4.2 Scoring over Core Residues.<br />

Disordered residues can add noise to the ROSETTA energy, and this noise can prevent<br />

identification of low rmsd structures. Notably, three of the targets that succeeded under the PCS-<br />

ROSETTA protocol and failed under the CS-ROSETTA protocol have disordered termini<br />

accounting for ten or more residues each (Table S4.2-d: Targets C, D & E). In practice it is possible<br />

to experimentally determine the disordered character of a residue, so to compare the effect of<br />

disorder on the two protocols we produced an additional set of structures by removing disordered<br />

residues during the final rescoring step (cores defined in Table S4.1). When the core residues are<br />

perfectly defined, in this case by observation of the solved structures, the CS-ROSETTA protocol<br />

identifies low rmsd structures in four of the ten cases (including C, D & E), and shows convergence<br />

to a low rmsd structure in three of the ten cases [Table S4.2-c, d, CS-ROSETTA run]. In contrast,<br />

removing the disordered residues has little effect on PCS-ROSETTA’s rmsd values, suggesting that<br />

the combined PCS and ROSETTA score is less sensitive to disorder. The remaining targets had<br />

little disorder and removal of disordered terminal residues had little effect on the results.


Chapter 5<br />

Conclusion and perspectives<br />




5.1 The use of PCS for structure determination<br />

Structure determination of proteins remains a major challenge of the post-genomic era.<br />
Conventional techniques such as X-ray crystallography and <strong>NMR</strong> spectroscopy are slow. Hence,<br />
the gap between known protein sequences and known protein structures remains large. Alternative<br />
experimental methods are required to speed up the process. Those methods have to present an<br />
attractive compromise between the effort required to measure the desired data and the value those<br />
data bring in assisting de novo structure determination of proteins.<br />

In Chapter 4, it has been demonstrated that PCS data are a promising candidate. It has been<br />
shown that combining a molecular fragment approach with the PCS score leads to the correct<br />
fold for proteins of up to 146 residues. A benchmark of ten data sets has been compiled for<br />
this work, and the PCS-ROSETTA approach succeeded in eight out of the ten cases. The first<br />
case where PCS did not lead to the correct fold concerned a protein with only one PCS data<br />
set. The importance of measuring multiple data sets with different lanthanides has already been<br />
shown in Chapter 2 (to increase the quality of the resonance assignment) and Chapter 3 (to increase<br />
the quality of the fitted Δχ-tensor). It was therefore not surprising that PCS-ROSETTA had difficulties<br />
when working with a single data set. The second case where the fold was incorrect was the largest<br />
protein of the benchmark: the ε subunit, with 186 residues. The size of the protein might be a limiting<br />
factor for the current approach. It is well known that for proteins larger than 150 residues, the size<br />
of the conformational space explodes in the molecular fragment replacement protocol of<br />
ROSETTA. Whether PCS-ROSETTA faces the same problem is a legitimate question. Some<br />
elements of an answer to that question are presented in the following sections.<br />

5.1.1 Folding of proteins using only pseudocontact shifts<br />

In order to gain a better understanding of the merit of the PCS, structure calculations with<br />
PCS-ROSETTA have been made with all energy terms switched off, except the PCS score. The<br />
fragment assembly within ROSETTA is hence guided purely by the PCS. While turning off the<br />
normal force field of ROSETTA is of no practical interest, the results presented<br />
below remain of theoretical interest.<br />

The protocol generated and identified (by the lowest score) attractive folding for four out of<br />
the ten structure calculations (Figure 5.1). The term “attractive folding” has to be defined in this<br />
context. Obviously, without energy terms such as van der Waals terms or hydrogen bonding, the<br />
resulting structures can exhibit steric clashes (Figure 5.1 a, c, d), or poor β-sheet pairing (Figure 5.1



a and c). However, the Cα rmsd against the native structure was found to be reasonably low (2.25 Å<br />
for protein G, 1.89 Å for θ, 2.39 Å for ArgN, 5.03 Å for calmodulin). This provides proof of<br />
principle that the PCS alone can direct the correct folding of a protein at the fragment assembly<br />
level. The conditions for success remain unclear and should be analyzed further. Success could depend<br />
on a combination of the complexity or size of the protein, the number and quality of data sets, the<br />
relative orientation of each Δχ-tensor, and the location of the paramagnetic center.<br />

Figure 5.1 Capacity of the PCS score, as the only energy term, to fold the<br />

protein. The lowest PCS energy structure (blue) is superimposed onto the<br />

native structure (white). (a) protein G, (b) θ subunit, (c) ArgN, (d) calmodulin.<br />

Figure 5.2 The intersection of isosurfaces defines the position and orientation of peptide<br />

fragments in the protein structure. (a) Three PCS of the spin (black) measured with three



different lanthanides can be depicted as three isosurfaces (red, blue and yellow) where the<br />

spin must be located. In order to fulfill all three PCS data simultaneously, the spin must be<br />

located at the intersection of the three isosurfaces, helping to define the orientation and<br />

position of the fragment (purple) in the protein structure (white). (b) The same principle<br />

holds, if PCS from different lanthanide binding sites are available, in which case the<br />

intersection of the three isosurfaces is even better defined.<br />

A direct consequence of those results is the theoretical demonstration that it is possible to<br />
apply the PCS score in a “divide and conquer” approach in order to overcome the sampling<br />
problem typically encountered with proteins larger than 120 residues. If a large protein were<br />
composed of the four proteins shown in Figure 5.1, a protocol to obtain the fold of the large<br />
protein could be (i) to cut the protein into four pieces (Figure 5.1 a-b-c-d), (ii) to assemble the<br />
pieces separately, and (iii) to reconstitute the whole protein by superimposition of the four<br />
separately determined (sets of) Δχ-tensors. The proof of principle holds because no energy term (that<br />
would report on interactions between the four parts of the protein) is used. A more efficient way,<br />
however, could be to work with smaller overlapping fragments. The fragments would<br />
need to be large enough to make it possible to optimize the Δχ-tensor (larger than 20 residues), but<br />
small enough to avoid running into the sampling problem (smaller than 80 residues).<br />

Clearly, the PCS-score would benefit from additional energy terms, starting with<br />

the van der Waals term that would strongly penalize conformations with steric clashes, thus<br />

favoring the sampling of the correct fold. In particular, the symmetric shape of the isosurfaces<br />

implies the existence of symmetrical folds that could be discriminated with the help of the van der<br />

Waals term, as theoretically demonstrated in (Bertini et al., 2002).<br />

5.1.2 Uses of multiple lanthanide binding sites<br />

Formerly limited to metalloproteins, paramagnetic <strong>NMR</strong> has found increasingly wide<br />
application since the arrival of lanthanide binding tags. These tags can be attached to the protein<br />
via a disulfide bond or at the termini of the protein. A non-covalent tag can also be used, although<br />
the disadvantage is the loss of control over whether (and where) the tags bind. Attaching metal tags<br />
to a protein of interest site-specifically is one of the current challenges of paramagnetic <strong>NMR</strong>.<br />
Several groups are developing new tags; for a recent review, see (Su et al., 2009). While those tags<br />
are engineered to simplify as much as possible the process of tag attachment, a consequence of this



quest will hopefully be the possibility to attach them at different positions on the surface of the<br />

target protein.<br />

All work outlined in this thesis has greatly benefited from the availability of multiple data<br />

sets measured with different lanthanides. The magnitude of the paramagnetic dipole moment differs<br />

between different paramagnetic metal ions. The orientation of the Δχ-tensor varies too. The<br />

advantage is that those differences provide additional information that can be used to improve the<br />

quality of the fitted Δχ-tensor (Chapter 3) or the quality of the automated assignment (Chapter 2).<br />

PCS-ROSETTA calculations can take advantage of that fact. In particular, the calculations on<br />
calbindin improved greatly when all available lanthanide data were used compared to test<br />
calculations using PCS from only one or two lanthanides simultaneously. It can be expected that<br />
the use of tags attached at different locations on the protein will enhance the PCS-ROSETTA<br />
calculations further. In particular, the location and the orientation of fragments with respect to the<br />
rest of the protein structure would be defined more accurately by isosurfaces intersecting at steeper<br />
angles (Figure 5.2). The current implementation of PCS-ROSETTA would make it straightforward<br />
to design such a protocol. The benefits of using two different lanthanide binding sites could<br />
already be explored using the arginine repressor as a test case, for which PCS data measured with two<br />
different lanthanide binding sites are available.<br />

5.1.3 Development of a new PCS-ROSETTA protocol<br />

The way structures are calculated by PCS-ROSETTA is similar to the standard protocol of<br />

ROSETTA. The only difference is the calculation of a PCS-score during the fragment assembly<br />

stage (which requires fitting of a Δχ-tensor and metal position). The weight of the PCS-score<br />
relative to the standard centroid score of ROSETTA is chosen so that both have an equal<br />

influence. While ROSETTA benefits from additional experimental restraints such as PCSs, it is<br />

important to bear in mind that the original goal of ROSETTA is to generate a wide variety of<br />

protein-like structures in a first step (fragment assembly) and identify the native one in a second<br />

step (full atom score). Considering that the PCS have proven to drive the sampling towards the<br />

native structure with great efficiency, it can be questioned whether it is necessary to enforce<br />

diversity in the generated structures. Several thousands of structures (using a large amount of CPU<br />

time) are usually generated by ROSETTA to cover a wide range of possible structures. It may be<br />

more profitable to generate, at equal CPU time, a smaller number of structures for which more time<br />

is spent for the fragment assembly.



Clearly, a larger benchmark containing proteins of different topology and size is necessary<br />

to gain a better understanding of the merit of the pseudocontact shift. We are in the process of<br />

creating an artificial one, since PCS can easily be predicted. The parameters that would impact the<br />

success of calculation, and that we have to explore are: the level of noise within the data PCS-<br />

ROSETTA can tolerate, the influence of the position of the paramagnetic center relative to the<br />

protein, and the influence of the relative orientations of different Δχ-tensors.<br />

5.2 The use of PCS for chemical shift assignment<br />

The work of Chapter 2 provides proof of principle that PCS can be used for automatic<br />

chemical shift assignment when a 3D structure is available. Chemical shifts are sensitive to the<br />

local environment. Small variations in the surrounding electronic configuration can have a large<br />

impact on the chemical shift values. This makes it extremely difficult to predict chemical shifts<br />

accurately. In contrast, the PCS of a spin i is much less affected by similar variations, as the PCS<br />

only depends on the spherical coordinates of i with respect to the Δχ-tensor frame.<br />
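To make this dependence concrete, the sketch below evaluates the standard expression for the PCS of a spin at spherical coordinates (r, θ, φ) in the principal frame of the Δχ-tensor, written in terms of the axial and rhombic components. The function name and the unit convention (r in Å, tensor components in 10^-32 m^3, result in ppm) are assumptions chosen for this illustration.

```python
import math


def pcs_ppm(r, theta, phi, dchi_ax, dchi_rh):
    """PCS (ppm) of a spin at (r, theta, phi) in the Δχ-tensor principal frame.

    Assumed units: r in Å, dchi_ax and dchi_rh in 1e-32 m^3, angles in radians.
    """
    angular = (dchi_ax * (3.0 * math.cos(theta) ** 2 - 1.0)
               + 1.5 * dchi_rh * math.sin(theta) ** 2 * math.cos(2.0 * phi))
    # 1e4 converts (1e-32 m^3) / (Å^3 = 1e-30 m^3) to parts per million.
    return 1.0e4 * angular / (12.0 * math.pi * r ** 3)
```

Only the position of the spin relative to the tensor frame enters the expression. Details of the local electronic environment do not appear, and the 1/r^3 dependence means that small coordinate errors far from the metal change the predicted PCS very little.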

While the program Possum presented in Chapter 2 is limited to methyl groups, the approach<br />

used to automatically assign the chemical shifts could be applied to any atom type for which PCS<br />

can be measured. The simulated annealing protocol used to solve the multi-dimensional assignment<br />
problem has proven to be highly efficient at sampling the possible assignment space and finding<br />
solutions of lower energy than manual assignments. The scoring scheme used to optimize the<br />
assignment is also efficient; only a few misassignments were present when multiple lanthanide data<br />

sets were available. Even more interesting may be to obtain the Δχ-tensor parameters directly from<br />

unassigned chemical shifts. The program Echidna (Schmitz et al., 2006) is capable of<br />

simultaneously getting the Δχ-tensor parameters and assigning the paramagnetic <strong>NMR</strong> spectrum,<br />

provided that the diamagnetic spectrum is already assigned. This raises the question whether both<br />

the diamagnetic and the paramagnetic spectrum can be assigned, while determining the Δχ-tensor at<br />

the same time. A software package for this purpose is currently under development. The idea is to<br />

handle various kinds of input information (partial assignment for some residues in the diamagnetic<br />

or paramagnetic state, measurement of some unassigned pseudocontact shifts, selective isotope<br />

labeling of some amino acids) and use this partial information in a simulated annealing protocol to<br />

optimize the assignment while determining the Δχ-tensor parameters. The challenge of this<br />

approach is to use the right combination of methods: simulated annealing for the assignment, grid



search for the coordinates of the paramagnetic center, and singular value decomposition for the<br />

determination of the remaining Δχ-tensor parameters.<br />
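The combination of a grid search with a linear fit can be sketched as follows. For a fixed metal position the PCS is linear in the five independent components of the traceless Δχ-tensor, so these can be obtained by linear least squares. The sketch uses `numpy.linalg.lstsq`, which is SVD-based. It is an illustration under stated assumptions and not the implementation under development: the function names, the grid parameters (`center`, `step`, `outer`, `inner`), the 1 Å exclusion distance and the unit convention (coordinates in Å, tensor components in 10^-32 m^3, PCS in ppm) are all hypothetical. The simulated annealing step for the assignment is not shown.

```python
import numpy as np


def design_matrix(coords, metal):
    """Rows relate (Δχxx, Δχyy, Δχxy, Δχxz, Δχyz) to the PCS of each spin."""
    d = np.asarray(coords, dtype=float) - np.asarray(metal, dtype=float)
    x, y, z = d.T
    r5 = np.linalg.norm(d, axis=1) ** 5
    cols = [x * x - z * z, y * y - z * z, 2 * x * y, 2 * x * z, 2 * y * z]
    return 1.0e4 * np.column_stack(cols) / (4.0 * np.pi * r5[:, None])


def fit_tensor(coords, pcs, metal):
    """Least-squares tensor components and squared residual for one metal position."""
    a = design_matrix(coords, metal)
    pcs = np.asarray(pcs, dtype=float)
    params, *_ = np.linalg.lstsq(a, pcs, rcond=None)
    return params, float(np.sum((a @ params - pcs) ** 2))


def grid_search(coords, pcs, center, step, outer, inner=0.0, min_dist=1.0):
    """Try metal positions on a cubic grid inside a spherical shell around center."""
    coords = np.asarray(coords, dtype=float)
    center = np.asarray(center, dtype=float)
    offsets = np.arange(-outer, outer + step / 2.0, step)
    best = None
    for dx in offsets:
        for dy in offsets:
            for dz in offsets:
                shift = np.array([dx, dy, dz])
                if not inner <= np.linalg.norm(shift) <= outer:
                    continue
                metal = center + shift
                # Assumption: skip positions closer than min_dist to any spin.
                if np.linalg.norm(coords - metal, axis=1).min() < min_dist:
                    continue
                params, resid = fit_tensor(coords, pcs, metal)
                if best is None or resid < best[2]:
                    best = (metal, params, resid)
    return best
```

The nested optimization keeps the non-linear search three-dimensional, since the five tensor components are solved exactly at every grid point.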

5.3 The use of PCS for protein docking<br />

At present, structural biology groups are investing much effort in producing models of<br />

proteins by <strong>NMR</strong> or X-ray crystallography. Currently more than 48000 crystal structures and<br />

almost 7000 <strong>NMR</strong> structures of proteins have been deposited in the protein data bank. In contrast,<br />

the number of protein-protein complexes solved by any of those methods remains low (fewer than<br />
2500). For X-ray crystallography, co-crystallizing a complex is much more difficult than<br />
obtaining crystal structures of the individual proteins. For <strong>NMR</strong> spectroscopy, the larger molecular<br />
weight of complexes presents a problem, making it more difficult to obtain and analyze data.<br />
Additionally, the most useful information, intermolecular NOEs, often involves amino acid side<br />
chains, the resonances of which are much harder to assign than the backbone resonances of the<br />

protein.<br />

For a better insight of the molecular basis of life, the challenge is to understand how<br />

individual macromolecules come together to fulfill their tasks in DNA replication, gene expression<br />

and regulation, etc. Construction of models for protein-protein, protein-DNA and protein-ligand<br />

complexes are necessary. An angle to tackle the problem is the use of a docking program that<br />

predicts the binding mode of the complex given the structures of the individual components. The<br />

major difficulty of this approach is to comprehensively explore the 6-dimensional space that<br />

describes the relative orientation and position of two rigid bodies in space. This presents a<br />

challenging sampling problem. A shortage of experimental information to support any generated<br />

model is another drawback of the docking approach. Techniques that supply additional<br />

experimental information are therefore valuable. Measurements of residual dipolar<br />

couplings are an efficient way to obtain information on the relative orientation of two macromolecules: the<br />

macromolecules are weakly aligned using an alignment medium. Independent determination of the<br />

alignment tensor with respect to the two macromolecules gives direct access to the relative<br />

orientation of the two rigid bodies. More interestingly, PCS measurements provide, in addition to<br />

orientational information, information on the distance between the two bodies. Determination of the<br />

Δχ-tensor with respect to the two molecules and superimposition of the two Δχ-tensor frames yields<br />

the relative orientation and position of the two macromolecules, as illustrated in Figure 1.8


(Chapter 1). Pseudocontact shifts therefore present particularly powerful experimental data to<br />

construct a model of the complex between two molecules by rigid body docking, and to shortcut<br />

the computationally expensive task of sampling the conformational space. PCS cannot, however,<br />

provide the high-resolution accuracy that crystallography achieves, and more detailed models will<br />

still require structural refinement software that optimizes the intermolecular packing and makes any other<br />

structural adjustments. Nevertheless, when starting from a valid rigid body model, the computational<br />

effort of protein docking can focus on the atomic details.<br />
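The superimposition of the two Δχ-tensor frames can be sketched as follows. This is an illustrative sketch only, not the docking software presented in this thesis. It assumes that the same Δχ-tensor has been determined independently in the coordinate frame of each molecule, given as a symmetric 3x3 matrix together with the metal position, and that its three principal values are distinct. The function names `principal_frame` and `docking_transforms` are hypothetical. Because the principal axes are defined only up to sign, four proper rotations are equally consistent with the tensors, and all four are returned.

```python
import numpy as np

def principal_frame(tensor):
    """Principal axes of a symmetric tensor as columns of a proper rotation matrix."""
    _, vecs = np.linalg.eigh(tensor)   # eigenvalues are returned in ascending order
    if np.linalg.det(vecs) < 0:        # enforce a right-handed frame
        vecs[:, 2] *= -1
    return vecs

def docking_transforms(tensor_a, metal_a, tensor_b, metal_b):
    """Candidate (rotation, translation) pairs that map molecule B onto molecule A.

    Each pair satisfies x_a = rotation @ x_b + translation and superimposes
    both the tensor axes and the metal positions.
    """
    metal_a = np.asarray(metal_a, float)
    metal_b = np.asarray(metal_b, float)
    frame_a = principal_frame(np.asarray(tensor_a, float))
    frame_b = principal_frame(np.asarray(tensor_b, float))
    # Identity plus the 180-degree rotations about each principal axis.
    signs = ([1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1])
    solutions = []
    for s in signs:
        rotation = frame_a @ np.diag(s) @ frame_b.T
        translation = metal_a - rotation @ metal_b
        solutions.append((rotation, translation))
    return solutions
```

The four-fold degeneracy would have to be resolved by additional information, for example a second metal site or steric clashes between the two molecules.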

Another challenge facing the modeling of a protein-protein complex arises when one of the<br />

proteins undergoes a large conformational change. Protein docking software can tackle the problem<br />

to some extent by adapting the conformations of the side chains and the backbone close to the<br />

complex interface. As this approach greatly increases the conformational space to search, it is<br />

computationally challenging. Paramagnetic <strong>NMR</strong> and PCS could assist the modeling of large<br />

conformational changes by directing the conformational alterations toward the correct<br />

conformation. A protein that undergoes a large conformational change due to motions about a<br />

hinge that separates two rigid domains would be an attractive example. In a situation where<br />

such a hinge motion occurs as a result of association with a binding partner, the technique for rigid<br />

body docking with the help of PCS presented in this thesis could be applied particularly fruitfully<br />

to assist the modeling of the complex.<br />

5.4 References<br />

Bertini I, Longinetti M, Luchinat C, Parigi G and Sgheri L (2002) Efficiency of paramagnetism-<br />

based constraints to determine the spatial arrangement of α-helical secondary structure<br />

elements. J Biomol <strong>NMR</strong> 22:123-136<br />

Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient<br />

χ-tensor determination and NH assignment of paramagnetic proteins. J Biomol <strong>NMR</strong><br />

35:79-87<br />

Su XC and Otting G (2009) Paramagnetic labelling of proteins and oligonucleotides. J Biomol<br />

<strong>NMR</strong> in press
