04.04.2013 Views

Thesis Title: Subtitle - NMR Spectroscopy Research Group


Computational study of proteins with paramagnetic NMR: Automatic assignments of spectral resonances, determination of protein-protein and protein-ligand complexes, and structure determination of proteins

Christophe Schmitz

A thesis submitted for the degree of Doctor of Philosophy at The University of Queensland in December 2009

School of Chemistry and Molecular Biosciences


Declaration by author

This thesis is composed of my original work, and contains no material previously published or written by another person except where due reference has been made in the text. I have clearly stated the contribution by others to jointly-authored works that I have included in my thesis.

I have clearly stated the contribution of others to my thesis as a whole, including statistical assistance, survey design, data analysis, significant technical procedures, professional editorial advice, and any other original research work used or reported in my thesis. The content of my thesis is the result of work I have carried out since the commencement of my research higher degree candidature and does not include a substantial part of work that has been submitted to qualify for the award of any other degree or diploma in any university or other tertiary institution. I have clearly stated which parts of my thesis, if any, have been submitted to qualify for another award.

I acknowledge that an electronic copy of my thesis must be lodged with the University Library and, subject to the General Award Rules of The University of Queensland, immediately made available for research and study in accordance with the Copyright Act 1968.

I acknowledge that copyright of all material contained in my thesis resides with the copyright holder(s) of that material.

Statement of Contributions to Jointly Authored Works Contained in the Thesis

John M, Schmitz C, Park AY, Dixon NE, Huber T and Otting G (2007) Sequence-specific and stereospecific assignment of methyl groups using paramagnetic lanthanides. J Am Chem Soc 129:13749-13757.

John designed new NMR experiments, recorded and assigned the spectra, and wrote the corresponding paragraphs in the paper. Schmitz designed and implemented the software to automate the assignment procedure, ran the calculations, and wrote the paragraphs "The Program Possum" and "Automatic Assignments without EXSY Data". Park made the protein samples. Dixon coordinated the protein sample preparation and corrected aspects of the paper. Huber was responsible for the computational aspects of the project and the writing of the corresponding sections of the paper. Otting coordinated the overall project and was responsible for the writing of the paper.


Schmitz C, Stanton-Cook MJ, Su XC, Otting G and Huber T (2008) Numbat: an interactive software tool for fitting Δχ-tensors to molecular coordinates using pseudocontact shifts. J Biomol NMR 41:179-189.

Schmitz designed and implemented the software to automate the determination of the Δχ-tensor and wrote most of the paper except for the "study case" section. Stanton-Cook was responsible for the calculations, the protein-protein modelling, the writing of the "study case" section, and improvements to the paper. Su was responsible for improving the design of the software from an "end-user" perspective. Otting and Huber were responsible for the overall project and the writing of the manuscript.

Schmitz C, Vernon R, Otting G, Baker D and Huber T Protein structure determination from pseudocontact shifts using ROSETTA. Proc Natl Acad Sci U S A submitted.

Schmitz designed and implemented the PCS-score into the software ROSETTA, collected experimental data sets, performed computations, and was responsible for writing the manuscript. Vernon guided the implementation of the PCS-score, ran calculations, interpreted the results, and improved the manuscript. Otting gathered experimental data sets, set up the overall project, and corrected versions of the manuscript. Baker was responsible for guiding the overall project, and for the overall manuscript. Huber designed the PCS-score, guided the overall project, and improved the paper.

Statement of Contributions by Others to the Thesis as a Whole

No contributions by others.

Statement of Parts of the Thesis Submitted to Qualify for the Award of Another Degree

None.

Published Works by the Author Incorporated into the Thesis

John M, Schmitz C, Park AY, Dixon NE, Huber T and Otting G (2007) Sequence-specific and stereospecific assignment of methyl groups using paramagnetic lanthanides. J Am Chem Soc 129:13749-13757. Incorporated as Chapter 2.

Schmitz C, Stanton-Cook MJ, Su XC, Otting G and Huber T (2008) Numbat: an interactive software tool for fitting Δχ-tensors to molecular coordinates using pseudocontact shifts. J Biomol NMR 41:179-189. Incorporated as Chapter 3.

Schmitz C, Vernon R, Otting G, Baker D and Huber T Protein structure determination from pseudocontact shifts using ROSETTA. Proc Natl Acad Sci U S A submitted. Incorporated as Chapter 4.

Additional Published Works by the Author Relevant to the Thesis but not Forming Part of it

Su XC, Man B, Beeren S, Liang H, Simonsen S, Schmitz C, Huber T, Messerle BA and Otting G (2008) A dipicolinic acid tag for rigid lanthanide tagging of proteins and paramagnetic NMR spectroscopy. J Am Chem Soc 130:10486-10487.


Acknowledgements

I would like to thank my advisors Dr Thomas and Prof. Gottfried for the scientific and moral support that made this thesis so enjoyable. In particular, I really appreciated that their door was always open for discussions; their infectious scientific enthusiasm; their exemplary, if not legendary, efficiency; their honesty and encouragement in whatever I tried to accomplish; and of course their constant and reliable good humor.

Thanks to the people who put me on track to postgraduate studies, in particular Denis Barthou for his amazing teaching, Philippe Pucheral and Luc Bouganim for introducing me to research, and of course Prof. Guido for somehow bringing me into the field of structural biology.

Thanks to the past and present members and visitors of the BMMG / MD group for those 3.5 years of fun. I appreciated their company, whether it was for a discussion, "une boue", a tea break, a beer or two, or three, a meal, a game of tennis or squash, or a dance. So thanks to Matt, Itamar, David, Mitchel, Ying, Zrinka, Daniela, Michael, Kim, Liz, Alpesh, Prof. Alan Mark and all the others.

It has always been a pleasure to visit ANU in Canberra thanks to Michael, Xun-Cheng, Hiromasa, Kiyoshi, Karin and Laura.

I also would like to thank Prof. David Baker and his lab for welcoming me for a couple of months for a fruitful collaboration, and many many thanks to Robert for so much help with that project, and for being a great illustration of how friendly Canadian people are.

My apologies to my family for being so far away for so long; I know you understood my decision. Thanks for all your support. Thank you Chantel for your patience, love, support and patience.


Abstract

Understanding biological phenomena at atomic resolution is one of the keys to modern drug design. In particular, knowledge of the 3D structures of proteins and of their interactions with other macromolecules is necessary for designing chemical compounds that modify biological processes. Conventional methods for protein structure determination comprise X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. These techniques can also determine the binding mode of chemical compounds. Either technique can be slow and costly, making it highly relevant to explore alternative strategies. Paramagnetic NMR spectroscopy is emerging as such an alternative technique. To measure the paramagnetic effects, two NMR spectra are compared that have been measured with and without a bound paramagnetic metal ion. In particular, pseudocontact shifts (PCSs) of nuclear spins are easily measured as the difference (in ppm) of the chemical shifts between the two spectra. PCSs provide long-range and orientation-dependent restraints, allowing positioning of the spin with respect to the magnetic susceptibility tensor anisotropy (Δχ-tensor) of the metal ion.
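The measurement described above, a PCS taken as the chemical-shift difference between the paramagnetic and diamagnetic spectra, can be sketched as follows. This is only a minimal illustration and not code from Possum, Numbat or PCS-ROSETTA; the function name, the dictionary layout and the shift values are hypothetical.

```python
def pseudocontact_shifts(paramagnetic, diamagnetic):
    """Return the PCS (ppm) of each resonance: paramagnetic minus diamagnetic shift.

    Both arguments map a resonance label to a chemical shift in ppm.
    Only labels present in both spectra yield a PCS.
    """
    shared = paramagnetic.keys() & diamagnetic.keys()
    return {label: paramagnetic[label] - diamagnetic[label] for label in sorted(shared)}


# Hypothetical chemical shifts (ppm), invented for illustration only.
dia = {"Ala10 CB": 18.50, "Met25 CE": 17.00, "Thr40 CG2": 21.75}
para = {"Ala10 CB": 19.00, "Met25 CE": 16.25}

pcs = pseudocontact_shifts(para, dia)
for label, value in pcs.items():
    print(f"{label}: {value:+.2f} ppm")
```

A resonance that is assigned in only one of the two spectra (here the hypothetical "Thr40 CG2") yields no PCS, which is why resonance assignment in both spectra comes before any use of PCSs as restraints.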

In this thesis, I used the PCS effect to computationally extract information from NMR spectra. I (i) developed a tool (called Possum) to automatically assign diamagnetic and paramagnetic spectra of the methyl groups of amino acid side chains, given structural information of the protein studied and prior knowledge of the Δχ-tensor; (ii) designed a comprehensive software package (called Numbat) to extract Δχ-tensor parameters from assigned PCS values and the available 3D structure; and (iii) incorporated PCS-based restraints into the protein structure prediction software CS-ROSETTA and demonstrated that this combination (PCS-ROSETTA) presents a significant improvement for de novo structure determination. The three projects serve different purposes at different stages of protein NMR studies. They could be combined in the following manner: starting from assigned backbone PCSs, PCS-ROSETTA could be used to determine the 3D structure of the protein. Possum can then be used to automatically assign the NMR resonances of the methyl groups using PCSs. Finally, Numbat can be used to fit improved Δχ-tensors to all the PCS data, analyze the quality of the Δχ-tensors and identify possible wrong assignments. Iterative repetition of this protocol would give a 3D structural model of the protein with a minimum of data. Alternatively, the Δχ-tensor parameters and PCSs could be used as input for a traditional software package such as Xplor-NIH to compute a 3D structure of the protein.


Keywords

paramagnetic NMR, pseudocontact shift, lanthanide, magnetic susceptibility tensor, protein, structure determination, resonance assignment, protein folding

Australian and New Zealand Standard Research Classifications (ANZSRC)

060112 (40%), 080301 (30%), 030406 (30%)


Table of Contents

Declaration by author
Statement of Contributions to Jointly Authored Works Contained in the Thesis
Statement of Contributions by Others to the Thesis as a Whole
Statement of Parts of the Thesis Submitted to Qualify for the Award of Another Degree
Published Works by the Author Incorporated into the Thesis
Additional Published Works by the Author Relevant to the Thesis but not Forming Part of it
Acknowledgements
Abstract
Keywords
Australian and New Zealand Standard Research Classifications (ANZSRC)
Table of Contents
List of Figures
List of Tables
List of Abbreviations

1. Introduction
1.1 Liquid State Nuclear Magnetic Resonance
1.2 Paramagnetic NMR
1.2.1 The four paramagnetic effects in NMR
1.2.2 The pseudocontact shift as a restraint
1.3 Computational study of paramagnetic proteins
1.3.1 The assignment problem
1.3.2 The Δχ-tensor determination problem
1.3.3 De novo structure determination of proteins
1.4 Scope of the thesis
1.5 References

2. Possum: paramagnetically orchestrated spectral solver of unassigned methyls
2.1 Abstract
2.2 Introduction
2.3 Experimental section
2.3.1 Sample preparation
2.3.2 NMR spectroscopy
2.3.3 Manual resonance assignments from PCS
2.3.4 The program Possum
2.4 Results
2.4.1 ¹³C-HSQC spectra of the ε186/θ/Ln³⁺ complexes
2.4.2 Methyl CZ-EXSY experiments
2.4.3 Resonance assignment of Met, Ala and Thr methyl groups
2.4.4 Assignments of Val, Leu, and Ile methyl groups
2.4.5 Automatic assignments without EXSY data
2.4.6 PCS and flexibility
2.5 Discussion
2.6 Acknowledgement
2.7 Supporting Information Available
2.8 References
2.9 Supporting information

3. Numbat: new user-friendly method built for automatic Δχ-tensor determination
3.1 Abstract
3.2 Keywords
3.3 Abbreviations
3.4 Introduction
3.5 Algorithm
3.6 Program Features
3.6.1 GUI
3.6.2 Input files
3.6.3 Methyl group definition
3.6.4 Optimization of the tensor parameters
3.6.5 Residual Anisotropic Chemical Shifts (RACS)
3.6.6 Multiple PCS data sets
3.6.7 PCS modification
3.6.8 PCS selection
3.6.9 Conventions
3.6.10 Error analysis
3.6.11 Visualization
3.6.12 Output
3.7 Study case
3.7.1 Subunit ε186
3.7.2 Subunit θ
3.7.3 Modelling the complex between ε186 and θ
3.8 Conclusion
3.9 Acknowledgment
3.10 References
3.11 Supporting information

4. Protein Structure Determination from Pseudocontact Shifts using ROSETTA
4.1 Abstract
4.2 Introduction
4.3 Results
4.3.1 Test set
4.3.2 Capacity of the PCS Score to Identify Native-like Structures
4.3.3 Comparison of PCS-ROSETTA with CS-ROSETTA
4.3.4 Successes and Limits of PCS-ROSETTA Calculations
4.4 Discussion
4.5 Materials and Methods
4.5.1 PCS-ROSETTA Score
4.5.2 PCS-ROSETTA Algorithm
4.5.3 Input for PCS-ROSETTA
4.5.4 PCS-ROSETTA Protocol for Protein Structure Determination
4.5.5 Computation of Structures to Evaluate the Effects of PCS Scoring
4.6 Acknowledgments
4.7 References
4.8 Supporting information

5. Conclusion and perspectives
5.1 The use of PCS for structure determination
5.1.1 Folding of proteins using only pseudocontact shifts
5.1.2 Uses of multiple lanthanide binding sites
5.1.3 Development of a new PCS-ROSETTA protocol
5.2 The use of PCS for chemical shift assignment
5.3 The use of PCS for protein docking
5.4 References


List of Figures

Figure 1.1 NMR effects used for structure determination
Figure 1.2 Representation of the distance and angular dependence of the four paramagnetic effects for the spin S, or system of spin S1-S2 (green)
Figure 1.3 Experimental measurement of the four paramagnetic effects with two 1D undecoupled spectra
Figure 1.4 The PCS is less sensitive than RDC to small discrepancies between X-ray and solution structure
Figure 1.4 The Δχ-tensor determination problem
Figure 1.5 Illustration of the three approaches of resonance assignment
Figure 1.6 Illustration of PCS restraints
Figure 1.7 Protein complexes determined using PCSs
Figure 1.8 Flow-chart of the Echidna algorithm
Figure 1.9 Examples of the MAP problem
Figure 1.10 Illustration of the task performed by the software Possum
Figure 1.11 Isosurface shapes calculated by equation (1.1)
Figure 1.12 Sanson-Flamsteed projection for visualization of Δχ-tensor uncertainty
Figure 1.13 Sanson-Flamsteed representations of Δχ-tensor axes orientation
Figure 1.14 Illustration of the task performed by the software Numbat
Figure 1.15 Effect of the mobility of the tag on the PCS

Figure 2.1 Methyl CZ-EXSY experiments
Figure 2.2 Formulation of the assignment problem depending on the information available
Figure 2.3 Methyl region of constant-time ¹³C-HSQC spectra of the ε186/θ complex (containing ¹³C/¹⁵N labeled ε186) in the presence of La³⁺ (blue) and a 1:1 mixture of (a) La³⁺/Dy³⁺ and (b) La³⁺/Yb³⁺ (red)
Figure 2.4 Assignment of Met CH3 from PCS
Figure 2.5 PCS measurements in isopropyl groups of Val and Leu and use of PCS for stereospecific resonance assignments
Figure 2.6 Residues showing deviations between predicted and experimental PCS
Figure S2.1 Pulse scheme of the 2D (H)C(C)H-TOCSY experiment used in this study
Figure S2.2 Assigned constant-time (28 ms) ¹³C-HSQC spectrum of the ε186/θ/La³⁺ complex (¹³C/¹⁵N labeled ε186) at pH 7.2 and 25 °C
Figure S2.3 Assigned constant-time ¹³C-HSQC spectrum of the ε186/θ/La³⁺ complex, where ε186 was biosynthetically fractionally ¹³C-labeled using 20% uniformly ¹³C-labeled glucose
Figure S2.4 Assigned constant-time ¹³C-HSQC spectrum of the ε186/θ/La³⁺ complex containing ¹³C/¹⁵N-Leu labeled ε186 (blue) superimposed onto a 2D (H)C(C)H-TOCSY spectrum of the same sample (red)
Figure S2.5 Comparisons of calculated and experimental PCS in the ε186/θ/Dy³⁺ complex for methyl groups of (a) Met, (b) Ala, (c) Thr, (d) Val, (e) Leu, and (f) Ile
Figure S2.6 Comparisons of calculated and experimental ¹³C and ¹H PCS as in Figure S2.5 but for the ε186/θ/Yb³⁺ complex

Figure 3.1 Screenshots of Numbat main windows
Figure 3.2 Euler angle definitions used by Numbat
Figure 3.3 Visualisation of the Δχ-tensor in MOLMOL and PyMOL, and display of its orientational uncertainty in a Sanson-Flamsteed projection plot
Figure 3.4 The four degenerate solutions arising from the symmetry of the Δχ-tensor around the x, y and z axes
Figure 3.5 The complex between ε186 and θ determined by superimposition of Δχ-tensors

Figure 4.1 Fold identification by pseudocontact shifts
Figure 4.2 Improved conformational sampling by PCS-ROSETTA
Figure 4.3 Energy landscapes generated by PCS-ROSETTA
Figure 4.4 Superimpositions of ribbon representations of the backbones of the lowest energy structures calculated with PCS-ROSETTA (blue) onto the corresponding target structures (red)
Figure S4.1 Fold identification by pseudocontact shift score and ROSETTA energy
Figure S4.2 Improved fragment assembly by PCS-ROSETTA
Figure S4.3 Energy landscape generated by CS-ROSETTA and PCS-ROSETTA, with full atom ROSETTA energies and Cα rmsd values being calculated using only the core residues as defined in Table S4.1
Figure S4.4 Identification of successful calculations with PCS-ROSETTA
Figure S4.5 Flow diagram of PCS-ROSETTA
Figure S4.6 Expected Cα rmsd of the lowest energy structure calculated with PCS-ROSETTA

Figure 5.1 Capacity of the PCS score, as the only energy term, to fold the protein
Figure 5.2 The intersection of isosurfaces defines the position and orientation of peptide fragments in the protein structure



List of Tables

Table 2.1 Automatic assignment of methyl groups by the program Possum

Table S2.1 ¹³C and ¹H chemical shifts (ppm) of methyl groups of ε186 in the ε186/θ/Ln³⁺ complexes used in this study

Table S2.2 Number of correctly assigned methyl groups of Met, Thr, and Ala residues of ε186 using the program Possum

Table S2.3 Number of correctly assigned methyl groups of Val, Leu, and Ile residues of ε186 using the program Possum with methyl connectivity information in the Yb³⁺ complex

Table S2.4 Number of correctly assigned methyl groups of valine, leucine, and isoleucine residues of ε186 using the program Possum without methyl connectivity information in the Yb³⁺ complex

Table 3.1 Δχ-tensors determined by Numbat in the frames of the ε186 and θ molecules

Table 3.2 Error analysis for the Dy³⁺ Δχ-tensors fitted to PCS of ε186 and θ

Table S3.1 Experimentally determined ¹Hᴺ PCS for θ in complex with ε186 at pH 7.0 and 25 °C

Table S3.2 Comparison of θ Δχ-tensor parameters when using only conformer 10 or all conformers of the NMR structure of θ

Table 4.1 Protein structures used to evaluate the performance of PCS-ROSETTA

Table S4.1 PCS data information and grid search parameters used

Table S4.2 Protein structures used to evaluate the performance of PCS-ROSETTA



List of Abbreviations

α Subunit α of the E. coli polymerase III

ε186 N-terminal 185 residues of the E. coli polymerase III subunit ε

θ Subunit θ of the E. coli polymerase III

CCR Cross Correlated Relaxation

CSA Chemical Shielding Anisotropy

GUI Graphical User Interface

HOT The bacteriophage P1-encoded homolog of θ

MAP Multi-dimensional Assignment Problem

NMR Nuclear Magnetic Resonance

NOE Nuclear Overhauser Effect

PCS Pseudocontact Shift

ppm parts per million

PRE Paramagnetic Relaxation Enhancement

RACS Residual Anisotropic Chemical Shift

RDC Residual Dipolar Coupling

RMSD Root Mean Square Deviation

UTR Unique Δχ-Tensor Representation


Chapter 1

Introduction

1. Introduction

1.1 Liquid State Nuclear Magnetic Resonance

In the last few decades, Nuclear Magnetic Resonance (NMR) has been used routinely to investigate chemical compounds, proteins and complexes. The method relies on intrinsic spin properties of nuclei. Spins are first exposed to a strong and constant magnetic field delivered by the spectrometer. Then, they are excited by a radiofrequency pulse sequence. The precession of the spins is recorded during the free induction decay of the NMR experiment and converted into a frequency spectrum by Fourier transformation. Several parameters that provide information about the structure of the molecule can be read from the spectra.

The chemical shift describes the dependence of nuclear magnetic energy levels on the<br />

electronic environment in a molecule. The chemical shift depends on the nature of the nucleus. It<br />

also strongly depends on its local neighborhood (up to 5 Å) due to the influence of the electron<br />

configuration. Hence, in a protein, almost all nuclei have different chemical shifts, which allows them to be

distinguished in the NMR spectrum by their specific resonance frequencies.

The dipole-dipole coupling is the direct magnetic interaction between two close spins. The<br />

effect is intra- and intermolecular, since it acts through space. The interaction energy is minimal<br />

when the two spins are aligned. This spin interaction is responsible for the Nuclear Overhauser<br />

Effect (NOE).



The scalar J-coupling results from an indirect magnetic interaction of two nuclear spins<br />

via their surrounding electrons. The effect is exclusively intramolecular because it is propagated<br />

through the bonds between two nuclei. Typically, it can be measured for nuclei separated by up to<br />

three bonds, in which case it is referred to as a ³J-coupling. The ³J-coupling constant yields angle

information, as shown in Figure 1.1.b.
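The angle dependence of the ³J-coupling can be illustrated with a short sketch of a Karplus-type relation, ³J = A cos²θ + B cos θ + C with θ = Φ − 60°. The coefficients below are a commonly quoted parameterisation for the HN-Hα coupling and are an assumption made for illustration; they are not taken from this thesis, and the function names are hypothetical.

```python
import math

# Assumed Karplus coefficients (Hz) for the 3J(HN-HA) coupling.
# Illustrative values only; they do not come from the thesis.
A, B, C = 6.51, -1.76, 1.60


def karplus_3j(phi_deg, a=A, b=B, c=C, offset_deg=-60.0):
    """Return the predicted 3J (Hz) for a dihedral angle phi in degrees."""
    theta = math.radians(phi_deg + offset_deg)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


def phi_candidates(j_obs, tol=0.25):
    """Return all integer phi values (degrees) compatible with j_obs within tol."""
    return [phi for phi in range(-180, 180)
            if abs(karplus_3j(phi) - j_obs) <= tol]


if __name__ == "__main__":
    print(round(karplus_3j(-120.0), 2))   # extended backbone conformation
    print(phi_candidates(5.0))            # several disjoint phi ranges remain
```

Because the curve is not monotonic, one measured coupling is usually compatible with several angle ranges, which is why the restraint is expressed as a set of allowed values rather than a single angle.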

NMR experiments measure the effects described above. One can classify NMR experiments into two groups: those that yield structural information, and those that yield information to facilitate resonance assignment. The structural information comes mainly from direct dipole-dipole couplings, which provide short-range distance restraints between two spins, and from ³J-couplings, which offer dihedral angle restraints for the three bonds concerned. Some examples of NMR experiments that offer structural information are:

1D proton experiment: It provides the chemical shifts of the protons. Each ¹H has a different chemical shift, and each corresponding signal may be split into multiplets due to scalar couplings. The spectrum gets more complex as the number of spins increases. To reduce spectral overlap, two- or multi-dimensional NMR spectra can be recorded.

1D carbon experiment: It is equivalent to the 1D proton experiment, but measured on carbon. Only about 1% of natural carbon is ¹³C, and the protein often has to be ¹³C labeled in order to observe carbon chemical shifts, because the most abundant isotope of carbon (¹²C) has no nuclear spin.

NOESY: This experiment correlates spins that are separated in space by a distance of up to<br />

6 Å. The NOE observed in the NOESY experiment is based on the direct dipole-dipole coupling<br />

and provides valuable inter-spin distance information for the structure determination of proteins.<br />

NOE restraints are measured from the peak intensity in the NOESY experiment and provide<br />

distance information (Figure 1.1.a). As the effect is through-space and independent of chemical<br />

bonds, it is also useful for investigations of protein-ligand and protein-protein interactions.<br />
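Since the NOE intensity scales as 1/d⁶, peak intensities are often converted into approximate distances by calibration against a spin pair of known separation. The sketch below assumes the isolated spin-pair approximation and a hypothetical reference pair; neither the function name nor the numbers come from the thesis.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate an inter-proton distance from an NOE peak intensity.

    Uses I ~ 1/d^6, so d = d_ref * (I_ref / I)^(1/6).
    Distances are returned in the same unit as ref_distance.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


if __name__ == "__main__":
    # Hypothetical reference: a pair 2.5 Å apart with intensity 64 (arbitrary units).
    # A peak 64 times weaker corresponds to twice the distance.
    print(noe_distance(1.0, 64.0, 2.5))
```

The sixth-root dependence means that even large intensity errors translate into modest distance errors, but also that the signal vanishes quickly beyond about 6 Å.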

A major task and challenge of structure determination is to assign resonances to their<br />

corresponding atoms in order to apply experimental restraints to the correct set of atoms.<br />

Additional correlation NMR experiments are routinely recorded to assist in the chemical shift

assignment task. These include:<br />

2D ¹⁵N-HSQC experiments correlate protons with nitrogens of a ¹⁵N-labeled protein. These

correlations simplify the analysis of a 1D spectrum since the additional dimension allows the



separation of the resonances into cross-peaks observed in a 2D plane. The recording time of the<br />

experiment is however longer.<br />

3D ¹³C-¹⁵N-correlation experiments are 3-dimensional heteronuclear experiments that correlate C, N and H atoms. The resulting spectrum is particularly beneficial for large proteins, because resonance overlap is reduced in 3D. The disadvantage of these experiments is, however, that they are less sensitive than the ¹⁵N-HSQC experiment and usually require that the protein is ¹³C and ¹⁵N double labeled.

COSY experiments correlate ¹H spins via scalar couplings. They are used to identify groups

of spins connected by fewer than four bonds (spin systems).

TOCSY experiments correlate ¹H resonances that belong to the same spin system, where

pairs of spins are separated by no more than three bonds. TOCSY spectra include the COSY<br />

information, and are used to identify connected spin systems. TOCSY spectra are useful for<br />

sequential resonance assignment.<br />

NOESY experiments can also be used in the assignment procedure. COSY and TOCSY experiments provide the amino acid type information of the resonances, whereas the NOESY experiment allows the sequential piecing together of the assignment by exploiting the distance dependence of the NOE.

Figure 1.1 NMR effects used for structure determination. (a) The NOE provides distance information (up to 0.6 nm) between two protons. The intensity of the NOE signal is proportional to 1/d⁶, where d is the interproton distance. (b) The ³J-coupling gives dihedral angle restraints. The relationship between the angle Φ and the ³J-coupling is given by the Karplus equation (Karplus, 1959), and the allowed values for Φ are illustrated by the plot ³J = f(Φ). Figures adapted from web resources.



In contrast to NOE or ³J-coupling effects, which are short-range (measurable for distances below 6 Å) and local (each measurement concerns an independent group of atoms), paramagnetic NMR introduces new effects that are long-range (measured up to 40 Å) and global (i.e. their effect is described for all spins in a common frame centered on the paramagnetic lanthanide).

1.2 Paramagnetic NMR

1.2.1 The four paramagnetic effects in NMR

When a paramagnetic centre with unpaired electrons, such as a lanthanide ion, is present in<br />

a protein, the observed NMR spectrum changes due to induced paramagnetic effects. By

comparison of the diamagnetic and paramagnetic spectra, one can observe the following four<br />

paramagnetic effects:<br />

The pseudocontact shift (PCS): It is given by equation (1.1), where the spin of interest is<br />

described by its polar coordinates in an internal frame (the Δχ-tensor frame) centered on the

paramagnetic center (Figure 1.2.a).<br />

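$$\delta^{\mathrm{PCS}} = \frac{1}{12\pi r^{3}}\left[\Delta\chi_{\mathrm{ax}}\left(3\cos^{2}\theta - 1\right) + \frac{3}{2}\,\Delta\chi_{\mathrm{rh}}\sin^{2}\theta\cos 2\varphi\right] \qquad (1.1)$$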

Δχax = χz − (χx + χy)/2 and Δχrh = χx − χy are, respectively, the axial and rhombic components that describe the anisotropic effect; r, θ and φ are the polar coordinates of the spin in the Δχ-tensor frame (Figure 1.2.a). The PCS is a long-range effect (up to 40 Å) that decays with 1/r³ and is measured as the difference between the paramagnetic and diamagnetic chemical shifts (Figure 1.3).
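As a minimal sketch of how equation (1.1) can be evaluated, the function below assumes the commonly used form δ = [Δχax(3cos²θ − 1) + (3/2)Δχrh sin²θ cos 2φ] / (12πr³), with the spin position given in Cartesian coordinates of the Δχ-tensor frame. The function name, the unit choices (Å and 10⁻³² m³) and the tensor values in the usage example are assumptions for illustration.

```python
import math


def pcs_ppm(x, y, z, dchi_ax, dchi_rh):
    """PCS in ppm for a spin at (x, y, z), in Å, in the Δχ-tensor frame.

    dchi_ax and dchi_rh are given in units of 1e-32 m^3 (assumed convention).
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("spin cannot sit on the paramagnetic center")
    cos_theta = z / r
    sin2_theta = 1.0 - cos_theta ** 2
    phi = math.atan2(y, x)
    # Unit conversion: 1e-32 m^3 / (1e-10 m)^3 = 1e-2, times 1e6 for ppm.
    scale = 1e4 / (12.0 * math.pi * r ** 3)
    return scale * (dchi_ax * (3.0 * cos_theta ** 2 - 1.0)
                    + 1.5 * dchi_rh * sin2_theta * math.cos(2.0 * phi))


if __name__ == "__main__":
    # Hypothetical axial tensor: the PCS on the z axis is -2 times that on the x axis.
    print(pcs_ppm(0.0, 0.0, 10.0, 30.0, 0.0))
    print(pcs_ppm(10.0, 0.0, 0.0, 30.0, 0.0))
```

Doubling the distance reduces the PCS by a factor of eight, which is the 1/r³ decay that makes the effect observable much further out than the NOE.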

The residual dipolar coupling (RDC): With an attached paramagnetic lanthanide, a<br />

protein weakly aligns with respect to the magnetic field. RDCs are manifested as increases or<br />

decreases of the magnitudes of multiplet splittings that can be observed in undecoupled NMR

spectra (Figure 1.3). The RDC can also be back-calculated (equation (1.2)) provided that the<br />

orientation of the two spins with respect to the alignment tensor is known (Figure 1.2.b).<br />

with:<br />

(1.2)



(1.3)<br />

B0 is the magnetic field strength, γH and γN are the proton and nitrogen magnetogyric ratios, ħ is Planck’s

constant divided by 2π, S the order parameter of the molecular alignment, rNH the N-H distance, kB<br />

the Boltzmann constant, and T the absolute temperature.<br />

Figure 1.2 Representation of the distance and angular dependence of the four<br />

paramagnetic effects for the spin S, or the spin pair S1-S2 (green). (a) PCSs and (b)

RDCs are described in the Δχ-tensor frame centered on the lanthanide l (red). (c)

PREs only yield distance dependence while (d) CCRs also yield angle dependence.<br />

Adapted from (Pintacuda et al., 2004).<br />

Figure 1.3 Experimental measurement of the four paramagnetic effects with two 1D undecoupled<br />

spectra. The figure shows the diamagnetic and paramagnetic antiphase doublets. PCS is measured



as the chemical shift difference. RDC is measured as the difference in line splitting. PRE and CCR<br />

can be determined from the differential line broadening.<br />

The paramagnetic relaxation enhancement (PRE): The PRE yields distance information<br />

between the paramagnetic lanthanide and the spin of interest (Figure 1.2.c). It depends on the<br />

distance r between the paramagnetic center and the nuclear spin as 1/r⁶ (equation (1.4)) and

accounts for the difference in line broadening between the paramagnetic and diamagnetic

resonances (Figure 1.3).

with:<br />

(1.4)<br />

(1.5)<br />

where τr is the rotational correlation time, ωH the Larmor frequency of the proton, μ0 the

vacuum permeability, gJ the g-factor, μB the Bohr magneton, and J the total angular momentum quantum number.

The cross correlated relaxation (CCR): This effect is also measured by the observed line<br />

broadening; more precisely by comparing the width between the two components of the antiphase<br />

doublet (Figure 1.3). This effect combines distance and angle dependence (equation (1.6) and<br />

Figure 1.2.d).<br />

with:<br />

(1.6)<br />

(1.7)<br />

All four paramagnetic effects can be used to study protein structure. Residual dipolar<br />

coupling has been widely used to help determine protein structures (Rohl et al., 2002), but when

carefully comparing RDCs measured by NMR with RDCs predicted from a crystal structure, it was

observed that small discrepancies between the N-H bond orientation in crystal and liquid state can



lead to large deviations between measurement and prediction. PCSs are less sensitive to the<br />

difference between the crystal model and the solution structure. The focus in my PhD has been on<br />

using PCSs to study proteins and their interactions.

Figure 1.4 The PCS is less sensitive than RDC to small discrepancies<br />

between X-ray and solution structure. A large change in the orientation<br />

of the N-H vector will considerably affect the calculation of the RDC<br />

(equation (1.2) and Figure 1.2.b). On the other hand, the PCS will be<br />

less affected as the relative position of the hydrogen with respect to the<br />

tensor frame is almost unchanged when PCSs are measurable (d > 10 Å).

1.2.2 The pseudocontact shift as a restraint<br />

PCSs can be calculated using equation (1.1) if the magnitude (two parameters: Δχax and<br />

Δχrh, Figure 1.5.a), location (three Cartesian coordinates x, y and z, Figure 1.5.b) and orientation<br />

(three Euler angles α, β and γ, Figure 1.5.c) of the Δχ-tensor are known, and if a structure is<br />

available. The resonance of a spin located close to the paramagnetic center is

broadened beyond detection due to PRE and, consequently, its PCS cannot be observed. The cutoff

radius is typically about 10 Å.
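The eight parameters listed above can be combined into a single prediction routine. The following is a minimal sketch, assuming a ZYZ Euler convention (R = Rz(γ)·Ry(β)·Rz(α)), coordinates in Å and Δχ values in 10⁻³² m³; the convention actually used by the software described in this thesis may differ, and all names here are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def euler_zyz(alpha, beta, gamma):
    """Assumed convention: R = Rz(gamma) * Ry(beta) * Rz(alpha), in radians."""
    return matmul(rot_z(gamma), matmul(rot_y(beta), rot_z(alpha)))


def predict_pcs(atom, metal, euler, dchi_ax, dchi_rh):
    """PCS (ppm) of an atom given in the protein frame (coordinates in Å)."""
    rot = euler_zyz(*euler)
    v = [atom[i] - metal[i] for i in range(3)]        # shift origin to the metal
    x, y, z = (sum(rot[i][k] * v[k] for k in range(3)) for i in range(3))
    r2 = x * x + y * y + z * z
    # Cartesian form: 3cos^2(theta) - 1 = (3z^2 - r^2)/r^2,
    # sin^2(theta) cos(2 phi) = (x^2 - y^2)/r^2.
    scale = 1e4 / (12.0 * math.pi * r2 ** 2.5)
    return scale * (dchi_ax * (3.0 * z * z - r2)
                    + 1.5 * dchi_rh * (x * x - y * y))


def pcs_rmsd(predicted, observed):
    """Root mean square deviation between two equally long PCS lists."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
```

A fit of the Δχ-tensor would then minimize `pcs_rmsd` over the eight parameters, with spins inside the cutoff radius excluded because their PCSs are not observable.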

The NMR resonances must first be assigned in order to measure PCSs. Three kinds of assignment have to be distinguished (Figure 1.6):

(i) The assignment of the diamagnetic NMR spectrum: it is routinely performed with conventional sequential assignment methods using one or a combination of COSY, TOCSY, NOESY and triple-resonance NMR experiments.



Figure 1.5 The Δχ-tensor determination problem. The Δχ-tensor can be conveniently represented by<br />

isosurfaces. For a given ppm value p, all spins that have a PCS value equal to p would be located<br />

on a given isosurface (red for negative PCS, blue for positive PCS). (a) The Δχax and Δχrh<br />

parameters that have to be determined are responsible for the shape and size of the isosurfaces. (b)<br />

The location of the paramagnetic center is described by three Cartesian coordinates. (c) The three<br />

Euler angles α, β and γ relate the orientation of the Δχ-tensor to the protein frame.<br />

Figure 1.6 Illustration of the three approaches of resonance assignment. (a) The peaks of the<br />

diamagnetic spectrum (blue) are assigned to their corresponding amino acids. (b) The<br />

paramagnetic cross peaks (red) are assigned to their corresponding residues. (c) When the pairing



between diamagnetic and paramagnetic cross peak is already determined by a transfer experiment,<br />

the pairs of cross peaks can be assigned to their corresponding residues.<br />

(ii) The assignment of the paramagnetic NMR spectrum: sequential approaches are not

suitable because the lanthanide induces PRE effects resulting in large line<br />

broadening for residues close to the paramagnetic center. Proposed experimental<br />

approaches use temperature dependence (Nguyen et al., 1999), magnetic field<br />

dependence (Bertini et al., 1998) or fast / slow exchange of the lanthanide (John et<br />

al., 2007) to transfer the chemical shift from the diamagnetic to the paramagnetic<br />

state.<br />

(iii) The assignment of the pseudocontact shift: One can pair the chemical shifts of<br />

diamagnetic and paramagnetic resonances when a transfer experiment can be<br />

performed, but the assignment to the individual atoms is still unknown. There is no<br />

direct method to assign experimental PCSs to the structure; the only way is to<br />

compare experimental and predicted PCSs to find the best match.<br />

Comparison between a calculated and measured PCS provides a restraint (Figure 1.7) that<br />

has been used for different purposes in the literature:<br />

Structure refinement: Allegrozzi et al. (Allegrozzi et al., 2000) showed how using PCSs as

restraints in structure refinement improved the quality of structures of calbindin. They compared

the original NMR structures obtained from 1539 NOEs with structures refined using additional

PCS restraints from three different lanthanides. The magnitude of the paramagnetic dipole moment<br />

differs between different paramagnetic metal ions. Hence, the cutoff radius and the distance for<br />

which PCSs can still be observed is lanthanide-dependent. The three lanthanides chosen in this<br />

study (Ce³⁺, Yb³⁺ and Dy³⁺) cover different regions of the 3D space in shells of 5-15 Å for Ce³⁺, 9-25 Å

for Yb³⁺, and 13-40 Å for Dy³⁺. Each lanthanide used focuses on a different region of the

protein, and provides additional independent information to the NOEs. The resulting ensemble of<br />

structures generated with each PCS data set separately shows better definition of the backbone in<br />

the area covered by the lanthanide used. In particular, the residues 56-59 have an RMSD above 1.5<br />

Å in the family of NMR structures. Inclusion of PCS restraints decreases the RMSD value to 0.75

Å. What this study failed to show is the improvement in structure quality when all data (from all<br />

different lanthanides) are used simultaneously in the refinement procedure.



Protein-ligand interaction: John et al. (John et al., 2006) showed how PCS restraints can<br />

be used to determine the structure of protein-ligand complexes. The approach is of major interest<br />

for drug screening. In a first step, the Δχ-tensor is determined for the target protein. This is the most<br />

complicated task because the protein can be large, making the assignment of chemical shifts<br />

difficult. However, in the context of drug screening this step needs to be performed only once. Each<br />

ligand that is being screened is isotopically labeled, and a paramagnetic spectrum of the ligand in<br />

complex with the protein is recorded. The assignment of a ligand spectrum and the corresponding<br />

Δχ-tensor are swiftly and easily obtained. The two Δχ-tensors from ligand and protein being the<br />

same by definition, a simple superimposition of them leads to the rigid body structure of the<br />

complex. A molecular dynamics package is further used to locally refine the conformation of the<br />

protein on the contact surface. John and coworkers demonstrated the approach with the thymidine<br />

nucleotide as the ligand binding to the ε subunit of DNA polymerase III. The determined thymidine<br />

structure was found to have a very similar binding mode compared to the thymidine<br />

monophosphate present in the reference crystal structure.<br />

Protein-protein interaction: Pintacuda et al. (Pintacuda et al., 2006) described a protocol<br />

to solve the structure of a protein-protein complex using only PCSs and illustrated the protocol on<br />

the example of the N-terminal domain of the subunit ε and subunit θ of the E. coli DNA<br />

polymerase III. Again the method relies on the determination of the Δχ-tensors, first relative to one<br />

molecule (ε, Figure 1.8.a) and then relative to the second molecule (θ, Figure 1.8.b), followed by<br />

the superimposition of the Δχ-tensor frames (Figure 1.8.c). Such an approach is particularly<br />

relevant considering the difficulty of co-crystallizing proteins in complexes, compared to the

crystallization of the components separately. An alternative has been to use RDCs (McCoy et al.,<br />

2002), but the relative orientation and location of the two rigid bodies obtained with PCSs is more<br />

accurate compared to what would result from a complex built from RDC data, since PCSs yield

simultaneously orientation and distance information, while RDC data lack distance information and<br />

are sensitive to small fluctuations of NH bond orientation. The resulting rigid-body docked<br />

complex could exhibit steric clashes that would need to be resolved with a molecular refinement

package. The final result is valuable as input for docking refinement software considering that the<br />

most difficult part of docking two molecules in a complex is to obtain with confidence the<br />

approximate binding sites.
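The superimposition of the two Δχ-tensor frames amounts to a rigid-body transformation. The sketch below assumes that each fit yields a metal position and a rotation matrix mapping molecule coordinates into the common tensor frame (p = R·(x − m)); the function name and this parameterisation are illustrative assumptions, and the degenerate orientations arising from the symmetry of the Δχ-tensor are not handled.

```python
def dock_point(x_b, metal_a, rot_a, metal_b, rot_b):
    """Express a point of molecule B in the coordinate system of molecule A.

    rot_a and rot_b are 3x3 rotation matrices (nested lists) that map each
    molecule's coordinates into the shared Δχ-tensor frame; metal_a and
    metal_b are the fitted metal positions in each molecule's own frame.
    """
    # Molecule B frame -> tensor frame
    v = [x_b[i] - metal_b[i] for i in range(3)]
    t = [sum(rot_b[i][k] * v[k] for k in range(3)) for i in range(3)]
    # Tensor frame -> molecule A frame (inverse rotation = transpose)
    back = [sum(rot_a[k][i] * t[k] for k in range(3)) for i in range(3)]
    return [back[i] + metal_a[i] for i in range(3)]


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    quarter_turn_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    # Hypothetical fit results: the metal of B must land on the metal of A.
    print(dock_point([5.0, 0.0, 0.0], [1.0, 1.0, 1.0], identity,
                     [5.0, 0.0, 0.0], quarter_turn_z))
```

Applying `dock_point` to every atom of the second molecule gives the rigid-body model of the complex, which can then be passed to a refinement program to remove clashes.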



Figure 1.7 Illustration of PCS restraints. If the Δχ-tensor parameters are fully determined, one can<br />

accurately predict PCS values. The assignment of both diamagnetic and paramagnetic spectra<br />

provides experimental PCSs. Direct comparison of both offers a PCS-based restraint.<br />

Figure 1.8 Protein complexes determined using PCSs. Paramagnetic NMR experiments are performed on the complex, (a) first with only the first protein labeled, and (b) then with only the second protein labeled. The two Δχ-tensors are fitted separately, according to the experimental PCSs. Because the two Δχ-tensors are theoretically the same, their superimposition provides the structure of the complex (c). Adapted from (Pintacuda et al., 2006).



1.3 Computational study of paramagnetic proteins<br />

1.3.1 The assignment problem<br />

Assigning the resonances of NMR spectra is a necessary step towards applying NMR

restraints for protein computation. Although several software packages are capable of predicting<br />

chemical shifts, they require high-resolution 3D structures and lack accuracy especially for the<br />

unstructured parts of a protein (Shen et al., 2007). Their algorithms are not based on pure<br />

calculation from the 3D structure, but rather use statistical information extracted from the Protein

Data Bank (PDB) and from deposited chemical shifts.

Pseudocontact shifts can be accurately predicted using equation (1.1). Consequently, it is<br />

possible to compare measured and calculated PCSs. The root mean square deviation between the<br />

calculated and the measured PCSs provides a score to minimize in order to yield the best possible<br />

assignment. This strategy has been applied to simultaneously assign measured PCSs and optimize<br />

the Δχ-tensor by a software package named “Platypus” (Pintacuda et al., 2004). In this work, the

protein was selectively labeled by residue type in order to simplify spectra. This led to the<br />

measurement of unassigned PCSs by unambiguous identification of the connectivity between a<br />

diamagnetic cross peak and its paramagnetic partner shifted along the diagonal in a 2D ¹⁵N-HSQC

spectrum. The diagonal shift is explained by the fact that, to a first approximation, hydrogen and

nitrogen of an NH group have similar PCS values. This is due to the short distance between N and<br />

H atoms (approximately 1 Å) compared to the large distance (at least 10 Å) separating the NH bond<br />

from the lanthanide inducing the observed PCS. As a result, the polar coordinates within the Δχ-<br />

tensor frame are similar for the N and H spins and hence, both spins experience similar<br />

pseudocontact shifts. The second step of the protocol consists of combining a grid search over the<br />

Δχ-tensor parameters with an optimal assignment algorithm called the Hungarian method (Kuhn,<br />

1955). The grid search covers a large ensemble of possible combinations for the Δχ-tensor

parameters. At each node of the grid search, it becomes possible to use the Hungarian method to

obtain the optimal assignment in polynomial time. A score can be calculated over each node to

reflect the quality of the assignment, and compared to other nodes to extract the best assignment<br />

along with the best set of Δχ-tensor parameters.<br />
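The combination of a grid search with an optimal assignment at each node can be sketched as follows. This is an illustrative pure-Python version of the Hungarian method applied to a squared-deviation cost matrix, not the code of Platypus; the function names and the representation of a grid node as a list of predicted PCSs are assumptions.

```python
def hungarian(cost):
    """Solve the square assignment problem; return (total_cost, assignment).

    assignment[i] is the column matched to row i. Runs in O(n^3).
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (n + 1)          # column potentials
    p = [0] * (n + 1)            # p[j]: row currently matched to column j
    way = [0] * (n + 1)
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:           # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return sum(cost[i][assignment[i]] for i in range(n)), assignment


def assign_pcs(predicted, observed):
    """Match each predicted PCS to one observed PCS, minimizing squared deviations."""
    cost = [[(p - o) ** 2 for o in observed] for p in predicted]
    return hungarian(cost)


def best_node(nodes, observed):
    """nodes maps a grid-node label to its predicted PCS list; return the best label."""
    return min(nodes, key=lambda label: assign_pcs(nodes[label], observed)[0])
```

Each grid node corresponds to one trial set of Δχ-tensor parameters, so `best_node` returns both the best tensor candidate and, through `assign_pcs`, the assignment that goes with it.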

The “diagonal rule” used in (Pintacuda et al., 2004) to manually measure PCSs has also

been exploited in (Schmitz et al., 2006) to automatically assign paramagnetic chemical shifts of a

full ¹⁵N-HSQC spectrum, given a known 3D structure and the list of assigned diamagnetic



resonances. The software Echidna was developed to overcome the difficulties of sequential<br />

assignment of cross peaks in a paramagnetic spectrum. It works as follows:

Firstly, a small number n1 of paramagnetic peaks are paired with diamagnetic peaks by<br />

automatically screening unambiguous possibilities along the diagonal of a 2D ¹⁵N-HSQC

spectrum.<br />

Secondly, a Δχ-tensor is calculated to minimize the root mean square deviations between the n1<br />

experimental and calculated PCSs.<br />

Thirdly, the Δχ-tensor is used to predict for each diamagnetic cross peak the area of the<br />

spectrum where the paramagnetic partner is expected. This area is centered on the back-<br />

calculated PCS value, and defines a much smaller zone than the diagonal strip used in the first<br />

step. More paramagnetic peaks are unambiguously assigned.<br />

The last two steps are iterated until convergence. A final assignment is then performed in order to yield the overall best assignment of all cross peaks. This assignment uses the Hungarian method, which finds in polynomial time the optimal assignment among the n! possibilities, with n being the number of peaks to assign. A complete flow chart of the method is given in Figure 1.9.
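The first step of the procedure, pairing peaks along the diagonal, can be sketched as below. The sketch assumes that a paramagnetic peak is displaced from its diamagnetic partner by nearly the same amount (in ppm) in the ¹H and ¹⁵N dimensions, and accepts a pair only if it is unambiguous in both directions. The tolerance, the maximum shift and the function name are hypothetical and do not reproduce the criteria used by Echidna.

```python
def pair_along_diagonal(dia, para, tol=0.05, max_shift=3.0):
    """Pair diamagnetic and paramagnetic 15N-HSQC peaks using the diagonal rule.

    dia and para are lists of (1H ppm, 15N ppm) tuples.
    Returns {diamagnetic index: (paramagnetic index, estimated PCS in ppm)}.
    """
    candidates = {}
    for i, (h, n) in enumerate(dia):
        candidates[i] = [
            j for j, (hp, n_p) in enumerate(para)
            if abs((hp - h) - (n_p - n)) <= tol and abs(hp - h) <= max_shift
        ]
    # Count how many diamagnetic peaks claim each paramagnetic peak.
    claims = {}
    for i, js in candidates.items():
        for j in js:
            claims.setdefault(j, []).append(i)
    pairs = {}
    for i, js in candidates.items():
        if len(js) == 1 and len(claims[js[0]]) == 1:
            j = js[0]
            pairs[i] = (j, para[j][0] - dia[i][0])
    return pairs


if __name__ == "__main__":
    dia = [(8.0, 120.0), (7.5, 115.0), (9.0, 125.0)]
    para = [(8.4, 120.4), (7.2, 114.7), (9.5, 125.5), (10.0, 126.0)]
    # The third diamagnetic peak has two candidates on its diagonal and is skipped.
    print(pair_along_diagonal(dia, para))
```

The pairs found this way provide the initial PCSs for the first tensor fit; in later iterations the search window would be centered on the back-calculated PCS instead of the whole diagonal strip.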



Figure 1.9 Flow-chart of the Echidna algorithm.<br />

Both of these automatic NMR assignment techniques require partial initial assignments. In

the case of Echidna, the whole diamagnetic spectrum needs to be assigned, while Platypus requires

the connectivity between the paramagnetic and the diamagnetic cross peaks. In both cases, these

prerequisites reduce the computational problem to a 2D Multi-dimensional Assignment Problem

(MAP, Figure 1.10.a). A 2D-MAP is easily solved with the Hungarian method. More challenging<br />

and attractive would be to shortcut any initial manual assignment and start directly from the<br />

chemical shift lists of the diamagnetic and paramagnetic states. Such a method could be applied to<br />

automate the side chain assignment of a protein, once the preliminary and easier task of assigning<br />

the backbone chemical shifts and determining the Δχ-tensor has been done, for example with



Echidna. Computationally, the problem becomes a 3D-MAP (Figure 1.10.b), with the number of possibilities increasing from n! to (n!)². This problem is not known to be solvable in polynomial time, since a MAP of dimension greater than or equal to three is proven to be NP-hard¹ (Karp, 1972). Instead, a heuristic method has to be used to find a good assignment, but without any guarantee of reaching the optimal assignment. An approach that tries to cope with the computational challenges (by means of additional experimental information) led to the development of a software package named Possum (Figure 1.11), which is described in Chapter 2.
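The growth of the search space from n! to (n!)² can be made concrete with a few lines of arithmetic. The function name is hypothetical, and the values of n in the usage example are arbitrary.

```python
import math


def search_space(n):
    """Return the number of candidate assignments for a 2D-MAP and a 3D-MAP of size n."""
    two_d = math.factorial(n)
    return two_d, two_d ** 2


if __name__ == "__main__":
    for n in (5, 10, 12, 13):
        two_d, three_d = search_space(n)
        print(f"n = {n:2d}   2D-MAP: {two_d:.3e}   3D-MAP: {three_d:.3e}")
```

Going from n to n + 1 multiplies the 3D search space by (n + 1)², which is consistent with the abrupt loss of tractability described in the footnote.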

Figure 1.10 Examples of the MAP. (a) When measured and predicted PCSs have to be matched, the cost function c(i, j) can be defined as the squared deviation of the two values. The aim is to minimize the sum Q. The binary variables xi,j are defined to ensure that each element i and j is chosen exactly once. (b) The experimental PCSs are not directly available, but the paramagnetic and diamagnetic chemical shifts are measured. Their differences give possible experimental PCSs.

¹ The time required to solve an NP-hard problem increases drastically with the size of the problem, which is, in this context, the number n of residues to assign. Some problems with low n (< 12) could be solved easily in a few minutes. The same problem, with just one extra peak to assign, remained unsolved after days of calculation. This illustrates that, independently of the algorithm used, an NP-hard problem suddenly becomes practically unsolvable beyond a given size n.



Figure 1.11 Illustration of the task performed by the software Possum.<br />

The software package requires the Δχ-tensor parameters to perform a<br />

structure-based automatic assignment of the NMR resonances.<br />

1.3.2 The Δχ-tensor determination problem<br />

The Δχ-tensor determination problem consists of obtaining the eight parameters<br />

characterizing the Δχ-tensor such that the discrepancy between the observed and calculated PCSs is<br />

minimal. These comprise the location of the paramagnetic center (three coordinates), the orientation<br />

of the Δχ-tensor (three Euler angles), and the axial and rhombic components. The last two parameters<br />

characterize the shape of the PCS effect, which ranges between two extremes as illustrated in Figure 1.12.<br />
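As an illustration of how the axial and rhombic components shape the PCS, the sketch below evaluates the PCS of a spin whose coordinates are already expressed in the principal frame of the Δχ-tensor, with the metal at the origin. It assumes that equation (1.1) has the usual form PCS = [Δχax(3cos²θ − 1) + 1.5 Δχrh sin²θ cos2φ] / (12πr³). The function name, the units (Å and 10⁻³² m³) and the numerical values are choices made for this example.

```python
import math


def pcs_ppm(x, y, z, chi_ax, chi_rh):
    """PCS (ppm) of a spin at (x, y, z) in the tensor principal frame.

    Coordinates are in Angstrom with the paramagnetic center at the origin;
    chi_ax and chi_rh are in units of 1e-32 m^3.
    """
    r2 = x * x + y * y + z * z
    if r2 == 0.0:
        raise ValueError("spin coincides with the paramagnetic center")
    # In Cartesian form, (3cos^2(theta) - 1) = (3z^2 - r^2) / r^2 and
    # sin^2(theta) * cos(2phi) = (x^2 - y^2) / r^2.
    angular = chi_ax * (3.0 * z * z - r2) + 1.5 * chi_rh * (x * x - y * y)
    # 1e4 = 1e6 (ppm) * 1e-32 (tensor unit) / 1e-30 (Angstrom^3 in m^3).
    return 1.0e4 * angular / (12.0 * math.pi * r2 ** 2.5)


# Hypothetical tensor: 40e-32 m^3 axial, 10e-32 m^3 rhombic component.
on_x = pcs_ppm(10.0, 0.0, 0.0, 40.0, 10.0)
on_y = pcs_ppm(0.0, 10.0, 0.0, 40.0, 10.0)
on_z = pcs_ppm(0.0, 0.0, 10.0, 40.0, 10.0)
print("x: %.2f  y: %.2f  z: %.2f ppm" % (on_x, on_y, on_z))
```

With a rhombic component of zero, the values along x and y coincide (axial symmetry), and the three values at equal distance along the principal axes always sum to zero because the tensor is traceless.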

Figure 1.12 Isosurface shapes calculated by equation (1.1). The positive isolevel is shown in blue,<br />

the negative is shown in red. Both isolevels have the same absolute value. (a) When the rhombic<br />

component is equal to zero, the isosurface is axially symmetric (dotted line). (b) Isosurface when



the axial and rhombic components are equal. Three planar symmetries remain for the three planes<br />

orthogonal to the three main axes. (c) For an axial value of zero, the isosurface presents two<br />

additional planar symmetries due to an ambiguous main axis. While the axial and rhombic values<br />

“decide” the isosurface shape, equation (1.1) constrains any isosurface to be shaped between the<br />

two extremes shown in (a) and (c).<br />

Figure 1.13 Sanson-Flamsteed projection for visualization of Δχ-<br />

tensor uncertainty. The axes of the Δχ-tensor are projected on a 2D<br />

surface. The uncertainty of the axes orientation can be reflected by the<br />

size and shape of the colored area used for each axis, as done in<br />

Figure 1.14.<br />

The Δχ-tensor is similar to the alignment tensor used to compute RDCs. The alignment<br />

tensor parameters are easily determined by singular value decomposition, as equation (1.2) is linear<br />

with respect to the alignment tensor parameters. The axial and rhombic components of the alignment<br />

tensor can also be estimated without the requirement of protein coordinates, by exploiting the<br />

isotropic distribution of the NH bond orientation in space (Clore et al., 1998). The equation that<br />

governs the PCS is non-linear because no assumption of the isotropic distribution of PCS values<br />

can be made. Consequently, the Δχ-tensor is determined by minimizing a cost function<br />

comparing predicted and experimental PCSs. A few existing software packages could<br />

be used to get the Δχ-tensor parameters, such as Fantasian (Banci et al., 1997), or Xplor-NIH<br />

(Schwieters et al., 2003, Schwieters et al., 2006). However, they can be cumbersome to use and<br />

have poor interactivity with the user.<br />

Another important feature that existing approaches fail to provide is the possibility to<br />

estimate the quality of the fit and directly visualize it in a Sanson-Flamsteed representation



(Bugayevskiy et al., 1995) as shown in Figure 1.13. A Sanson-Flamsteed plot is a projection of a<br />

sphere on a plane, and is commonly used in geography to project the globe on a map. The axes of<br />

the Δχ-tensor penetrate the surface of a unit sphere, and the penetration points can be identified<br />

on the projection. This representation is convenient to highlight how reliable the fit of a Δχ-tensor<br />

is in the context of the data and structure used for the calculation. For example, the Δχ-tensor found<br />

for the ε subunit of DNA polymerase III is particularly well defined (Figure 1.14.a), while the<br />

Sanson-Flamsteed plot corresponding to the fit for the θ subunit (in complex with ε) reveals more<br />

uncertainties (Figure 1.14.b). Those differences are mainly due to the large distance (>15 Å) that<br />

separates θ from the lanthanide bound to ε.<br />
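The projection step described above can be sketched as follows. The Sanson-Flamsteed (sinusoidal) projection maps a point of longitude λ and latitude φ to x = λ cos φ, y = φ. The sketch assumes that the z axis of the reference frame plays the role of the polar axis and that the central meridian is at λ = 0; the function name is hypothetical and this is not the plotting code of any existing package.

```python
import math


def sanson_flamsteed(axis):
    """Project the point where an axis pierces the unit sphere.

    axis is a non-zero (x, y, z) vector. Returns (px, py) in radians with
    px = longitude * cos(latitude) and py = latitude.
    """
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("axis must be a non-zero vector")
    # Clamp to [-1, 1] to guard against rounding before taking the arcsine.
    latitude = math.asin(max(-1.0, min(1.0, z / norm)))
    longitude = math.atan2(y, x)
    return longitude * math.cos(latitude), latitude


# An axis pierces the sphere twice; the second point is the antipode.
axis = (0.0, 1.0, 0.0)
print(sanson_flamsteed(axis))
print(sanson_flamsteed(tuple(-c for c in axis)))
```

Plotting the projected points of many fitted tensors for each axis gives the colored areas whose size reflects the uncertainty of the axis orientation.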

When docking a small molecule compound such as a drug to a protein, it is likely that the<br />

Sanson-Flamsteed plot highlights large uncertainties because only a small number of PCSs can be<br />

measured. To improve the situation, an important and desired feature would be to use the<br />

information of the protein’s Δχ-tensor in order to improve the fit of the drug’s Δχ-tensor. The<br />

resulting enhancements are illustrated in Figure 1.14.c (compared to Figure 1.14.b) for the ε/θ<br />

protein-protein complex and are expected to be similar for small ligand-protein complexes.<br />

To address these issues, an efficient software package for working with the Δχ-tensor is<br />

required. Chapter 3 presents the software package “Numbat” (Figure 1.15), which<br />

aims to meet all those needs.



Figure 1.14 Sanson-Flamsteed representations of Δχ-tensor axes orientation. The error analysis<br />

used one thousand Monte-Carlo iterations that randomly selected 50% of the PCS data set. (a) The<br />

Δχ-tensors fitted for ε are very well defined. (b) As the lanthanide is 15 Å away from the θ subunit,<br />

the fitted Δχ-tensors are less accurate, as indicated by the large area each axis can span. (c)<br />

When keeping the relative orientation and magnitude of the two Δχ-tensors fixed (to the values<br />

determined for ε), the quality of the fitted Δχ-tensor increases, resulting in a more reliable complex<br />

of the two subunits. The well-defined z-axis areas of the two Δχ-tensors (blue and brown) in (c)<br />

illustrate the reduced uncertainty around the z axes.<br />
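The resampling protocol described in this caption (repeated fits on random halves of the data) can be sketched in a simplified setting. The real fit involves eight parameters and a non-linear minimization; the sketch below assumes, for illustration only, a single unknown amplitude a in a linear model pcs = a · g, where g is a known geometric factor. All names and data are hypothetical.

```python
import random
import statistics


def fit_amplitude(data):
    """Least-squares estimate of a in pcs = a * g for (g, pcs) pairs."""
    numerator = sum(g * pcs for g, pcs in data)
    denominator = sum(g * g for g, _ in data)
    return numerator / denominator


def subsample_fits(data, iterations=1000, fraction=0.5, seed=0):
    """Refit on random subsets of the data and return all fitted values."""
    rng = random.Random(seed)
    subset_size = max(1, int(len(data) * fraction))
    return [fit_amplitude(rng.sample(data, subset_size))
            for _ in range(iterations)]


# Hypothetical data: 40 geometric factors and PCSs simulated with a = 30
# plus Gaussian noise with a standard deviation of 0.05 ppm.
rng = random.Random(1)
true_a = 30.0
geometry = [rng.uniform(-0.05, 0.05) for _ in range(40)]
data = [(g, true_a * g + rng.gauss(0.0, 0.05)) for g in geometry]

fits = subsample_fits(data)
print("mean = %.2f, spread = %.2f" % (statistics.mean(fits),
                                      statistics.stdev(fits)))
```

The spread of the refitted values plays the role of the colored areas in the Sanson-Flamsteed plots: the narrower the distribution, the better the parameter is defined by the data.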

Figure 1.15 Illustration of the task performed by the software<br />

Numbat. Given assigned PCSs, Numbat performs a structure-based<br />

determination of the Δχ-tensor.<br />

1.3.3 De novo structure determination of proteins



The determination of protein structures is one of the main challenges of the post-genomic<br />

era. The knowledge of structures at atomic detail is a prerequisite to understand how<br />

macromolecular complexes assemble and perform their tasks within living organisms. The<br />

established methods of X-ray crystallography and NMR spectroscopy still require significant<br />

human and financial resources to determine the structure of proteins of interest. Efforts are being<br />

focused on high-throughput methods to speed up the process of characterizing a large number of<br />

proteins (Kobe et al., 2008).<br />

De novo structure prediction software packages such as ROSETTA (Simons et al., 1997)<br />

are quite successful for small proteins (< 100 residues). The large size of the conformational space<br />

to explore makes it difficult, however, to tackle larger proteins. To overcome the “sampling<br />

problem”, one approach is to include additional experimental restraints that facilitate the three-<br />

dimensional reconstruction of protein structures. Those restraints must be easier to measure than it<br />

would be to obtain crystals of the target protein, or to measure and assign the NOEs required for the<br />

full determination of the structure.<br />

The pseudocontact shift effect is a candidate for this approach. PCSs can be measured<br />

swiftly and accurately as the chemical shift difference between two spectra, once a paramagnetic<br />

probe has been introduced into the protein. The use of lanthanide binding tags makes these<br />

techniques potentially available to any protein. Several lanthanide tags are now available; for a<br />

recent review, see Su et al. (2009b). While it is not yet routine to attach lanthanide binding tags to<br />

a protein, several options are possible: attachment by one or two disulfide bonds (Smith et al.,<br />

1975), attachment at one of the termini of the protein (Donaldson et al., 2001), or even use of a<br />

non-covalent tag as demonstrated by Su et al. (2009a). It is expected that<br />

lanthanide attachment techniques will become routine in the future.<br />

Beyond the process of attachment, the second challenge is to have a tag that is not flexible.<br />

The physical model underlying equation (1.1) is accurate if the Δχ-tensor parameters are constant<br />

over time. This hypothesis could be questioned if small movements of the tag occur. Fluctuation of<br />

the tag produces two undesired effects:<br />

(i) It changes the electronic environment in the vicinity of the lanthanide and<br />

consequently, the orientation or magnitude of the Δχ-tensor. As the PCS of equation (1.1) is<br />

linear with respect to the five independent components of the Δχ-tensor (which encode the<br />

axial component, the rhombic component, and the three Euler angles), changes over<br />

time of those five parameters will not affect the way<br />

PCSs are predicted. More precisely, n conformations of the Δχ-tensor occurring



within the protein (and sharing the same center coordinate) can be equally explained<br />

by one single conformation. To demonstrate this, consider a spin i with measured<br />

pseudocontact shift PCSi. PCSi is the sum of the contributions of the n states of the<br />

Δχ-tensor.<br />

For the state j, an alternative formula of the pseudocontact shift in a given frame f is:<br />

With<br />

(1.8)<br />

(1.9)<br />

(1.10)<br />

Where x, y, z are the Cartesian coordinates of the spin i in the frame f, and r the distance<br />

between the lanthanide and the spin i. Equation (1.8) becomes:<br />

With<br />

(1.11)<br />

(1.12)<br />

The traceless and symmetric matrix D contains the effective Δχ-tensor parameters that are<br />

necessary and sufficient to describe the PCS experienced by the spin i.
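The argument can be sketched in a commonly used notation, which may differ from that of equations (1.8) to (1.12). The sketch assumes that each state j has a traceless, symmetric tensor Δχ⁽ʲ⁾ expressed in the frame f, that all states share the same center, and that state j contributes with a weight p_j (an assumption of this sketch, with the weights summing to one).

```latex
\begin{align}
\mathrm{PCS}_i^{(j)} &= \frac{1}{4\pi r^{5}}\Big[
   (x^{2}-z^{2})\,\Delta\chi^{(j)}_{xx}
 + (y^{2}-z^{2})\,\Delta\chi^{(j)}_{yy}
 + 2xy\,\Delta\chi^{(j)}_{xy}
 + 2xz\,\Delta\chi^{(j)}_{xz}
 + 2yz\,\Delta\chi^{(j)}_{yz}\Big] \\
\mathrm{PCS}_i &= \sum_{j=1}^{n} p_j\,\mathrm{PCS}_i^{(j)}
 = \frac{1}{4\pi r^{5}}\Big[
   (x^{2}-z^{2})\,D_{xx}
 + (y^{2}-z^{2})\,D_{yy}
 + 2xy\,D_{xy}
 + 2xz\,D_{xz}
 + 2yz\,D_{yz}\Big] \\
D_{ab} &= \sum_{j=1}^{n} p_j\,\Delta\chi^{(j)}_{ab},
 \qquad a, b \in \{x, y, z\}
\end{align}
```

Because x, y, z and r are the same for every state, the sum over states acts only on the tensor components. D is a weighted sum of traceless symmetric matrices and is therefore itself traceless and symmetric, so it can be treated as a single effective Δχ-tensor.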



(ii) It could displace the position of the paramagnetic center with respect to the protein<br />

frame. The amplitude of those movements depends on the size and rigidity of the tag<br />

used. Small displacements mostly have a small impact on the value of the PCS<br />

because PCSs are usually observable only at more than ten Angstroms from the<br />

lanthanide. To assess this principle further, the comparison between a static metal<br />

ion and a mobile one following a realistic trajectory is illustrated in Figure 1.16.<br />

Figure 1.16 Effect of the mobility of the tag on the PCS. (a) Twelve tensors are being used to<br />

represent a realistic trajectory of the tag. They have random orientation, random axial and<br />

rhombic values, and are located three Angstroms away from the anchor point (black dot). The<br />

range of angles covers 110 degrees, in steps of 10 degrees. (b) The isosurfaces resulting from the<br />

ensemble of tensors in (a). The red surface represents the isolevel of 5.0 ppm, the blue one<br />

corresponds to -5.0 ppm. The shapes are distorted compared to the typical shape of an isosurface<br />

shown in Figure 1.12. (c) PCSs cannot be measured closer than 10 Angstroms from the<br />

paramagnetic center. The cutoff area is shown in grey. (d) Surfaces of isolevel 1.0 ppm and -1.0<br />

ppm are shown superimposed on (b). They exhibit a classical profile as seen in Figure 1.12. (e)<br />

The cutoff of 10 Angstroms is superimposed on figure (d).



Once the PCSs have been measured, the next step is to use them appropriately in order to<br />

extract structural information. Using the PCSs to filter the correct structure(s) (by comparing<br />

calculated and experimental values) among a large ensemble of generated structures would not be<br />

enough. The PCS-based restraint needs to be directly incorporated into the process of structure<br />

generation to bias the output conformations towards the native one. Several options are to be<br />

considered: incorporating a PCS restraint into a molecular dynamics package, into a molecular<br />

refinement package that employs simulated annealing routines, or into a molecular fragment<br />

replacement package. The main question is which of those approaches is best suited to<br />

capture the global nature of the PCS effect.<br />

The merit of the PCS for de novo structure determination in the context of a molecular<br />

fragment replacement is described in Chapter 4. A PCS score function has been added to the<br />

package CS-ROSETTA (Shen et al., 2008). The ability of the PCS to drive CS-ROSETTA<br />

calculations towards the native conformation and to identify native-like structures is discussed.<br />

1.4 Scope of the thesis<br />

This thesis covers different aspects of paramagnetic NMR from a computational point of<br />

view. This includes the use of PCSs for NMR resonance assignment, for Δχ-tensor determination in<br />

preparation of rigid body complex calculations, and for de novo structure determination of proteins.<br />

The rest of the thesis is organized as follows:<br />

Chapter 2 describes an experimental and a computational approach to assign chemical<br />

shifts of methyl groups from the paramagnetic and diamagnetic NMR spectra. The computational<br />

route is supported by the development of the software Possum, which was tested on artificial data<br />

before being applied to experimental data.<br />

Chapter 3 presents a newly developed software package that works specifically with<br />

pseudocontact shifts. The possibilities offered by the software are discussed and illustrated by the<br />

rapid reconstruction of the complex between the subunits ε and θ of DNA polymerase III.<br />

Chapter 4 reports the incorporation of the PCS into the molecular fragment<br />

replacement software CS-ROSETTA, and the development of a new protocol to perform, for the<br />

first time, de novo protein structure determination using only PCSs and chemical shifts as<br />

experimental restraints.



Chapter 5 concludes this thesis by presenting some perspectives for further development to<br />

better exploit PCS information in structural biology.<br />

1.5 References<br />

Allegrozzi M, Bertini I, Janik MBL, Lee YM, Lin GH and Luchinat C (2000) Lanthanide-induced<br />

pseudocontact shifts for solution structure refinements of macromolecules in shells up to 40<br />

angstrom from the metal ion. J Am Chem Soc 122:4154-4161<br />

Banci L, Bertini I, Savellini GG, Romagnoli A, Turano P, Cremonini MA, Luchinat C and Gray<br />

HB (1997) Pseudocontact shifts as constraints for energy minimization and molecular<br />

dynamics calculations on solution structures of paramagnetic metalloproteins. Proteins<br />

29:68-76<br />

Bertini I, Felli IC and Luchinat C (1998) High magnetic field consequences on the NMR hyperfine<br />

shifts in solution. J Magn Reson 134:360-364<br />

Bugayevskiy LM and Snyder JP (1995). Map projections: A reference manual. Taylor & Francis,<br />

London.<br />

Clore GM, Gronenborn AM and Bax A (1998) A robust method for determining the magnitude of<br />

the fully asymmetric alignment tensor of oriented macromolecules in the absence of<br />

structural information. J Magn Reson 133:216-221<br />

Donaldson LW, Skrynnikov NR, Choy WY, Muhandiram DR, Sarkar B, Forman-Kay JD and Kay<br />

LE (2001) Structural characterization of proteins with an attached ATCUN motif by<br />

paramagnetic relaxation enhancement NMR spectroscopy. J Am Chem Soc 123:9843-9847<br />

John M, Headlam MJ, Dixon NE and Otting G (2007) Assignment of paramagnetic ¹⁵N-HSQC<br />

spectra by heteronuclear exchange spectroscopy. J Biomol NMR 37:43-51<br />

John M, Pintacuda G, Park AY, Dixon NE and Otting G (2006) Structure determination of protein-<br />

ligand complexes by transferred paramagnetic shifts. J Am Chem Soc 128:12910-12916<br />

Karp RM (1972) Reducibility Among Combinatorial Problems. Complexity of Computer<br />

Computations. New York: Plenum, R. E. Miller and J. W. Thatcher.<br />

Karplus M (1959) Contact electron-spin coupling of nuclear magnetic moments. J Chem Phys<br />

30:11-15<br />

Kobe B, Guss M and Huber T (2008). Structural Proteomics: High Throughput Methods. Humana<br />

Press, Totowa, NJ, USA.



Kuhn HW (1955) The Hungarian Method for the assignment problem. Naval Res Logistics Quart<br />

2:83-97<br />

McCoy MA and Wyss DF (2002) Structures of protein-protein complexes are docked using only<br />

NMR restraints from residual dipolar coupling and chemical shift perturbations. J Am<br />

Chem Soc 124:2104-2105<br />

Nguyen BD, Xia ZC, Yeh DC, Vyas K, Deaguero H and La Mar GN (1999) Solution NMR<br />

determination of the anisotropy and orientation of the paramagnetic susceptibility tensor as<br />

a function of temperature for metmyoglobin cyanide: Implications for the population of<br />

excited electronic states. J Am Chem Soc 121:208-217<br />

Pintacuda G, Keniry MA, Huber T, Park AY, Dixon NE and Otting G (2004) Fast structure-based<br />

assignment of ¹⁵N HSQC spectra of selectively ¹⁵N-labeled paramagnetic proteins. J Am<br />

Chem Soc 126:2963-2970<br />

Pintacuda G, Park AY, Keniry MA, Dixon NE and Otting G (2006) Lanthanide labeling offers fast<br />

NMR approach to 3D structure determinations of protein-protein complexes. J Am Chem<br />

Soc 128:3696-3702<br />

Rohl CA and Baker D (2002) De novo determination of protein backbone structure from residual<br />

dipolar couplings using rosetta. J Am Chem Soc 124:2723-2729<br />

Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient χ-tensor<br />

determination and NH assignment of paramagnetic proteins. J Biomol NMR 35:79-87<br />

Schwieters CD, Kuszewski JJ and Clore GM (2006) Using Xplor-NIH for NMR molecular<br />

structure determination. Prog NMR Spectrosc 48:47-62<br />

Schwieters CD, Kuszewski JJ, Tjandra N and Clore GM (2003) The Xplor-NIH NMR molecular<br />

structure determination package. J Magn Reson 160:65-73<br />

Shen Y and Bax A (2007) Protein backbone chemical shifts predicted from searching a database for<br />

torsion angle and sequence homology. J Biomol NMR 38:289-302<br />

Shen Y, Lange O, Delaglio F, Rossi P, Aramini JM, Liu GH, Eletsky A, Wu Y, Singarapu KK,<br />

Lemak A, Ignatchenko A, Arrowsmith CH, Szyperski T, Montelione GT, Baker D and Bax<br />

A (2008) Consistent blind protein structure generation from NMR chemical shift data. Proc<br />

Natl Acad Sci U S A 105:4685-4690<br />

Simons KT, Kooperberg C, Huang E and Baker D (1997) Assembly of protein tertiary structures<br />

from fragments with similar local sequences using simulated annealing and bayesian<br />

scoring functions. J Mol Biol 268:209-225<br />

Smith DJ, Maggio ET and Kenyon GL (1975) Simple alkanethiol groups for temporary blocking of<br />

sulfhydryl groups of enzymes. Biochemistry 14:766-71



Su XC, Liang HB, Loscha KV and Otting G (2009a) [Ln(DPA)3]³⁻ is a convenient paramagnetic<br />

shift reagent for protein NMR studies. J Am Chem Soc 131:10352-10353<br />

Su XC and Otting G (2009b) Paramagnetic labelling of proteins and oligonucleotides. J Biomol<br />

NMR in press


Chapter 2<br />

Possum: paramagnetically<br />

orchestrated spectral solver of<br />

unassigned methyls<br />




2.1 Abstract<br />

Pseudocontact shifts (PCS) induced by a site-specifically bound paramagnetic lanthanide<br />

ion are shown to provide fast access to sequence-specific resonance assignments of methyl groups<br />

in proteins of known three-dimensional structure. Stereospecific assignments of Val and Leu<br />

methyls are obtained as well as the resonance assignments of all other methyls, including Met CH3<br />

groups. No prior assignments of the diamagnetic protein are required, nor are experiments that<br />

transfer magnetization between the methyl groups and the protein backbone. Methyl Cz-exchange<br />

experiments were designed to provide convenient access to PCS measurements in situations where<br />

a paramagnetic lanthanide is in exchange with a diamagnetic lanthanide. In the absence of<br />

exchange, simultaneous ¹³C-HSQC assignments and PCS measurements are delivered by the newly<br />

developed program Possum. The approaches are demonstrated with the complex between the<br />

N-terminal domain of the subunit ε and the subunit θ of the Escherichia coli DNA polymerase III.<br />

2.2 Introduction<br />

Methyl groups are excellent probes for the study of proteins by NMR spectroscopy due to<br />

their favorable relaxation properties and intense ¹H NMR signals. When buried, they report on the<br />

packing of side chains in the protein core and thus provide important restraints for protein fold<br />

determination (Zwahlen et al., 1998). On the protein surface, they can serve as hydrophobic probes<br />

of protein-protein (Janin et al., 1988, Gross et al., 2003) and protein-ligand (Hajduk et al., 2000)<br />

interactions. Methyl groups have also been established as probes of protein dynamics (Nicholson et<br />

al., 1992, Muhandiram et al., 1995, Wand et al., 1996, Liu et al., 2003, Korzhnev et al., 2004,<br />

Tugarinov et al., 2005a, Tugarinov et al., 2005b) which, in contrast to amide protons, are inert with<br />

regard to solvent exchange.<br />

The resonance assignment of methyl groups in ¹³C-labeled proteins is usually achieved by<br />

magnetization transfers from sequentially assigned backbone resonances (Montelione et al., 1992).<br />

While this approach works well for proteins up to 30 kDa, it is impeded by fast transverse<br />

relaxation for proteins of high molecular weight or for paramagnetic proteins. Recent advances use<br />

tailored isotope labeling schemes (Tugarinov et al., 2003a, Tugarinov et al., 2003b) which are<br />

expensive and not generally applicable to any type of methyl group. In particular, the methyl<br />

groups of methionine residues are hard to assign since any scalar couplings with the CH3 group<br />

are small (Bax et al., 1994).



As a further drawback, experiments that transfer magnetization between methyl groups and<br />

backbone resonances usually do not afford stereospecific discrimination between the prochiral<br />

methyl groups in Val and Leu residues. In this situation, stereospecific assignments require<br />

additional, stereospecifically labeled samples (Neri et al., 1989, Senn et al., 1989, Ostler et al.,<br />

1993, Kainosho et al., 2006) or more complicated NMR experiments that often entail cumbersome<br />

data analysis (Zuiderweg et al., 1985, Sattler et al., 1992, Karimi-Nejad et al., 1994, Tugarinov et<br />

al., 2004, Tang et al., 2005).<br />

In the case where the three-dimensional structure of the protein is known prior to the NMR<br />

studies, it would be attractive to use the structure to facilitate the NMR resonance assignments. In<br />

favorable situations, structure-based resonance assignments can be achieved from NOE data<br />

(Grishaev et al., 2002). In addition, structure-based assignments of backbone resonances have been<br />

achieved using residual dipolar couplings (RDCs) measured with different alignment media (Jung<br />

et al., 2004) or using the combined information from pseudocontact shifts (PCS), RDCs,<br />

paramagnetic relaxation enhancements (PREs) and cross-correlated relaxation (CCR) induced by<br />

paramagnetic metal ions (Pintacuda et al., 2007). The structural interpretation of PCS has been used<br />

earlier to support resonance assignments of ligand residues in heme proteins (Senn et al., 1985).<br />

Recent advances in site-specific attachment of single lanthanide ions to proteins (Ma et al., 2000,<br />

Dvoretsky et al., 2002, Wöhnert et al., 2003, Ikegami et al., 2004, Prudêncio et al., 2004, Leonov et<br />

al., 2005, Haberz et al., 2006, Rodriguez-Castañeda et al., 2006, Su et al., 2006) extend this<br />

approach to long-range paramagnetic effects, with the possibility of tuning the range of focus by<br />

choice of a particular lanthanide (Allegrozzi et al., 2000, Balayssac et al., 2006, Pintacuda et al.,<br />

2007).<br />

Here we show that the analysis of PCS induced by lanthanide ions presents a powerful tool<br />

for the assignment of methyl resonances, which by reference to the 3D structure of the protein,<br />

works even in situations when connectivities to the backbone resonances are difficult to establish or<br />

the backbone resonance assignment is incomplete. Stereospecific assignments of Val and Leu<br />

methyls are obtained as well as the assignments of any other methyl resonances, including those of<br />

Met CH3 groups. We present two Cz-EXSY experiments for the convenient measurement of PCS<br />

in situations where a paramagnetic lanthanide is in exchange with a diamagnetic lanthanide. In<br />

addition, an algorithm was developed to assign the ¹³C-HSQC cross-peaks of methyl groups in the<br />

situation where no exchange information is available. The approaches are demonstrated with the 30<br />

kDa complex between the N-terminal exonuclease domain ε186 and the subunit θ of Escherichia



coli DNA polymerase III. The active site of ε186 binds two divalent ions (Hamdan et al., 2002b)<br />

that can be replaced by a single Ln³⁺ ion (Pintacuda et al., 2004).<br />

2.3 Experimental section<br />

2.3.1 Sample preparation<br />

A cyclized version of ε186, cz-ε186, was designed for enhanced stability of the protein in<br />

unrelated crystallographic experiments (Park, 2006). Using an intein-based strategy (Williams et<br />

al., 2002), the N-terminal Ser2 and C-terminal Ala186 of ε186 were linked by the nonapeptide<br />

TRESGSIEF (numbered 187-195). Apart from the N- and C-terminal residues that are structurally<br />

disordered in ε186 (Hamdan et al., 2002b), the amide proton chemical shifts of the linear protein are<br />

conserved in cz-ε186 within ±0.05 ppm, indicating that cyclization does not significantly affect the<br />

protein structure. The proteins cz-ε186 and θ were prepared, and used to isolate samples of the<br />

cz-ε186/θ complex, essentially as described previously (Hamdan et al., 2002a). NMR experiments<br />

made use of three different samples of complexes of unlabeled θ with isotope-labeled cz-ε186: (i) a<br />

uniformly ¹³C/¹⁵N-labeled sample (0.5 mM), (ii) a biosynthetically directed fractional ¹³C-labeled<br />

sample prepared from 20% ¹³C-glucose (0.5 mM) (Neri et al., 1989, Senn et al., 1989), and (iii) a<br />

sample with ¹³C/¹⁵N-Leu (0.15 mM). Samples of ε186/θ were dialyzed against NMR buffer (20<br />

mM Tris, pH 7.2, 100 mM NaCl, 0.1 mM dithiothreitol, and 0.08% (w/v) NaN3 in 90% H2O/10%<br />

D2O).<br />

Lanthanides (Ln³⁺ = La³⁺ or 1:1 mixtures of La³⁺/Dy³⁺ or La³⁺/Yb³⁺) were added from<br />

LnCl3 stock solutions in the same buffer containing total Ln³⁺ concentrations of 30 mM. The 1:1<br />

mixtures were added in slight molar excess to catalyze the metal ion exchange, resulting in<br />

exchange rates of a few s⁻¹ (John et al., 2007a, John et al., 2007b). Restoration of the apo-complex<br />

was achieved by extensive dialysis against buffer containing 1 mM EDTA followed by dialysis<br />

against EDTA-free buffer.<br />

2.3.2 NMR spectroscopy<br />

All NMR experiments were performed at 25 ºC on a Bruker AV 800 MHz NMR<br />

spectrometer equipped with a cryogenic TCI probe. Sequence-specific resonance assignments of



the methyl groups in the diamagnetic state were established by 3D HNCA and (H)CCH-TOCSY<br />

spectra of the uniformly ¹³C/¹⁵N-labeled sample complexed with 1 equivalent of La³⁺<br />

(cz-ε186/θ/La³⁺), and by reference to the assignments reported for the linear ε186 protein with Mg²⁺<br />

(DeRose et al., 2003). Stereospecific assignments of Val and Leu methyl groups were obtained<br />

from a constant-time (28 ms) ¹³C-HSQC spectrum recorded on the fractionally ¹³C-labeled sample.<br />

Where possible, the rotameric states of the side chains of Val and Leu residues in the crystal<br />

structure of ε186 (Hamdan et al., 2002b) were confirmed in solution by a 3D NOESY-¹⁵N-HSQC<br />

spectrum (mixing time 60 ms) recorded on the uniformly ¹³C/¹⁵N-labeled sample.<br />

Sequence-specific resonance assignments of the methyl groups in the paramagnetic state<br />

were established by 2D and 3D methyl Cz-EXSY spectra recorded with the pulse schemes of Figure<br />

2.1 using a mixing period (τm) of 480 ms and spectral widths of 30 ppm (¹³C) and 16 ppm (¹H). The<br />

2D spectra were acquired with 160 × 1024 complex data points and 32 scans in 10 h, while the 3D<br />

spectra were acquired with 80 × 64 × 1024 complex points and 4 scans in 40 h. For all spectra, the<br />

initial t1 delay was set to half the increment so that folded paramagnetic peaks could be identified<br />

by their inverted sign (Bax et al., 1991).<br />

The methyl group assignments obtained with these experiments provided the controls for<br />

the assignment methods described below.<br />

Figure 2.1 Methyl Cz-EXSY experiments. (a, b) Pulse schemes of the 2D and 3D versions,<br />

respectively. Narrow and wide bars represent radiofrequency pulses with flip angles of 90º<br />

and 180º, respectively, applied with phase x unless indicated otherwise. Selective ¹³C pulses<br />

were applied as a 1.5 ms Q5 pulse prior to the delay C and as a 1.5 ms time-reversed Q5



pulse prior to the delay H, generating an excitation bandwidth of 20 ppm. ¹H saturation is<br />

achieved with 120º pulses applied every 5 ms, and the 3-9-19 sequence is used for water<br />

suppression. During τm, 180º (¹H) pulses are applied every 6 ms with a MLEV-16 supercycle<br />

to suppress cross relaxation between ¹H and ¹³C spins. The phase cycle was φ1(x, –x), φ2(2x,<br />

2(–x)), φ3(x), φ4(4y, 4(–y)) and φrec(x, 2(–x), x, –x, 2x, –x). States-TPPI was applied to φ1 and<br />

φ3 for quadrature detection. Delays: T = 28 ms, = 3 ms, C = 0.75 ms, H = 1.7 ms.<br />

Gradients (Gi) were applied along the z-axis with strengths of 23.2, 14.5, 20.3, 17.5 and 11.6<br />

G/cm. (c-e) Simulated dipolar ¹³C relaxation rates and NOE in isolated CH3 groups versus<br />

molecular rotational correlation time (τR) using eqs 1-4 in ref. 5. (c) Transverse relaxation<br />

rate R2, (d) longitudinal relaxation rate R1, and (e) steady-state ¹³C{¹H} NOE. The dashed<br />

line reports the relaxation rates calculated for a static CH3 group, whereas the solid line<br />

takes into account a rapid rotation around the three-fold symmetry axis with a correlation<br />

time τf = 25 ps and assuming tetrahedral geometry (Sf² = 0.111, rCH = 1.10 Å, rCC = 1.52 Å)<br />

(Wand et al., 1996). The dotted line represents the contribution from a neighboring ¹³C spin<br />

to R2, R1, and cross-relaxation ( ), respectively. The vertical axis of the ¹³C-¹³C cross-relaxation<br />

rate in (e) is in s⁻¹. Due to the small contribution of PRE to R1 (John et al., 2007a),<br />

the ¹³C{¹H} NOE is similar for paramagnetic and diamagnetic proteins.<br />

2.3.3 Manual resonance assignments from PCS<br />

The PCS measured from EXSY spectra were used to evaluate the possibility of assigning<br />
the methyl peaks by comparison with back-calculated PCS. PCS were back-calculated using a<br />
Mathematica (Wolfram Research) script and the crystal structure of ε186 (PDB entry 1J53, ref. 40).<br />
The Δχ-tensor parameters of Dy³⁺ in complex with ε186/θ have been reported previously (Schmitz<br />
et al., 2006). The Δχ-tensor parameters for Yb³⁺ were determined from ¹⁵N-HSQC spectra using the<br />
program Echidna (Schmitz et al., 2006) as: Δχax = –6.52 × 10⁻³² m³, Δχrh = 1.12 × 10⁻³² m³, α =<br />
24.4°, β = 84.5°, and γ = –299.5° (using the zxz convention of Euler angle rotations). ¹H PCS of<br />
methyl groups were calculated for each of the three methyl protons individually and averaged. This<br />
average is largely insensitive to the rotational position of the methyl group. Residual CSA effects<br />
due to paramagnetic alignment (John et al., 2005) were disregarded since CSA tensors of methyl<br />
groups are small (Liu et al., 2003).<br />
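The back-calculation described above can be illustrated with a minimal Python sketch (not the Mathematica script used in this work). It applies the standard PCS equation, PCS = [Δχax(3cos²θ – 1) + 1.5 Δχrh sin²θ cos2φ] / (12πr³), and assumes that the nuclear coordinates have already been transformed into the principal axis frame of the Δχ tensor with the metal at the origin, so the Euler-angle rotation is deliberately left out. The function names are hypothetical.<br />

```python
import math

def pcs_ppm(x, y, z, dchi_ax, dchi_rh):
    """PCS in ppm for a nucleus at (x, y, z), in metres, in the tensor frame.

    dchi_ax and dchi_rh are the axial and rhombic tensor components in m^3.
    """
    r2 = x * x + y * y + z * z
    r = math.sqrt(r2)
    # 3cos^2(theta) - 1 = (3z^2 - r^2) / r^2
    axial = dchi_ax * (3.0 * z * z - r2) / r2
    # sin^2(theta) * cos(2 phi) = (x^2 - y^2) / r^2
    rhombic = 1.5 * dchi_rh * (x * x - y * y) / r2
    return 1e6 * (axial + rhombic) / (12.0 * math.pi * r ** 3)

def methyl_proton_pcs(protons, dchi_ax, dchi_rh):
    """Average the PCS over the three methyl protons, as described in the text."""
    values = [pcs_ppm(px, py, pz, dchi_ax, dchi_rh) for (px, py, pz) in protons]
    return sum(values) / len(values)
```

With the Yb³⁺ parameters quoted above, a nucleus 10 Å from the metal on the tensor z-axis would be called as `pcs_ppm(0.0, 0.0, 1e-9, -6.52e-32, 1.12e-32)`.<br />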

2.3.4 The program Possum



The program Possum (paramagnetically orchestrated spectral solver of unassigned methyls)<br />
was developed to assign the cross-peaks of methyl groups in correlation spectra recorded with<br />
diamagnetic and paramagnetic metal ions by reference to the 3D structure of the protein and<br />
independently determined Δχ tensors. The program requires that the amino-acid type is known (e.g.<br />
by residue-type selective ¹³C-labeling). Furthermore, it can accept information about methyl cross-peaks<br />
belonging to the same residue (“methyl connectivity” data for Ile, Leu, and Val, as provided<br />
by HCCH-TOCSY experiments) and stereospecific information (“methyl specificity” data<br />
distinguishing between γ2 and δ1 cross-peaks of Ile, δ1 and δ2 cross-peaks of Leu, and γ1 and γ2<br />
cross-peaks of Val, as provided by samples produced with biosynthetically directed fractional ¹³C-labeling<br />
(Neri et al., 1989, Senn et al., 1989, Tugarinov et al., 2004) or stereoselective isotope<br />
labeling (Ostler et al., 1993, Kainosho et al., 2006)). In the present version of the program, the<br />
methyl connectivity information is always assumed to be available for the diamagnetic state.<br />

The program takes as input the ¹H and ¹³C chemical shifts of methyl groups measured in<br />
¹³C-HSQC spectra and the ¹³C chemical shifts of methyl groups that are too close to the<br />
paramagnetic center to be directly observable in ¹H-detected NMR spectra. By comparing the<br />
chemical shifts in the diamagnetic and paramagnetic states, Possum attempts to find the resonance<br />
assignment with the lowest residual cost C(l) defined by:<br />

[Equation (2.1): total cost C(l) of a global assignment]<br />

with:<br />

[Equation (2.2): cost of an individual assignment (i, j, k)]<br />

subject to:<br />

[Equations (2.3) and (2.4): constraints on the assignment]<br />

where PCS^S_calc(k,l) is the predicted PCS value for the spin S (S = ¹³C or ¹H) of the methyl<br />
group in residue k arising from the paramagnetism of the lanthanide l, δ^S_para,exp(j,l) is the chemical<br />
shift of the resonance j in the presence of the lanthanide l for the spin S, and δ^S_dia,exp(i) is the chemical<br />
shift of the diamagnetic resonance i for the spin S. The cost function assigns smaller costs for deviations<br />
between calculated and observed PCS when the experimentally observed PCS is large, while the<br />
constant e(l) prevents a singularity in the cost function and accounts for the error in measurements<br />
when a spin experiences small paramagnetic effects far away from the paramagnetic center. e(l)<br />
scales with the magnitude of the Δχ tensor of the lanthanide l. Empirically determined values<br />
(e(Yb³⁺) = 1/6 ppm and e(Dy³⁺) = 1 ppm) were used here². Equations (2.3) and (2.4) ensure that<br />
each calculated PCS and each experimental chemical shift are chosen exactly once within the<br />
global assignment.<br />
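The behavior described for the cost function can be made concrete with a short Python sketch. The functional form below (absolute deviation divided by the sum of the absolute experimental PCS and e(l), summed over the available spins) is an assumption chosen only to reproduce the properties stated in the text; it is not claimed to be the exact expression used in Possum, and the function name and dictionary keys are hypothetical.<br />

```python
def pair_cost(pcs_calc, delta_para, delta_dia, e):
    """Assumed cost of pairing diamagnetic peak i, paramagnetic peak j and methyl k.

    pcs_calc, delta_para and delta_dia map a spin label ("13C", "1H") to a
    value in ppm; e is the constant e(l) in ppm for the lanthanide used.
    """
    cost = 0.0
    for spin, calc in pcs_calc.items():
        # Skip a spin for which no paramagnetic or diamagnetic shift was measured.
        if spin not in delta_para or spin not in delta_dia:
            continue
        pcs_exp = delta_para[spin] - delta_dia[spin]
        # A large observed PCS tolerates a larger deviation; e avoids division by zero.
        cost += abs(calc - pcs_exp) / (abs(pcs_exp) + e)
    return cost
```

In this form a deviation of 0.1 ppm costs less for a peak shifted by 3 ppm than for a peak shifted by 0.1 ppm, and the cost stays finite when the observed PCS is zero.<br />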

² The purpose of e(l) is to avoid degenerate cases where the experimental PCS is very close to zero.<br />
Its value is not critical for the success of the algorithm. However, e(l) has been optimized to yield<br />
the best possible results.<br />

Equations (2.1), (2.3) and (2.4) present the formulation of the three-index assignment<br />
problem (Schell, 1955), which is the three-dimensional instance of the multidimensional assignment<br />
problem (MAP). With D being the number of dimensions of the MAP (D = 3 in the example above)<br />
and n being the size of each of the D sets of data, there are (n!)^(D–1) possible assignments. When D is<br />
strictly larger than 2, MAP has been proven to be NP-hard (Karp, 1972) and, as a result, no known<br />
algorithm can guarantee the best solution to the problem in polynomial time. An exhaustive<br />
search through the (n!)² possibilities is impracticable for all but the smallest problem sizes. An exact<br />
branch and bound algorithm that explores only a part of all possible assignments has been proposed<br />
(Balas et al., 1991) and works well for small problem sizes, especially when there is good<br />
agreement between predicted and observed PCS. In the present context, a simulated annealing<br />
optimization scheme proved more efficient computationally. The dimensionality D of the<br />
assignment problem generated by Possum depends on the residue type, the availability of<br />
connectivity information, and the number of different lanthanides used. We have performed<br />
calculations with up to 6 dimensions. Examples of 3- and 4-dimensional problems are illustrated in<br />
Figure 2.2.<br />
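A generic simulated annealing search for the three-index assignment problem can be sketched as follows. This is an illustration of the technique only, not the implementation in Possum: the move set (swapping two entries of one permutation), the geometric cooling schedule, and all names and default parameters are assumptions made for this example. An assignment is represented by two permutations, so that diamagnetic peak i is paired with paramagnetic peak perm_j[i] and methyl group perm_k[i].<br />

```python
import math
import random

def total_cost(cost, perm_j, perm_k):
    """Sum of cost[i][j][k] over the n triples of a complete assignment."""
    return sum(cost[i][perm_j[i]][perm_k[i]] for i in range(len(perm_j)))

def anneal(cost, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Return (best_cost, perm_j, perm_k) found by simulated annealing (n >= 2)."""
    rng = random.Random(seed)
    n = len(cost)
    perm_j, perm_k = list(range(n)), list(range(n))
    current = total_cost(cost, perm_j, perm_k)
    best = (current, perm_j[:], perm_k[:])
    for step in range(steps):
        # Geometric cooling from t_start towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        # Move: swap two entries in one of the two permutations.
        perm = rng.choice((perm_j, perm_k))
        a, b = rng.sample(range(n), 2)
        perm[a], perm[b] = perm[b], perm[a]
        new = total_cost(cost, perm_j, perm_k)
        # Metropolis criterion: always accept improvements, sometimes accept worse moves.
        if new <= current or rng.random() < math.exp((current - new) / temp):
            current = new
            if current < best[0]:
                best = (current, perm_j[:], perm_k[:])
        else:
            perm[a], perm[b] = perm[b], perm[a]  # reject: undo the swap
    return best
```

Because both index sets stay permutations throughout, every state visited uses each peak and each methyl group exactly once. Recomputing the full cost after each swap keeps the sketch short; a practical implementation would update only the two affected terms.<br />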

Figure 2.2 Formulation of the assignment problem depending on the information available. The<br />
columns δ^S_dia,exp and δ^S_para,exp contain the chemical shifts (S = ¹³C and S = ¹H as observed for<br />
¹³C-HSQC cross-peaks) measured in the presence of a diamagnetic or paramagnetic lanthanide,<br />
respectively. The column marked PCS^S_calc contains the ¹³C and ¹H PCS calculated from the<br />
Δχ tensor and the 3D structure of the protein. (a) Assignment problem for residues with a single<br />
methyl group (Ala, Met, Thr). The indices i and j refer to the cross-peak number in the diamagnetic<br />
and paramagnetic state, respectively, and the index k is the residue number in the amino-acid<br />
sequence, as in equation (2.1). The assignment (i = 1, j = 3, k = 1) is illustrated by connecting<br />
lines. The associated cost can be calculated using equation (2.2). The other n–1 assignments<br />
necessary to calculate the total cost C(l) according to equation (2.1) are not shown. Overall, this<br />
assignment problem is three-dimensional. (b) Simultaneous use of the information from two<br />
samples containing the paramagnetic lanthanides l1 or l2 creates a four-dimensional assignment<br />
problem. (c) For amino acids with two methyl groups (Ile, Leu, Val), the columns δ^S_dia,exp, δ^S_para,exp,<br />
and PCS^S_calc embed the chemical shifts (and PCS) of two methyl groups (m1 and m2). If the<br />
methyl-specificity information is not available in the paramagnetic state (illustrated by m? in the column<br />
δ^S_para,exp), Possum will compute the two possible costs and only keep the lower one. (d) For Ile, Leu,<br />
and Val residues, the methyl-methyl connectivity information may be available in the diamagnetic<br />
state but not in the paramagnetic state. This situation creates a four-dimensional assignment<br />
problem for data from a single paramagnetic lanthanide.<br />
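The rule in panel (c), where both pairings of two methyl groups are scored and the lower cost is kept, can be sketched in Python as follows. The single-methyl cost used here is an assumed form (absolute deviation divided by the absolute experimental PCS plus e) and all function names are hypothetical; only the "compute both, keep the lower" logic is taken from the caption.<br />

```python
def methyl_cost(pcs_calc, pcs_exp, e):
    """Assumed cost of matching one calculated PCS to one experimental PCS (ppm)."""
    return abs(pcs_calc - pcs_exp) / (abs(pcs_exp) + e)

def two_methyl_cost(calc_m1, calc_m2, exp_a, exp_b, e):
    """Score both pairings of two methyl groups and keep the cheaper one.

    calc_m1, calc_m2: calculated PCS of methyl groups m1 and m2.
    exp_a, exp_b: experimental PCS of the two peaks whose specificity is unknown.
    Returns (cost, label) where label is "direct" (m1-a, m2-b) or "swapped".
    """
    direct = methyl_cost(calc_m1, exp_a, e) + methyl_cost(calc_m2, exp_b, e)
    swapped = methyl_cost(calc_m1, exp_b, e) + methyl_cost(calc_m2, exp_a, e)
    if direct <= swapped:
        return direct, "direct"
    return swapped, "swapped"
```

The returned label is what would provide the stereospecific assignment when the two calculated PCS differ sufficiently.<br />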

The program also takes into account the absence of paramagnetic peaks due to PRE by<br />
preventing the assignment of observable paramagnetic peaks to methyl groups located closer to the<br />
metal ion than a user-specified cutoff. In the present work, cutoffs of 6 and 9 Å were used for the<br />
Yb³⁺ and Dy³⁺ complexes, respectively. Paramagnetic peaks missing for any other reason (e.g.<br />
spectral overlap) are also tolerated. This is achieved by assigning a cost only to pairings of<br />
observable paramagnetic and diamagnetic peaks, whereas a zero cost is associated with any<br />
unassigned diamagnetic peak left over. Finally, the program allows for the possibility that<br />
paramagnetic shifts may have been observed only for either the ¹³C or the ¹H resonance of a methyl<br />
group.<br />
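The two rules of this paragraph (the PRE distance cutoff and the zero cost for unpaired diamagnetic peaks) can be expressed as a small wrapper around whatever pair cost is used. The sketch below is an assumption about how such a rule could be coded, with hypothetical names; an infinite cost is used here to forbid a pairing.<br />

```python
import math

def masked_cost(base_cost, metal_distance, cutoff, para_observed):
    """Apply the PRE cutoff and missing-peak rules to a pair cost.

    base_cost: cost of the pairing if it is allowed.
    metal_distance: methyl-to-metal distance in Angstrom from the 3D structure.
    cutoff: user-specified cutoff in Angstrom (6 for Yb3+, 9 for Dy3+ in this work).
    para_observed: False if the diamagnetic peak has no paramagnetic partner.
    """
    if not para_observed:
        # Unassigned diamagnetic peak left over: tolerated at zero cost.
        return 0.0
    if metal_distance < cutoff:
        # An observable paramagnetic peak cannot belong to a methyl this close.
        return math.inf
    return base_cost
```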

The calculation of the assignment starting from the chemical shifts of Table S2.1 took less<br />
than 2 h on an AMD 64 4200+ processor, when using all available information, including methyl<br />
connectivity and methyl specificity information and the chemical shifts from the Yb³⁺ and Dy³⁺<br />
complexes.<br />

2.4 Results<br />

2.4.1 ¹³C-HSQC spectra of the cz-ε186/θ/Ln³⁺ complexes<br />

Constant-time ¹³C-HSQC spectra of the uniformly ¹³C labeled diamagnetic cz-ε186/θ/La³⁺<br />
complex and the paramagnetic cz-ε186/θ/Dy³⁺ and cz-ε186/θ/Yb³⁺ complexes illustrate the spectral<br />
complexity of the methyl region and the effect of the paramagnetism. The spectrum of the<br />
cz-ε186/θ/La³⁺ complex (blue peaks in Figure 2.3) contains approximately the number of methyl peaks<br />
expected for 19 Ala, 14 Thr, 6 Met, 12 Val, 17 Leu, and 14 Ile residues (125 methyl groups). The<br />
signals of Met CH3 groups are particularly well resolved and easily identified as they appear with<br />
opposite sign.<br />



Figure 2.3 Methyl region of constant-time ¹³C-HSQC spectra of the cz-ε186/θ complex<br />
(containing ¹³C/¹⁵N labeled cz-ε186) in the presence of La³⁺ (blue) and a 1:1 mixture of (a)<br />
La³⁺/Dy³⁺ and (b) La³⁺/Yb³⁺ (red). Met CH3 and CH2 groups appear with inverted sign<br />
(light colors). The spectra were recorded using a constant time of 28 ms and t2max = 160 ms.<br />
The spectra of the mixed samples were acquired with 4 times as many scans to compensate<br />
for the halving of the effective concentrations.<br />
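The factor of four in the number of scans follows from the usual scaling of the signal-to-noise ratio with concentration and signal averaging. As a brief worked relation, with c the concentration of one species and N the number of scans (symbols introduced here for illustration only):<br />

```latex
\mathrm{SNR} \propto c\,\sqrt{N}
\qquad\Longrightarrow\qquad
\frac{c}{2}\,\sqrt{4N} = c\,\sqrt{N}
```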

As Dy³⁺ is one of the strongest paramagnetic lanthanide ions (Pintacuda et al., 2007), the<br />
methyl peaks of the cz-ε186/θ/Dy³⁺ complex (red peaks in Figure 2.3a) are strongly shifted by PCS<br />
and affected by ¹H line broadening due to transverse paramagnetic relaxation enhancement (PRE).<br />
Thus, only 55 cross-peaks are observable, corresponding to methyl groups with a distance from the<br />
Dy³⁺ ion larger than 15 Å, many of them with intensities close to the noise level. Part of the ¹H line<br />
broadening is caused by unresolved RDCs, including intra-methyl RDCs (Kaikkonen et al., 2001),<br />
originating from the paramagnetically induced alignment of the protein with the magnetic field.<br />

In the cz-ε186/θ/Yb³⁺ complex, the cutoff distance is reduced to about 9 Å because the<br />
paramagnetic moment of Yb³⁺ is about 6 times smaller, so that only 10 methyl peaks are expected to be<br />
broadened beyond detection. Of the remaining 115 methyl resonances (red peaks in Figure 2.3b),<br />
only 14 peaks could not be analyzed due to overlap or very small PCS at larger distances from the<br />
metal ion. Figure 2.3 shows that for both paramagnetic lanthanides, it is nearly impossible to trace<br />
the paramagnetic shift of a ¹³C-HSQC peak using the criterion that the PCS in the ¹³C and ¹H<br />
dimensions of the spectrum must be similar. (In methyl groups, the distance between the carbon<br />
and the average position of the three protons is less than 0.4 Å.) Therefore, without prior<br />
knowledge of resonance assignments, PCS measurements cannot be made manually from<br />
¹³C-HSQC spectra alone.<br />



2.4.2 Methyl Cz-EXSY experiments<br />

In order to measure PCS data conveniently and with high sensitivity, we designed an<br />
experiment applicable to samples prepared with a 1:1 mixture of paramagnetic and diamagnetic<br />
metal ions, where chemical exchange between the metal ions leads to exchange of the protein<br />
between paramagnetic and diamagnetic states. By generating exchange cross-peaks between methyl<br />
peaks of the diamagnetic and paramagnetic lanthanide complexes, the experiment allows the<br />
measurement of ¹H and ¹³C PCS from a single spectrum. Figure 2.1 shows 2D and 3D versions of<br />
the methyl Cz-EXSY experiment. The pulse sequences are related to previously published<br />
Nz-exchange experiments (Farrow et al., 1994, John et al., 2007a).<br />

During a mixing period τm, magnetization is stored as relatively slowly relaxing Cz<br />
magnetization. Simulations indicate that, owing to rapid rotation around the 3-fold symmetry axis,<br />
longitudinal relaxation rates R1 of methyl ¹³C spins are fairly insensitive with respect to molecular<br />
size, and barely exceed 2 s⁻¹ even for very small proteins (Figure 2.1d). In the cz-ε186/θ/La³⁺<br />
complex (τC = 17 ns), we measured R1(¹³C) rates of about 1.6 s⁻¹ for the majority of methyl groups.<br />
Only a group of highly mobile Thr residues relaxed somewhat faster (2 s⁻¹), whereas the R1(¹³C)<br />
relaxation in Met CH3 groups was much slower (about 0.7 s⁻¹). In contrast to transverse relaxation<br />
rates R2, R1 rates in macromolecules are barely affected by the paramagnetism of lanthanides (John<br />
et al., 2007a).<br />

The experiment yields auto-peaks for the diamagnetic and paramagnetic states (dd and pp<br />

peaks, respectively) and exchange peaks arising from magnetization exchange from the<br />

paramagnetic to the diamagnetic state and vice versa (pd and dp peaks, respectively).<br />

Since the experiment starts from ¹³C polarization rather than using an INEPT transfer, pd<br />
peaks can be detected even for methyl groups that are strongly affected by ¹H PRE in the<br />
paramagnetic state and thus invisible in the ¹³C-HSQC spectrum. Combined with the dd peaks, this<br />
allows ¹³C PCS measurements that are limited only by the (16-fold smaller) ¹³C PRE (John et al.,<br />
2007b). As indicated previously (ref. 13) and illustrated by the simulations of Figure 2.1e, ¹³C polarization<br />
in methyl groups of proteins can be very efficiently enhanced using the {¹H}¹³C NOE. This holds<br />
irrespective of paramagnetism. We observed an approximately two-fold increase in ¹³C polarization in the<br />
cz-ε186/θ/La³⁺ complex using 1 s of ¹H irradiation between subsequent scans.<br />
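Reading a ¹³C PCS from a dd/pd peak pair amounts to a simple subtraction, sketched below in Python. The sketch assumes that the pd peak appears at the paramagnetic ¹³C shift and the diamagnetic ¹H shift (so that it lines up with the dd peak in the ¹H dimension), and that a full dd/pp pair gives both PCS. The function names and the ¹H matching tolerance are hypothetical.<br />

```python
def carbon_pcs_from_pd(dd_peak, pd_peak, h_tolerance=0.02):
    """13C PCS (ppm) from a diamagnetic auto-peak and a pd exchange peak.

    Peaks are (13C shift, 1H shift) tuples in ppm. The two peaks must share
    the diamagnetic 1H shift within h_tolerance (an assumed value).
    """
    c_dd, h_dd = dd_peak
    c_pd, h_pd = pd_peak
    if abs(h_dd - h_pd) > h_tolerance:
        raise ValueError("pd peak does not line up with the dd peak in 1H")
    return c_pd - c_dd

def pcs_from_auto_peaks(dd_peak, pp_peak):
    """(13C PCS, 1H PCS) in ppm from the diamagnetic and paramagnetic auto-peaks."""
    return pp_peak[0] - dd_peak[0], pp_peak[1] - dd_peak[1]
```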

For improved resolution in the ¹³C dimension and measurement of small ¹³C PCS, the 2D<br />
experiment is implemented as a constant-time experiment in the t1 dimension. The 3D experiment<br />
additionally records the ¹³C frequency of the protein state after the mixing time. Real-time<br />
evolution periods in both indirect dimensions yield superior sensitivity for residues with substantial<br />
¹³C PRE that commonly also have larger PCS. Selective ¹³C pulses select the spectral window of<br />
the methyl ¹³C resonances of the diamagnetic complex in order to limit the spectral width required<br />
in the F1 dimension.<br />

2.4.3 Resonance assignment of Met, Ala and Thr methyl groups<br />

Figure 2.4a shows the spectral region of the Met CH3 cross-peaks of the 2D methyl<br />
Cz-exchange spectrum, recorded with a sample of cz-ε186/θ containing La³⁺ and Dy³⁺ in a 1:1 ratio.<br />
Out of six Met residues, four are observed in the ¹³C-HSQC spectrum of the cz-ε186/θ/Dy³⁺<br />
complex (the cross-peak of Met178 in the paramagnetic state appears with very weak intensity at<br />
δ2(¹H) = 5.33 ppm). For these residues, both auto and both exchange peaks become visible in the<br />
exchange spectrum, forming a rectangle that allows straightforward identification of dd-pp peak<br />
pairs, yielding ¹³C PCS of 3.28, 1.31, 0.49 and –0.65 ppm. A fifth residue only yields a pd<br />
exchange peak with a ¹³C PCS value of –3.39 ppm.<br />

Figure 2.4 Assignment of Met CH3 from PCS. (a) 2D methyl Cz-EXSY spectrum of cz-ε186/θ<br />
(containing ¹³C/¹⁵N-labeled cz-ε186) loaded with a 1:1 mixture of La³⁺ and Dy³⁺ (red), overlaid<br />
with the ¹³C-HSQC spectrum (blue). The diamagnetic auto-peaks (dd) are labeled with the<br />
assignment and connected to the paramagnetic auto-peaks (pp) and the exchange peaks (pd and<br />
dp) with dashed rectangles. The dp and pp peaks of Met178 are outside the selected spectral region<br />
at δ2 = 5.53 ppm. Met107 only shows a pd exchange peak (vertical dashed line), and neither a<br />
pp-peak nor exchange peaks are observed for Met18. The spectrum was recorded with a mixing time<br />
of 480 ms. (b) Comparison of predicted (top) and measured (bottom) PCS of Met CH3 groups. ¹³C<br />
PCS and ¹H PCS are plotted with filled and open bars, respectively, and sorted according to the<br />
predicted ¹³C PCS. The distances rC-Ln are given in Å in the center.³<br />

The measured PCS can be compared with values predicted from the known structure of<br />
ε186 (Park, 2006) and the previously determined Δχ tensor of Dy³⁺ (Figure 2.4b) (Schmitz et al.,<br />
2006). Only five Met residues belong to the structured part of the protein with predicted ¹³C PCS of<br />
3.74 (Met178), 1.30 (Met137), –0.56 (Met87), –1.68 (Met18) and –2.87 ppm (Met107). Met185 is<br />
located in the flexible cyclizing loop of cz-ε186 and can be immediately assigned to the very<br />
intense and narrow resonance with a ¹³C PCS of 0.49 ppm, in agreement with the PCS of 0.53 ppm<br />
observed for the amide proton of this residue. Met18 is the residue closest to the metal ion (rC-Dy =<br />
12.0 Å) and can be assigned to the methyl group that does not show any exchange peak. As Met18<br />
lines the active site, this assignment is independently confirmed by its sensitivity to titration with<br />
nucleotides (unpublished results). Met107 is the second closest residue (rC-Dy = 14.2 Å) and<br />
displays a pd but no dp exchange peak; the assignment of all other Met residues follows in a<br />
straightforward manner from the PCS data.<br />
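The Met example is small enough to check by brute force. The Python sketch below uses the predicted and measured ¹³C PCS quoted above and searches all ways of matching the four remaining measured values to the five structured Met residues, minimizing the summed absolute deviation. The 0.49 ppm resonance is left out because it is assigned to Met185 on other grounds. The plain absolute-deviation score and the function name are assumptions of this illustration, not the Possum cost function.<br />

```python
from itertools import permutations

# Predicted 13C PCS (ppm) for the Dy3+ complex, as quoted in the text.
PREDICTED = {"Met178": 3.74, "Met137": 1.30, "Met87": -0.56,
             "Met18": -1.68, "Met107": -2.87}
# Measured 13C PCS (ppm), excluding the 0.49 ppm peak assigned to Met185.
MEASURED = [3.28, 1.31, -0.65, -3.39]

def best_assignment(measured, predicted):
    """Return (total deviation, {measured PCS: residue}) with the lowest deviation."""
    best = None
    # Each ordered selection of residues is one candidate assignment.
    for combo in permutations(predicted, len(measured)):
        deviation = sum(abs(m - predicted[name]) for m, name in zip(measured, combo))
        if best is None or deviation < best[0]:
            best = (deviation, dict(zip(measured, combo)))
    return best
```

Calling `best_assignment(MEASURED, PREDICTED)` pairs –3.39 ppm with Met107 and leaves Met18 without a partner, in line with the manual assignment described above.<br />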

The data show that it is possible to assign a limited number of methyl groups using PCS<br />
only. The situation is more complex for the methyl groups of the other amino acids since, with the<br />
exception of Ile δCH3 groups, the amino acid type cannot be identified from ¹³C-HSQC spectra<br />
alone. This information would have to be provided either by the use of residue-specific labeling or<br />
additional NMR experiments (in the cz-ε186/θ/La³⁺ complex, the amino acid type can readily be<br />
identified from a 3D (H)CCH-TOCSY spectrum). In addition, important information is provided by<br />
(i) the relative size of ¹³C and ¹H PCS and (ii) whether the paramagnetic ¹H resonance can be<br />
observed (rC-Dy > 15 Å) or only pd exchange peaks (rC-Dy > 10 Å, Figure S2.4 and Figure S2.5).<br />

³ Experimental PCS have an error below 0.1 ppm. Errors in the calculated PCS depend on the<br />
quality of the 3D structures used. Residues 87, 107, 137 and 178 belong to the structured part. The<br />
error in their calculated PCS can be considered to be below 10% of their absolute value.<br />


2.4.4 Assignments of Val, Leu, and Ile methyl groups<br />


Val, Leu, and Ile are amino acids with two methyl groups that can easily be linked by<br />
correlations observed in TOCSY spectra; combining the PCS data for both methyl groups greatly<br />
facilitates the resonance assignment of these residues. This is illustrated in Figure 2.5 with the<br />
cz-ε186/θ complex containing 1:1 mixtures of Yb³⁺/La³⁺ (a, b) and Dy³⁺/La³⁺ (c, d), respectively.<br />
Whereas a 2D (H)C(C)H-TOCSY experiment (Figure S2.1, Supporting Information) recorded with<br />
short mixing time (12 ms) strongly favors intra-residual methyl-methyl correlations (Figure 2.5a)<br />
(Eaton et al., 1990), the 2D methyl Cz-EXSY spectrum yields predominantly exchange peaks and<br />
only weak γ1-γ2 correlations arising from ¹³C-¹³C NOE (Fischer et al., 1996). We have also applied<br />
the (H)C(C)H-TOCSY experiment to a sample of the pure cz-ε186/θ/La³⁺ complex containing<br />
selectively ¹³C/¹⁵N-Leu labeled cz-ε186, where all δ1-δ2 methyl pairs could be identified (Figure<br />
S2.4).<br />



Figure 2.5 PCS measurements in isopropyl groups of Val and Leu and use of PCS for<br />
stereospecific resonance assignments. (a) Selected spectral region from a 2D (H)C(C)H-TOCSY<br />
spectrum (red) of cz-ε186/θ loaded with a 1:1 mixture of La³⁺ and Yb³⁺, showing the methyl<br />
cross-peaks of Val96. The spectrum is overlaid with the ¹³C-HSQC spectrum (blue). Intraresidual<br />
correlations between the cross-peaks of the γ1CH3 and γ2CH3 groups are identified by dotted lines.<br />
The TOCSY spectrum was recorded with 12 ms mixing time. (b) Same spectral region as in (a)<br />
taken from the 2D methyl Cz-EXSY spectrum of the same sample recorded with a mixing time of<br />
480 ms. (c) Selected strips from the 3D methyl Cz-EXSY spectrum of cz-ε186/θ loaded with a 1:1<br />
mixture of La³⁺ and Dy³⁺ (right panels) aligned with corresponding spectral regions from the 2D<br />
methyl Cz-EXSY spectrum (left panels). The strips display the methyl group correlations of Val10<br />
and Leu113. The arrows point from the chemical shifts of the diamagnetic auto-peaks (dd) to the<br />
chemical shifts of the exchange peaks (pd), indicating the ¹³C PCS. Horizontal lines identify the<br />
positions of the dd- and pd-peaks in the δ1(¹³C) dimension. The line at 26.35 ppm identifies the<br />
¹³C-¹³C NOEs with the CγH group of Leu113. (d) Assignment of methyl resonances from the<br />
comparison of predicted with experimental PCS. For each Val residue, the distances from the<br />
lanthanide in Å are indicated for both methyl carbons in the center of the plot. ¹³C PCS and ¹H<br />
PCS are displayed as filled and open bars, respectively. The residues are sorted according to the<br />
predicted ¹³C PCS and the PCS are plotted in the sequence Cγ1/Hγ1/Cγ2/Hγ2.<br />

Figure 2.5c compares the measurement of ¹³C PCS for two residues from strips of 2D and<br />
3D methyl Cz-EXSY spectra. The two experiments are complementary, showing better frequency<br />
resolution in the 2D spectrum and generally less cross-peak overlap in the 3D spectrum. The<br />
example of Leu113 shows that one- and two-bond ¹³C-¹³C NOE correlations are visible, but<br />
generally of much smaller intensity than the exchange peaks. Through-bond correlations can again<br />
be identified from TOCSY spectra. From the combined use of the 2D and 3D Cz-EXSY spectra, all<br />
¹³C-HSQC cross-peaks observable for any of the methyl groups of the cz-ε186/θ/Dy³⁺ and<br />
cz-ε186/θ/Yb³⁺ complexes could readily be correlated with the corresponding ¹³C-HSQC cross-peaks<br />
of the cz-ε186/θ/La³⁺ complex, yielding the PCS. Compared to the ¹³C-HSQC spectrum, the methyl<br />
Cz-EXSY spectra yielded the ¹³C chemical shifts in the paramagnetic state for a further 47 methyl<br />
groups of the cz-ε186/θ/Dy³⁺ complex with rC-Dy distances as short as 10 Å, leaving only 11 methyl<br />
groups completely unobservable due to excessive PRE. For the cz-ε186/θ/Yb³⁺ complex, the<br />
methyl Cz-EXSY spectra yielded the ¹³C chemical shifts for 7 additional methyl groups with rC-Yb<br />
distances as short as 6 Å, leaving only 1 methyl group unobservable.<br />

Figure 2.5d compares the predicted and measured PCS for Val methyl groups in the<br />
cz-ε186/θ/Dy³⁺ complex. Each residue is characterized by up to 4 PCS values, resulting in the<br />
straightforward assignment of 10 out of 12 residues. Only Val39 did not yield unambiguous PCS<br />
data due to resonance overlap, and Val65 is too close to the Dy³⁺ ion. The assignment of Val65<br />
could be made in the ε186/θ/Yb³⁺ complex, where this residue yields large PCS (Supporting<br />
Information). The methyl cross-peaks of Leu residues can be assigned in an analogous way (see<br />
below).<br />

Importantly, this approach automatically yields the stereospecific assignment of Val and<br />
Leu methyl peaks, as long as different PCS are observed for the two prochiral methyl groups. Since<br />
the methyl carbons in an isopropyl group are separated by 2.5 Å, this is almost always the case<br />
(Figure 2.5d). A rare exception is Val50, where the predicted ¹³C and ¹H PCS of the γ1 and γ2<br />
methyl groups are indistinguishable in both the Dy³⁺ and Yb³⁺ complexes.<br />

The methyl groups of Ile residues are particularly easy to assign by PCS, since the spectral<br />
ranges of the ¹³C-NMR signals of γ2 and δ1 methyl groups are clearly separated, while<br />
intra-residual methyl-methyl connectivity can still be obtained from TOCSY spectra.<br />

2.4.5 Automatic assignments without EXSY data<br />

Cz-EXSY spectra provide an exceptionally simple way of measuring PCS. For situations<br />
where the metal exchange is too slow for exchange spectra and spectral crowding prevents the<br />
straightforward pairing between diamagnetic and paramagnetic ¹³C-HSQC peaks (Figure 2.3), we<br />
have devised the program Possum, which determines the correct peak pairings, their resonance<br />
assignments, and their PCS, using the 3D structure of the protein and the Δχ tensor (which can readily<br />
be obtained from, e.g., ¹⁵N-HSQC spectra (Pintacuda et al., 2004, Schmitz et al., 2006)).<br />

The performance of the program was initially tested with simulated data, replacing the<br />
experimental paramagnetic shifts of Table S2.1 by shifts back-calculated from the crystal structure<br />
of ε186 (Hamdan et al., 2002b) and using the crystal structure, the experimental diamagnetic<br />
chemical shifts, and the Δχ tensors of Dy³⁺ and Yb³⁺ as input. In all calculations, it was assumed<br />
that the residue types of all methyl resonances were known and the methyl connectivity information<br />
of Val, Leu, and Ile residues was available for the diamagnetic state. Except for extreme cases of<br />
spectral overlap, the program yielded 100% correct assignments. In a second step, structural<br />
uncertainties were simulated by randomly displacing the methyl groups, following a<br />
Maxwell-Boltzmann distribution with maxima at 0.35 and 0.7 Å (resulting in maximal atom displacements of<br />
0.75 and 1.5 Å, respectively, always using the same direction of displacement). Even in the case<br />
with the maximum structural noise, using only paramagnetic data from the Yb³⁺ complex and<br />
neither methyl specificity nor methyl connectivity information in the paramagnetic state, Possum<br />
yielded >75% correct assignments of the diamagnetic methyl resonances (Table S2.2 and Table<br />
S2.4). The score increased to >90% when paramagnetic data from the Dy³⁺ complex, methyl<br />
specificity information in all complexes, and methyl connectivity information in the Yb³⁺ complex<br />
(but not the Dy³⁺ complex) were included (Table S2.2 and Table S2.3).<br />
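Structural noise of the kind described above could be generated as in the following Python sketch. It relies on the fact that the length of a vector with three independent Gaussian components of standard deviation σ follows a Maxwell-Boltzmann distribution whose maximum lies at √2·σ. The rejection step that enforces the maximal displacement, the random direction drawn for every atom, and the function name are assumptions of this illustration; the text states that the same direction of displacement was always used, which this sketch does not reproduce.<br />

```python
import math
import random

def maxwell_displacement(mode, max_displacement, rng):
    """Random 3D displacement (Angstrom) with Maxwell-Boltzmann distributed length.

    mode: position of the maximum of the length distribution (e.g. 0.35 or 0.7).
    max_displacement: largest allowed length (e.g. 0.75 or 1.5); larger draws
    are rejected and redrawn.
    rng: a random.Random instance, passed in for reproducibility.
    """
    sigma = mode / math.sqrt(2.0)  # mode of the Maxwell distribution = sqrt(2) * sigma
    while True:
        vector = [rng.gauss(0.0, sigma) for _ in range(3)]
        length = math.sqrt(sum(c * c for c in vector))
        if length <= max_displacement:
            return vector
```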

The program was subsequently applied to the experimental data of the methyl groups of<br />
cz-ε186/θ loaded with La³⁺, Yb³⁺ and Dy³⁺. Table 2.1 summarizes the results. Using both<br />
paramagnetic lanthanides, the assignment is complete and correct for all diamagnetic ¹³C-HSQC<br />
cross-peaks that have observable paramagnetic partners. The only exceptions are swapped<br />
assignments for Met18 and Met107 and for Val65 and Val82. The first arises from a side-chain<br />
conformation that is different in solution than in the single crystal and the second from differences<br />
in the predicted and experimental PCS observed for the peptide segment near Val65 (John et al.,<br />
2007b). The assignments of the methyl groups of the Yb³⁺ complex are similarly reliable, whereas<br />
the methyl signals of the Dy³⁺ complex are harder to assign (in the absence of methyl connectivity<br />
information). Using only data from the Yb³⁺ complex and omitting any methyl specificity<br />
information or connectivity information in the paramagnetic state still results in >70% correct<br />
assignments of the diamagnetic methyl resonances (Table S2.2 and Table S2.4)⁴.<br />

⁴ It is the responsibility of the user to inspect and optimize the assignment provided by Possum, in<br />
particular to untangle areas of the spectrum with overlapping peaks.<br />

Table 2.1 Automatic assignment of methyl groups by the program Possum (a)<br />
Residue type | Occurrence (b) | La observable (c) | Yb assigned (d) | Dy assigned (d) | La assigned (e)<br />
Met | 6 (1) | 5 | 3/5 | 4/4 | 3/5<br />
Thr | 14 (4) | 8 | 7/7 | 7/7 | 7/7<br />
Ala | 19 (2) | 17 | 13/13 | 11/13 | 14/14<br />
Ile | 14 (2) | 24 | 24/24 | 21/23 | 24/24<br />
Val | 12 (0) | 24 | 20/20 | 17/20 | 18/22<br />
Leu | 17 (0) | 34 | 34/34 | 19/25 | 34/34<br />

(a) Obtained using the data reported in Table S2.1, the crystal structure of ε186 (Hamdan et al.,<br />
2002b) and Δχ tensors determined from ¹⁵N-HSQC spectra as described in the experimental<br />
section. The paramagnetic data measured with Yb³⁺ and Dy³⁺ were combined to derive the<br />
assignments.<br />
(b) Total number of residues in cz-ε186. The number in brackets refers to residues not observed in<br />
the crystal structure; these were excluded from the calculation.<br />
(c) Number of methyl groups with coordinates reported in the crystal structure for which cross-peaks<br />
were observed in the cz-ε186/θ/Ln³⁺ sample. Their unassigned chemical shifts were available for<br />
the program.<br />
(d) Fraction of correct assignments for the paramagnetic cz-ε186/θ/Yb³⁺ and cz-ε186/θ/Dy³⁺<br />
complexes, as indicated. The number in the denominator is the number of methyl groups for which<br />
cross-peaks were observed in the presence of Yb³⁺ or Dy³⁺.<br />
(e) Fraction of correct assignments for the diamagnetic cz-ε186/θ/La³⁺ complex. The number in the<br />
denominator is the number of methyl groups for which cross-peaks were observed in at least one of<br />
the paramagnetic complexes.<br />
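The per-residue fractions of Table 2.1 can be totalled per complex with a few lines of Python. The numbers below are copied from the table; the dictionary layout and the function name are choices made for this illustration.<br />

```python
# (correct, observed) methyl-group counts per residue type, from Table 2.1.
TABLE_2_1 = {
    "Met": {"Yb": (3, 5), "Dy": (4, 4), "La": (3, 5)},
    "Thr": {"Yb": (7, 7), "Dy": (7, 7), "La": (7, 7)},
    "Ala": {"Yb": (13, 13), "Dy": (11, 13), "La": (14, 14)},
    "Ile": {"Yb": (24, 24), "Dy": (21, 23), "La": (24, 24)},
    "Val": {"Yb": (20, 20), "Dy": (17, 20), "La": (18, 22)},
    "Leu": {"Yb": (34, 34), "Dy": (19, 25), "La": (34, 34)},
}

def column_total(table, column):
    """Return (correct, observed) summed over all residue types for one column."""
    correct = sum(row[column][0] for row in table.values())
    observed = sum(row[column][1] for row in table.values())
    return correct, observed
```

Summed in this way, the table gives 101 of 103 correct for the Yb³⁺ complex, 79 of 92 for the Dy³⁺ complex, and 100 of 106 for the diamagnetic La³⁺ complex.<br />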

2.4.6 PCS and flexibility<br />

Structural differences between the crystal structure of ε186 determined under cryogenic<br />

conditions (Hamdan et al., 2002b) and the solution structure of the cz-ε186/θ complex become<br />

apparent as differences between measured and predicted PCS. In a few cases, the structural<br />

differences interfere with the resonance assignment. Figure 2.6a illustrates the situation for the<br />

cz-ε186/θ/Dy3+ complex, where the measured PCS of Leu161 are smaller than predicted and would<br />

more closely match the values predicted for Leu131. This can be explained by a small displacement<br />

of the peptide segment comprising residues 151-161 that decreases the PCS of both methyls of<br />

Leu161. Smaller PCS than expected were also observed for the backbone amides of this segment. 45<br />

The correct assignment would be obtained by focusing on the difference in 13C PCS between the two<br />

methyl groups rather than their magnitude (Figure 2.6a) or by using the data of the cz-ε186/θ/Yb3+<br />

complex, which are less strongly distance dependent in the 11 Å distance range (Figure 2.6b).
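
The distance argument above can be illustrated with the standard point-dipole expression for the PCS, δPCS = [Δχax(3cos²θ − 1) + 1.5 Δχrh sin²θ cos2φ] / (12π r³). The Python sketch below is only an illustration under stated assumptions: the function name `pcs_ppm` and the tensor magnitude are hypothetical and are not values fitted for the cz-ε186/θ complexes.

```python
import math

def pcs_ppm(r_angstrom, theta, phi, dchi_ax, dchi_rh):
    """Point-dipole PCS in ppm.

    r_angstrom: metal-nucleus distance in Angstrom
    theta, phi: polar angles (radians) of the nucleus in the tensor frame
    dchi_ax, dchi_rh: axial and rhombic tensor components in 1e-32 m^3
    """
    r = r_angstrom * 1e-10  # Angstrom -> metre
    angular = (dchi_ax * (3.0 * math.cos(theta) ** 2 - 1.0)
               + 1.5 * dchi_rh * math.sin(theta) ** 2 * math.cos(2.0 * phi))
    return 1e6 * angular * 1e-32 / (12.0 * math.pi * r ** 3)

# Hypothetical axial tensor (illustrative magnitude only, not a fitted value).
DCHI_AX, DCHI_RH = 30.0, 0.0

# Same orientation, slightly different distances: the PCS scales as r^-3.
near = pcs_ppm(11.0, 0.0, 0.0, DCHI_AX, DCHI_RH)
far = pcs_ppm(12.0, 0.0, 0.0, DCHI_AX, DCHI_RH)
ratio = far / near  # equals (11/12)**3, about 0.77
```

Under these assumptions, moving a methyl group from 11 Å to 12 Å at fixed orientation lowers the PCS by roughly a quarter, which is the kind of change that can make one residue resemble another when only magnitudes are compared.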


2.4 Results. 47<br />

Figure 2.6 Residues showing deviations between predicted and experimental PCS. (a) Comparison of<br />

calculated and experimental PCS of Leu131, Leu95, Leu161, Leu11 and Ile154 in the<br />

cz-ε186/θ/Dy3+ complex. The data are plotted in the sequence Cδ1/Hδ1/Cδ2/Hδ2 and Cγ2/Hγ2/Cδ1/Hδ1 for<br />

the Leu residues and Ile154, respectively. (b) Same as (a), but for the cz-ε186/θ/Yb3+ complex. (c)<br />

Predicted 13C PCS of the prochiral methyl groups of Val82 in cz-ε186/θ/Dy3+ versus side-chain<br />

dihedral angle. The values predicted from the crystal structure of ε186 (Hamdan et al., 2002b) are marked. (d) Same as<br />

(c), but for the CH3 groups of Leu95.<br />

In the cases of Val82 and Leu95 in the Dy3+ complex, the comparison of experimental and<br />

predicted 13C-PCS data yields the wrong stereospecific assignment. The χ1 and χ2 angles of these<br />

residues are –47° and 172°, respectively, in the crystal structure (Hamdan et al., 2002b). Adjusting<br />

these angles to –60° and 180°, respectively, inverts the relative size of the 13C PCS predicted for the<br />

two methyl groups, leads to much better agreement between predicted and experimental PCS, and<br />

results in the correct stereospecific assignments (Figure 2.6c and d). This observation is most<br />

simply explained by a small difference between the crystal and solution structures. Note that the<br />

correct assignment could have been obtained for Leu95 in the Yb3+ complex (Figure 2.6b). None of<br />

the other Val and Leu residues swapped their stereospecific assignment when we changed their χ1<br />

angles (χ2 angles in the case of Leu) by ±10°.<br />
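
The ±10° test described above can be sketched as a small scan in which the two methyl carbons are rotated about the preceding bond and the predicted PCS difference is recomputed. This is a minimal sketch, not the procedure used in this work: the coordinates, the tensor components, and the helper names (`rotate_about_axis`, `pcs_ppm`, `pcs_difference`) are all hypothetical, and the metal is assumed to sit at the origin with the tensor axes along x, y and z.

```python
import math

def rotate_about_axis(p, a, b, angle_deg):
    """Rotate point p about the axis through a and b (Rodrigues' formula)."""
    k = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in k))
    k = [c / norm for c in k]
    v = [p[i] - a[i] for i in range(3)]
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    return [a[i] + v[i] * cos_t + k_cross_v[i] * sin_t
            + k[i] * k_dot_v * (1.0 - cos_t) for i in range(3)]

def pcs_ppm(p, dchi_ax, dchi_rh):
    """PCS (ppm) at point p (Angstrom); metal at origin, tensor axes = x, y, z.

    Tensor components are given in units of 1e-32 m^3.
    """
    x, y, z = p
    r2 = x * x + y * y + z * z
    angular = (dchi_ax * (2 * z * z - x * x - y * y)
               + 1.5 * dchi_rh * (x * x - y * y))
    # Unit factor: 1e-32 m^3 / (1e-10 m)^3 * 1e6 ppm = 1e4
    return 1e4 * angular / (12.0 * math.pi * r2 ** 2.5)

# Hypothetical coordinates (Angstrom) of C-alpha, C-beta and two Val methyl carbons.
CA = (10.0, 4.0, 9.0)
CB = (11.0, 5.0, 9.5)
CG1 = (12.4, 4.6, 9.3)
CG2 = (10.8, 6.4, 8.9)

def pcs_difference(offset_deg, dchi_ax=30.0, dchi_rh=10.0):
    """PCS(CG1) - PCS(CG2) after rotating both methyls about the CA-CB bond."""
    g1 = rotate_about_axis(CG1, CA, CB, offset_deg)
    g2 = rotate_about_axis(CG2, CA, CB, offset_deg)
    return pcs_ppm(g1, dchi_ax, dchi_rh) - pcs_ppm(g2, dchi_ax, dchi_rh)

# Scan the dihedral offset; a sign change of the difference within the scan
# would indicate that the stereospecific assignment depends on the angle.
scan = {offset: pcs_difference(offset) for offset in (-10, 0, 10)}
```

A sign change of the PCS difference within the scanned range would flag a residue whose stereospecific assignment is sensitive to small side-chain adjustments.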

In the case of Leu11, different NMR criteria suggest that its side chain undergoes dynamic<br />

conformational averaging. (i) The 13C-PCS values predicted from the structure are -7.0 and -11.9<br />

ppm, whereas the experimental value found for both methyl groups is -8.4 ppm. (ii) In the crystal<br />

structure, the two methyl groups are 9.2 and 11.1 Å from the metal ion, but the 13C-NMR line<br />

widths observed for the methyl groups in the cz-ε186/θ/Dy3+ complex are indistinguishable. (iii)<br />

Both methyl resonances overlap with each other in the 13C-HSQC spectrum, indicating similar<br />

chemical environments, and their line shapes are narrower than those of most other methyl groups.<br />

Remarkably, however, Leu11 is located in the hydrophobic core of the protein and is very well<br />

defined in the crystal structure (Hamdan et al., 2002b), although the side chain forms no steric contacts and could access<br />

different rotameric states without introducing van der Waals violations with neighboring atoms.<br />

Conceivably, the low temperature used in the X-ray experiment may have frozen out a single<br />

conformation, whereas a much larger conformational space is accessible at room temperature.<br />

Ile154 presents an example where partial motional averaging may be indicated by a smaller<br />

difference observed between the PCS of the δ1 and γ2 carbon atoms than predicted. The side chain<br />

heavy atoms of this residue show enhanced B-factors in the crystal structure, in agreement with its<br />

location at the protein surface.<br />

2.5 Discussion<br />

The present work shows that methyl resonances of 13C-labeled proteins can be assigned<br />

solely from PCS with reference to the 3D structure of the protein, yielding both sequence- and<br />

stereo-specific resonance assignments without having to establish connectivities to backbone<br />

resonances. This presents a significant advance over our previous strategy for the assignment of<br />

15N-HSQC spectra, which relied on PCS, PRE, CCR, and RDCs measured on selectively labeled<br />

samples (Pintacuda et al., 2004).<br />

Clearly, any resonance assignment based on comparison of experimental and back-<br />

calculated PCS critically depends on the accuracy of the 3D structure of the protein and is expected<br />

to fail for flexible protein segments. Yet, this problem is much less severe than in the case of RDCs



(Sibille et al., 2002), since PCS are far less affected by local mobility as long as the spins are not<br />

very close to the paramagnetic center. The robustness of PCS with regard to structural variations is<br />

particularly beneficial for the assignment of Met CH3 groups that are notoriously difficult to<br />

assign by conventional methods. The potential of PCS for their assignment has been noted<br />

previously (Bose-Basu et al., 2004).<br />

The assignment strategy presented here requires the determination of the Δχ tensor, which<br />

can readily be achieved from 15N-1H correlation spectra by the Platypus algorithm (Pintacuda et al.,<br />

2004). Obtaining resonance assignments of methyl groups in this way is attractive because 15N-1H<br />

correlation spectra of backbone amides and 13C-1H correlation spectra of methyl groups can be<br />

recorded even for high-molecular weight systems (Fiaux et al., 2002, Sprangers et al., 2007).<br />

Alternatively, the Δχ-tensor parameters can be determined from assigned diamagnetic NMR<br />

resonances and a set of PCS identified by comparison with the paramagnetic NMR spectrum, either<br />

manually or automatically using the Echidna algorithm (Schmitz et al., 2006). Initial sequence-<br />

specific resonance assignments can, if necessary, be achieved by site-directed mutagenesis (Siivari<br />

et al., 1995, Bose-Basu et al., 2004), for example by mutation of Ile to Val (Wu et al., 2007).<br />

Assignments by PCS are not limited to metal-binding proteins as different techniques have<br />

recently become available that achieve site-specific attachment of lanthanide-tags to proteins<br />

devoid of natural metal binding sites (Ma et al., 2000, Dvoretsky et al., 2002, Wöhnert et al., 2003,<br />

Ikegami et al., 2004, Prudêncio et al., 2004, Leonov et al., 2005, Haberz et al., 2006, Rodriguez-<br />

Castañeda et al., 2006, Su et al., 2006). The use of different tags or attachment at different sites<br />

readily generates very different Δχ tensors (Rodriguez-Castañeda et al., 2006) that can highlight<br />

inconsistencies between experimental and back-calculated PCS.<br />

If the exchange between paramagnetic and diamagnetic metal ions is too slow to measure<br />

exchange spectra, the program Possum can be used to assign the methyl groups in the diamagnetic<br />

and paramagnetic state. As expected, the robustness of Possum with regard to small differences<br />

between the atomic coordinates of the protein and its actual structure in solution increases with the<br />

amount of additional data available. In this respect, data from two paramagnetic metal ions are<br />

particularly beneficial, as is information about intraresidual methyl-methyl connectivities or<br />

stereospecific identities of methyl groups in Val, Leu, and Ile residues.<br />
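
The matching of cross-peaks to methyl groups can be pictured as an assignment problem. The sketch below is a deliberately simplified stand-in and not the algorithm implemented in Possum: it assumes that the 13C and 1H PCS of every cross-peak are already known, weights both nuclei equally in ppm, and searches all permutations exhaustively, which is only feasible for a handful of peaks. The function name `assign_by_pcs` and all numbers are hypothetical.

```python
from itertools import permutations

def assign_by_pcs(predicted, observed):
    """Return the group -> peak-index mapping with the smallest squared PCS misfit.

    predicted: dict mapping methyl name -> (13C PCS, 1H PCS) in ppm
    observed:  list of (13C PCS, 1H PCS) tuples, one per cross-peak
    Exhaustive search over all one-to-one mappings (toy sizes only).
    """
    names = list(predicted)
    best_cost, best_map = float("inf"), None
    for perm in permutations(range(len(observed)), len(names)):
        cost = sum((predicted[n][0] - observed[j][0]) ** 2
                   + (predicted[n][1] - observed[j][1]) ** 2
                   for n, j in zip(names, perm))
        if cost < best_cost:
            best_cost, best_map = cost, dict(zip(names, perm))
    return best_map, best_cost

# Hypothetical back-calculated PCS (ppm) for three methyl groups ...
predicted = {"Leu_d1": (-7.0, -6.8), "Val_g1": (2.1, 2.0), "Ile_d1": (0.4, 0.5)}
# ... and hypothetical experimental PCS of three unassigned cross-peaks.
observed = [(0.5, 0.4), (-6.6, -6.9), (2.3, 1.9)]

assignment, cost = assign_by_pcs(predicted, observed)
```

Additional restraints of the kind mentioned above, such as a second metal ion or known methyl-methyl connectivities, would enter such a scheme as extra terms in the cost or as forbidden pairings.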

The robustness of assignments made by Possum can further be enhanced by the increased<br />

spectral resolution afforded by 3D NMR spectra, which would greatly facilitate the identification of<br />

the corresponding NMR resonances in the diamagnetic and paramagnetic state based on the<br />

criterion that all correlated spins are close in space and therefore experience similar PCS. For<br />

example, 3D (H)CCH-TOCSY or NOESY-13C-HSQC spectra would resolve several cross-peaks<br />

for each methyl group, which can simultaneously be compared with the 3D structure of the protein<br />

and the predicted PCS to obtain resonance assignments. For methyl groups in the vicinity of the<br />

paramagnetic ion, the observation of correlations can be aided by protonless experiments (Bermel<br />

et al., 2006).<br />

Conceivably, assignments by PCS can also be achieved for perdeuterated proteins of<br />

increased molecular weight containing selectively protonated methyl groups (Rosen et al., 1996).<br />

The best spectral resolution in the methyl region of the 13C-HSQC spectrum would be obtained for<br />

CD2H groups (Kainosho et al., 2006). Notably, however, the Cz-EXSY experiments described here<br />

allowed us to measure all PCS data in the uniformly 13C/15N-labeled and fully protonated sample,<br />

i.e. the improved spectral resolution of selectively labeled samples was not necessary for our<br />

system.<br />

In conclusion, resonance assignments of the 13C-HSQC cross-peaks of methyl groups by<br />

PCS induced by a site-specifically attached lanthanide ion present a versatile and convenient<br />

technique which can open many opportunities for NMR studies of proteins of known three-<br />

dimensional structure. It is anticipated that resonance assignments by this technique will be<br />

particularly useful in ligand screening applications.<br />

2.6 Acknowledgement<br />

The authors thank Don A. Grundel for source codes of the MAP solver and for useful<br />

discussions. M.J. thanks the Humboldt Foundation for a Feodor-Lynen Fellowship. Financial<br />

support from the Australian Research Council for project grants, a Federation Fellowship for G.O.<br />

and the 800 MHz NMR spectrometer at the ANU is gratefully acknowledged. This work was<br />

supported by an award under the Merit Allocation Scheme of the National Facility of the Australian<br />

Partnership for Advanced Computing.<br />

2.7 Supporting Information Available



Pulse scheme of a (H)C(C)H-TOCSY experiment for correlations between isopropyl methyl<br />

groups, 13C-HSQC spectra of uniformly, fractionally, and selectively isotope-labeled cz-ε186/θ,<br />

diagrams comparing experimental and predicted PCS, a table with the chemical shifts of the methyl<br />

groups of cz-ε186 observed in the presence of La3+, Yb3+, or Dy3+, and tables reporting the number of<br />

methyl groups assigned by Possum. This material is available free of charge via the Internet at<br />

http://pubs.acs.org.<br />

2.8 References<br />

Allegrozzi M, Bertini I, Janik MBL, Lee YM, Lin GH and Luchinat C (2000) Lanthanide-induced<br />

pseudocontact shifts for solution structure refinements of macromolecules in shells up to 40<br />

Å from the metal ion. J Am Chem Soc 122:4154-4161<br />

Balas E and Saltzman MJ (1991) An algorithm for the 3-index assignment problem. Oper Res<br />

39:150-161<br />

Balayssac S, Jiménez B and Piccioli M (2006) Assignment strategy for fast relaxing signals:<br />

complete aminoacid identification in thulium substituted Calbindin D9K. J Biomol NMR<br />

34:63-73<br />

Bax A, Delaglio F, Grzesiek S and Vuister GW (1994) Resonance assignment of methionine<br />

methyl groups and χ3 angular information from long-range proton-carbon and carbon-<br />

carbon J correlation in a calmodulin-peptide complex. J Biomol NMR 4:787-797<br />

Bax A, Ikura M, Kay LE and Zhu G (1991) Removal of F1 baseline distortion and optimization of<br />

folding in multidimensional NMR spectra. J Magn Reson 91:174-178<br />

Bermel W, Bertini I, Felli IC, Piccioli M and Pierattelli R (2006) 13C-detected protonless NMR<br />

spectroscopy of proteins in solution. Prog NMR Spectrosc 48:25-45<br />

Bose-Basu B, DeRose EF, Kirby TW, Mueller GA, Beard WA, Wilson SH and London RE (2004)<br />

Dynamic characterization of a DNA repair enzyme: NMR studies of [methyl-<br />

13C]methionine-labeled DNA polymerase β. Biochemistry 43:8911-8922<br />

DeRose EF, Darden T, Harvey S, Gabel S, Perrino FW, Schaaper RM and London RE (2003)<br />

Elucidation of the ε-θ subunit interface of Escherichia coli DNA polymerase III by NMR<br />

spectroscopy. Biochemistry 42:3635-3644<br />

Dvoretsky A, Gaponenko V and Rosevear PR (2002) Derivation of structural restraints using a<br />

thiol-reactive chelator. FEBS Lett 528:189-192



Eaton HL, Fesik SW, Glaser SJ and Drobny GP (1990) Time dependence of 13C-13C magnetization<br />

transfer in isotropic mixing experiments involving amino acid spin systems. J Magn Reson<br />

90:452-463<br />

Farrow NA, Zhang O, Forman-Kay JD and Kay LE (1994) A heteronuclear correlation experiment<br />

for simultaneous determination of 15N longitudinal decay and chemical exchange rates of<br />

systems in slow equilibrium. J Biomol NMR 4:727-734<br />

Fiaux J, Bertelsen EB, Horwich AL and Wüthrich K (2002) NMR analysis of a 900K GroEL-<br />

GroES complex. Nature 418:207-211<br />

Fischer MWF, Zeng L and Zuiderweg ERP (1996) Use of 13C-13C NOE for the assignment of NMR<br />

lines of larger labeled proteins at larger magnetic fields. J Am Chem Soc 118:12457-12458<br />

Grishaev A and Llinás M (2002) CLOUDS, a protocol for deriving a molecular proton density via<br />

NMR. Proc Natl Acad Sci U S A 99:6707-6712<br />

Gross JD, Gelev VM and Wagner G (2003) A sensitive and robust method for obtaining<br />

intermolecular NOEs between side chains in large protein complexes. J Biomol NMR<br />

25:235-242<br />

Haberz P, Rodriguez-Castañeda F, Junker J, Becker S, Leonov A and Griesinger C (2006) Two<br />

new chiral EDTA-based metal chelates for weak alignment of proteins in solution. Org Lett<br />

8:1275-1278<br />

Hajduk PJ, Augeri DJ, Mack J, Mendoza R, Yang J, Betz SF and Fesik SW (2000) NMR-based<br />

screening of proteins containing 13C-labeled methyl groups. J Am Chem Soc 122:7898-<br />

7904<br />

Hamdan S, Bulloch EM, Thompson PR, Beck JL, Yang JY, Crowther JA, Lilley PE, Carr PD, Ollis<br />

DL, Brown SE and Dixon NE (2002a) Hydrolysis of the 5'-p-nitrophenyl ester of TMP by<br />

the proofreading exonuclease (ε) subunit of Escherichia coli DNA polymerase III.<br />

Biochemistry 41:5266-5275<br />

Hamdan S, Carr PD, Brown SE, Ollis DL and Dixon NE (2002b) Structural basis for proofreading<br />

during replication of the Escherichia coli chromosome. Structure 10:535-546<br />

Ikegami T, Verdier L, Sakhaii P, Grimme S, Pescatore B, Saxena K, Fiebig KM and Griesinger C<br />

(2004) Novel techniques for weak alignment of proteins in solution using chemical tags<br />

coordinating lanthanide ions. J Biomol NMR 29:339-349<br />

Janin J, Miller S and Chothia C (1988) Surface, subunit interfaces and interior of oligomeric<br />

proteins. J Mol Biol 204:155-164<br />

John M, Headlam MJ, Dixon NE and Otting G (2007a) Assignment of paramagnetic 15N-HSQC<br />

spectra by heteronuclear exchange spectroscopy. J Biomol NMR 37:43-51<br />



John M, Park AY, Dixon NE and Otting G (2007b) NMR detection of protein 15N spins near<br />

paramagnetic lanthanide ions. J Am Chem Soc 129:462-463<br />

John M, Park AY, Pintacuda G, Dixon NE and Otting G (2005) Weak alignment of paramagnetic<br />

proteins warrants correction for residual CSA effects in measurements of pseudocontact<br />

shifts. J Am Chem Soc 127:17190-17191<br />

Jung YS and Zweckstetter M (2004) Backbone assignment of proteins with known structure using<br />

residual dipolar couplings. J Biomol NMR 30:25-35<br />

Kaikkonen A and Otting G (2001) Residual dipolar 1H-1H couplings of methyl groups in weakly<br />

aligned proteins. J Am Chem Soc 123:1770-1771<br />

Kainosho M, Torizawa T, Iwashita Y, Terauchi T, Ono AM and Güntert P (2006) Optimal isotope<br />

labelling for NMR protein structure determinations. Nature 440:52-57<br />

Karimi-Nejad Y, Schmidt JM, Rüterjans H, Schwalbe H and Griesinger C (1994) Conformations of<br />

valine side chains in ribonuclease T1 determined by NMR studies of homonuclear and<br />

heteronuclear 3J coupling constants. Biochemistry 33:5481-5492<br />

Karp RM (1972) Reducibility Among Combinatorial Problems. Complexity of Computer<br />

Computations. New York: Plenum, R. E. Miller and J. W. Thatcher.<br />

Korzhnev DM, Kloiber K, Kanelis V, Tugarinov V and Kay LE (2004) Probing slow dynamics in<br />

high molecular weight proteins by methyl-TROSY NMR spectroscopy: Application to a<br />

723-residue enzyme. J Am Chem Soc 126:3964-3973<br />

Leonov A, Voigt B, Rodriguez-Castañeda F, Sakhaii P and Griesinger C (2005) Convenient<br />

synthesis of multifunctional EDTA-based chiral metal chelates substituted with an S-<br />

mesylcysteine. Chem Eur J 11:3342-3348<br />

Liu W, Zheng Y, Cistola DP and Yang D (2003) Measurement of methyl 13C-1H cross-correlation<br />

in uniformly 13C-, 15N-, labeled proteins. J Biomol NMR 27:351-364<br />

Ma C and Opella SJ (2000) Lanthanide ions bind specifically to an added "EF-hand" and orient a<br />

membrane protein in micelles for solution NMR spectroscopy. J Magn Reson 146:381-384<br />

Montelione GT, Lyons BA, Emerson SD and Tashiro M (1992) An efficient triple resonance<br />

experiment using carbon-13 isotropic mixing for determining sequence-specific resonance<br />

assignments of isotopically-enriched proteins. J Am Chem Soc 114:10974-10975<br />

Muhandiram DR, Yamazaki T, Sykes BD and Kay LE (1995) Measurement of 2H T1 and T1ρ<br />

relaxation times in uniformly 13C-labeled and fractionally 2H-labeled proteins in solution. J<br />

Am Chem Soc 117:11536-11544<br />

Neri D, Szyperski T, Otting G, Senn H and Wüthrich K (1989) Stereospecific nuclear magnetic<br />

resonance assignments of the methyl groups of valine and leucine in the DNA-binding<br />

domain of the 434 repressor by biosynthetically directed fractional 13C labeling.<br />

Biochemistry 28:7510-7516<br />

Nicholson LK, Kay LE, Baldisseri DM, Arango J, Young PE, Bax A and Torchia DA (1992)<br />

Dynamics of methyl groups in proteins as studied by proton-detected 13C NMR<br />

spectroscopy. Application to the leucine residues of staphylococcal nuclease. Biochemistry<br />

31:5253-5263<br />

Ostler G, Soteriou A, Moody CM, Khan JA, Birdsall B, Carr MD, Young DW and Feeney J (1993)<br />

Stereospecific assignments of the leucine methyl resonances in the 1H NMR spectrum of<br />

Lactobacillus casei dihydrofolate reductase. FEBS Lett 318:177-180<br />

Park AY (2006) Ph.D. Thesis. Australian National University, Australia.<br />

Pintacuda G, John M, Su XC and Otting G (2007) NMR structure determination of protein-ligand<br />

complexes by lanthanide labeling. Acc Chem Res 40:206-212<br />

Pintacuda G, Keniry MA, Huber T, Park AY, Dixon NE and Otting G (2004) Fast structure-based<br />

assignment of 15N HSQC spectra of selectively 15N-labeled paramagnetic proteins. J Am<br />

Chem Soc 126:2963-2970<br />

Prudêncio M, Rohovec J, Peters JA, Tocheva E, Boulanger MJ, Murphy MEP, Hupkes HJ, Kosters<br />

W, Impagliazzo A and Ubbink M (2004) A caged lanthanide complex as a paramagnetic<br />

shift agent for protein NMR. Chem Eur J 10:3252-3260<br />

Rodriguez-Castañeda F, Haberz P, Leonov A and Griesinger C (2006) Paramagnetic tagging of<br />

diamagnetic proteins for solution NMR. Magn Reson Chem 44:S10-S16<br />

Rosen MK, Gardner KH, Willis RC, Parris WE, Pawson T and Kay LE (1996) Selective methyl<br />

group protonation of perdeuterated proteins. J Mol Biol 263:627-636<br />

Sattler M, Schwalbe H and Griesinger C (1992) Stereospecific assignment of leucine methyl groups<br />

with 13C in natural abundance or with random 13C labeling. J Am Chem Soc 114:1126-1127<br />

Schell E (1955) Distribution of a product over several properties. E 2nd Sym. Linear Program 615–<br />

642<br />

Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient χ-<br />

tensor determination and NH assignment of paramagnetic proteins. J Biomol <strong>NMR</strong> 35:79-<br />

87<br />

Senn H, Werner B, Messerle BA, Weber C, Traber R and Wüthrich K (1989) Stereospecific<br />

assignment of the methyl 1H NMR lines of valine and leucine in polypeptides by<br />

nonrandom 13 C labelling. FEBS Lett 249:113-118



Senn H and Wüthrich K (1985) Amino-acid-sequence, heme-iron coordination geometry and<br />

functional-properties of mitochondrial and bacterial c-type cytochromes. Quart Rev<br />

Biophys 18:111-134<br />

Sibille N, Bersch B, Covès J, Blackledge M and Brutscher B (2002) Side chain orientation from<br />

methyl 1H-1H residual dipolar couplings measured in highly deuterated proteins. J Am<br />

Chem Soc 124:14616-14625<br />

Siivari K, Zhang M, Palmer AG and Vogel HJ (1995) NMR studies of the methionine methyl<br />

groups in calmodulin. FEBS Lett 366:104-108<br />

Sprangers R and Kay LE (2007) Quantitative dynamics and binding studies of the 20S proteasome<br />

by NMR. Nature 445:618-622<br />

Su XC, Huber T, Dixon NE and Otting G (2006) Site-specific labelling of proteins with a rigid<br />

lanthanide-binding tag. Chembiochem 7:1599-1604<br />

Tang C, Iwahara J and Clore GM (2005) Accurate determination of leucine and valine side-chain<br />

conformations using U-[15N/13C/2H]/[1H-(methine/methyl)-Leu/Val] isotope labeling, NOE<br />

pattern recognition, and methine Cγ-Hγ/Cβ-Hβ residual dipolar couplings: application to<br />

the 34-kDa enzyme IIA Chitobiose. J Biomol NMR 33:105-121<br />

Tugarinov V and Kay LE (2003a) Ile, Leu, and Val methyl assignments of the 723-residue malate<br />

synthase G using a new labeling strategy and novel NMR methods. J Am Chem Soc<br />

125:13868-13878<br />

Tugarinov V and Kay LE (2003b) Side chain assignments of Ile δ1 methyl groups in high<br />

molecular weight proteins: An application to a 46 ns tumbling molecule. J Am Chem Soc<br />

125:5701-5706<br />

Tugarinov V and Kay LE (2004) Stereospecific NMR assignments of prochiral methyls, rotameric<br />

states and dynamics of valine residues in malate synthase G. J Am Chem Soc 126:9827-<br />

9836<br />

Tugarinov V and Kay LE (2005a) Methyl groups as probes of structure and dynamics in NMR<br />

studies of high-molecular-weight proteins. Chembiochem 6:1567-+<br />

Tugarinov V, Ollerenshaw JE and Kay LE (2005b) Probing side-chain dynamics in high molecular<br />

weight proteins by deuterium NMR spin relaxation: An application to an 82-kDa enzyme. J<br />

Am Chem Soc 127:8214-8225<br />

Wand AJ, Urbauer JL, McEvoy RP and Bieber RJ (1996) Internal dynamics of human ubiquitin<br />

revealed by 13C-relaxation studies of randomly fractionally labeled protein. Biochemistry<br />

35:6116-6125



Williams NK, Prosselkov P, Liepinsh E, Line I, Sharipo A, Littler DR, Curmi PMG, Otting G and<br />

Dixon NE (2002) In vivo protein cyclization promoted by a circularly permuted<br />

Synechocystis sp. PCC6803 DnaB mini-intein. J Biol Chem 277:7790-7798<br />

Wöhnert J, Franz KJ, Nitz M, Imperiali B and Schwalbe H (2003) Protein alignment by a<br />

coexpressed lanthanide-binding tag for the measurement of residual dipolar couplings. J Am<br />

Chem Soc 125:13338-13339<br />

Wu PSC, Ozawa K, Lim SP, Vasudevan SG, Dixon NE and Otting G (2007) Cell-free<br />

transcription/translation from PCR-amplified DNA for high-throughput NMR studies.<br />

Angew Chem, Int Ed 46:3356-3358<br />

Zuiderweg ERP, Boelens R and Kaptein R (1985) Stereospecific assignments of 1H-NMR methyl<br />

lines and conformation of valyl residues in the lac repressor headpiece. Biopolymers<br />

24:601-611<br />

Zwahlen C, Gardner KH, Sarma SP, Horita DA, Byrd RA and Kay LE (1998) An NMR experiment<br />

for measuring methyl-methyl NOEs in 13C-labeled proteins with high resolution. J Am<br />

Chem Soc 120:7617-7625<br />

2.9 Supporting information<br />

Figure S2.1 Pulse scheme of the 2D (H)C(C)H-TOCSY experiment<br />

used in this study. Parameters are as for the pulse schemes of Figure<br />

2.1. Efficient magnetization transfer between the methyl groups of<br />

isopropyl groups was obtained by applying DIPSI3 mixing for 12 ms<br />

with a radiofrequency amplitude of 8.6 kHz. The Bruker pulse<br />

programs of this pulse sequence and of the pulse sequences of Figure<br />

2.1 can be downloaded from http://rsc.anu.edu.au/~go/.



Figure S2.2 Assigned constant-time (28 ms) 13C-HSQC spectrum of the cz-ε186/θ/La3+ complex<br />

(13C/15N labeled cz-ε186) at pH 7.2 and 25 °C. Only the region containing the methyl cross-peaks is<br />

shown. Cross-peaks from methyl groups of Val, Leu, Ile, Ala and Thr appear as positive peaks<br />

(blue), whereas cross-peaks from Met CH3 and all CH2 groups appear as negative peaks (red).



Figure S2.3 Assigned constant-time 13C-HSQC spectrum of the cz-ε186/θ/La3+ complex, where<br />

cz-ε186 was biosynthetically fractionally 13C-labeled using 20% uniformly 13C-labeled glucose.<br />

Parameters and plot region as in Figure S2.2. Cross-peaks from Val γ1, Leu δ1, and Ala methyl<br />

groups are positive (blue). Cross-peaks from Val γ2, Leu δ2, Thr γ2 and Met methyl groups are<br />

negative (red). Cross-peaks from Ile δ1 and γ2 methyl groups are mostly invisible due to<br />

scrambling of 13C during Ile biosynthesis.



Figure S2.4 Assigned constant-time 13C-HSQC spectrum of the cz-ε186/θ/La3+ complex containing<br />

13C/15N-Leu labeled cz-ε186 (blue) superimposed onto a 2D (H)C(C)H-TOCSY spectrum of the<br />

same sample (red). The assignments of the 13C-HSQC cross-peaks are indicated. The three mobile<br />

residues Leu11, Leu43 and Leu145 also show one-bond correlations between CH3 and CH<br />

groups.



Figure S2.5 Comparisons of calculated and experimental PCS in the cz-ε186/θ/Dy3+ complex for<br />

methyl groups of (a) Met, (b) Ala, (c) Thr, (d) Val, (e) Leu, and (f) Ile. The distances rC-Ln are<br />

indicated in Å at the top of each plot. For residues with two methyl groups, the distance value<br />

shown at the top refers to the Cγ1 (Val), Cδ1 (Leu), or Cδ1 (Ile) atom.



Figure S2.6 Comparisons of calculated and experimental 13C and 1H PCS as in Figure S2.5 but for<br />

the cz-ε186/θ/Yb3+ complex.



Table S2.1 13C and 1H chemical shifts (ppm) of methyl groups of cz-ε186 in the cz-ε186/θ/Ln3+<br />

complexes used in this study a<br />

Group  rC-Ln (Å)  cz-ε186/θ/La3+ (13C, 1H)  cz-ε186/θ/Dy3+ (13C, 1H)  cz-ε186/θ/Yb3+ (13C, 1H)<br />

Methionine<br />

M18 12.0 17.78 2.06 18.04 2.34<br />

M87 22.5 15.97 1.53 15.32 0.93 16.05 1.63<br />

M107 14.2 16.23 2.04 12.84 16.77 2.64<br />

M137 20.7 16.69 2.05 18.00 3.38 16.41 1.78<br />

M178 15.4 15.39 2.22 18.67 5.53 14.69 1.53<br />

M185 16.83 2.07 17.32 2.55 16.75 1.98<br />

Alanine<br />

A4 19.14 1.35<br />

A23 20.2 17.15 0.62 17.35<br />

A35 11.5 24.14 1.57 17.70 25.47 2.79<br />

A62 10.9 18.72 1.55 30.88 16.97 -0.24<br />

A69 16.2 23.77 1.61 26.74 23.15<br />

A80 22.3 18.68 1.46 19.04 1.84 18.66 1.39<br />

A83 20.1 19.35 1.29 18.97 1.00<br />

A93 19.2 20.74 1.21 19.78 0.22 20.90 1.39<br />

A100 13.3 19.85 1.41 16.19 21.22 1.77<br />

A101 15.1 18.20 1.45 15.48 -1.16 18.58 1.82<br />

A132 17.6 17.89 1.60 17.16<br />

A134 15.8 18.44 1.79 20.16 3.58 17.96 1.30<br />

A147 14.8 18.49 1.28 20.54 18.01 0.86<br />

A150 17.2 17.57 1.42 20.22 3.91 17.12 0.99<br />

A164 5.2 18.93 1.21<br />

A168 7.5 17.24 1.42 18.52<br />

A172 12.5 18.54 1.38 24.10 17.77 0.73<br />

A177 18.5 16.75 0.92 20.11 4.39 16.17 0.34<br />

A186 19.07 1.39 19.45 1.77 19.02 1.32<br />


Threonine<br />

T3 γ2<br />

T6 γ2 21.50 1.20 21.53 1.25<br />

T13 γ2 8.0<br />

T15 γ2 9.0 19.45 -0.13 34.77 17.36<br />

T16 γ2 13.8 23.64 1.28 31.65 22.35 0.04<br />

T44 γ2 21.0 22.65 1.34 22.64 1.35 22.67 1.39<br />

T78 γ2 21.1 22.40 1.42 23.44 2.48 22.19 1.21<br />

T121 γ2 17.4 21.07 0.66 19.34 -1.07 21.31 0.83<br />

T123 γ2 25.0 21.48 1.09 20.80 0.50 21.66 1.26<br />

T128 γ2 16.9 21.93 1.10<br />

T160 γ2 12.9<br />

T179 γ2 19.6 20.56 0.66 22.15 2.28 20.25 0.35<br />

T183 γ2 21.44 1.21 22.16 1.91 21.33 1.08<br />

T187 γ2 21.57 1.19 21.88 1.51 21.50 1.13<br />

Valine<br />

V10 γ1 10.4 22.52 0.91 28.21 21.70 -0.07<br />

V10 γ2 12.8 19.56 0.66 22.13 19.19 0.31<br />

V36 γ1 11.2 21.16 0.75<br />

V36 γ2 12.3 21.09 0.67 21.97<br />

V38 γ1 18.5 22.09 0.83 23.20 2.03 21.92 0.67<br />

V38 γ2 16.2 20.58 0.82 22.08 2.42 20.43 0.64<br />

V39 γ1 24.1 21.08 0.89 21.14 1.00<br />

V39 γ2 22.0 22.03 0.95 22.06 1.01<br />

V50 γ1 14.9 22.32 0.58 19.95 22.70 0.94<br />

V50 γ2 13.2 20.00 0.68 17.80 20.26 0.97<br />

V58 γ1 12.7 21.64 1.29 30.67 20.19 -0.24<br />

V58 γ2 13.5 22.31 1.18 30.02 21.00 -0.16<br />

V65 γ1 6.4 20.72 0.90 22.51<br />

V65 γ2 8.8 21.14 0.93 21.74 1.65<br />

V82 γ1 17.7 20.81 0.82 20.09 0.08<br />

V82 γ2 16.6 20.37 0.72 19.79 0.10<br />

V96 γ1 13.3 19.60 0.23 21.73 19.08 -0.32<br />

V96 γ2 15.0 19.04 0.29 19.83 1.04 18.87 0.13<br />

V127 γ1 15.8 21.62 0.78 19.64 -1.32 21.84 1.05<br />

V127 γ2 17.2 20.59 0.72 19.02 -0.93 20.89 0.95<br />

V133 γ1 17.9 21.22 0.97 22.43 2.24 20.92 0.67<br />

V133 γ2 18.4 21.99 1.16 22.69 1.76 21.77 0.95<br />

V174 γ1 13.3 22.88 0.95 31.40 21.37 -0.49<br />

V174 γ2 12.5 24.96 0.97 35.55 23.21 -0.83<br />

Leucine<br />

L11 δ1 11.1 27.14 1.00 18.89 28.30 2.40<br />

L11 δ2 9.2 26.90 0.99 18.65 28.30 2.39<br />

L43 δ1 16.1 25.49 0.96 27.78 3.35 25.24 0.69<br />

L43 δ2 15.6 25.52 0.94 27.63 3.15 25.26 0.72<br />

L52 δ1 13.6 26.18 0.65 25.88 0.34<br />

L52 δ2 15.0 24.99 0.56 25.93 24.81 0.39<br />

L57 δ1 21.4 25.94 0.72 28.07 2.74 25.61 0.39<br />

L57 δ2 20.4 21.82 0.85 24.23 3.21 21.40 0.44<br />

L73 δ1 13.8 25.49 0.96 24.36 -0.30<br />

L73 δ2 13.4 21.30 0.97 25.98 20.34 -0.02<br />

L74 δ1 22.4 26.03 0.99 27.27 2.17 25.78 0.75<br />

L74 δ2 21.8 22.44 0.85 23.95 2.39 22.12 0.55<br />

L95 δ1 15.2 25.21 0.75 22.60 25.64 1.20<br />

L95 δ2 14.6 23.20 0.78 20.84 -1.73 23.72 1.21<br />

L113 δ1 20.2 24.42 0.34 25.63 1.59 24.22 0.15<br />

L113 δ2 22.3 21.37 0.64 22.17 1.45 21.27 0.53<br />

L114 δ1 21.6 26.00 1.22 26.00 1.24<br />

L114 δ2 22.0 22.35 1.01 22.33 0.95<br />

L131 δ1 13.3 22.74 0.79 21.14 22.69 0.70<br />

L131 δ2 12.5 25.78 0.71 21.89 26.09 1.07<br />

L145 δ1 8.9 23.96 0.70 16.25<br />

L145 δ2 6.5 24.01 0.62 14.34<br />

L148 δ1 13.2 26.21 0.92 31.96 25.03 -0.32<br />

L148 δ2 15.4 23.33 1.14 27.21 22.59 0.42<br />

L161 δ1 11.3 24.89 0.72 23.66 25.39 1.18<br />

L161 δ2 10.6 21.78 0.83 19.26 22.39 1.42<br />

L165 δ1 10.1 24.05 0.89 17.02 25.60 2.34<br />

L165 δ2 11.0 26.53 1.03 19.74 27.83 2.26<br />

L166 δ1 8.8 22.89 0.74<br />

L166 δ2 7.7 24.99 0.86 26.66<br />

L171 δ1 10.1 21.06 0.82 36.23 18.23 -2.00<br />

L171 δ2 8.9 26.65 0.99 23.97<br />

L176 δ1 17.6 24.89 0.72 27.48 3.25 24.51 0.36<br />

L176 δ2 18.5 21.98 0.70 24.08 2.64 21.66 0.40<br />

Isoleucine<br />

I5 2 17.94 0.95 17.86 0.87 17.96 0.97<br />

I5 1 12.76 0.82 12.66 0.71<br />

I9 2 13.9 18.78 0.77 16.02 19.39 1.35<br />

I9 1 16.6 10.64 0.31 9.09 -1.20 11.00 0.64<br />

I21 2 23.8 17.24 0.85 17.29 0.91 17.24 0.87<br />

I21 1 24.3 12.79 0.85 12.86 0.93<br />

I30 2 10.4 17.96 0.82 23.37 17.09 -0.06<br />

I30 1 12.0 13.35 0.64 15.00 13.15 0.55<br />

I31 2 11.9 18.45 0.75 28.85 16.71 -0.92<br />

I31 1 9.8 14.65 0.34 33.74 11.31 -2.95<br />

I33 2 10.6 17.24 0.71 8.89 18.77 2.26<br />

I33 1 12.2 12.24 0.08 9.89 12.65 0.45<br />

I68 2 11.4 19.25 0.79 28.19 17.44 -0.91<br />

I68 1 8.4 13.87 0.56 9.07<br />

I90 2 15.6 18.42 0.41 15.93 -2.28 18.93 0.92<br />

I90 1 16.7 12.90 0.53 10.88 -1.52



I97 2 9.1 18.43 1.01 8.34 19.69 2.37<br />

I97 1 11.4 14.99 0.88 9.50 15.78 1.68<br />

I104 2 16.0 17.84 0.90 15.86 -2.05 18.13 1.19<br />

I104 1 14.4 10.34 0.71 7.50 10.70 1.14<br />

I118 2 22.6 15.96 0.66 15.47 0.17 16.02 0.75<br />

I118 1 23.0 13.16 0.86 12.89 0.58 13.23 0.91<br />

I154 2 13.7 16.96 0.76 23.43 15.97 -0.23<br />

I154 1 16.3 11.78 0.73 17.48 6.40 10.94 -0.16<br />

I170 2 9.5 18.41 0.71 40.65 15.06 -2.55<br />

I170 1 8.6 12.88 0.68 32.52 10.26 -1.98<br />

I193 2 17.46 0.81 17.55 0.91 17.42 0.78<br />

I193 1 13.04 0.83 13.15 0.94 13.00 0.80<br />

a Conditions: 25 °C, pH 7.2. The chemical shifts in the cz-ε186/θ/La³⁺ complex were measured from ¹³C-HSQC spectra of the sample containing ¹³C/¹⁵N labeled cz-ε186 in the presence of 1 equivalent of La³⁺. Whenever possible, chemical shifts of the cz-ε186/θ/Dy³⁺ and cz-ε186/θ/Yb³⁺ complexes were measured from ¹³C-HSQC spectra of samples prepared with 1:1 mixtures of La³⁺ and Dy³⁺, or La³⁺ and Yb³⁺, respectively. ¹³C chemical shifts of methyl groups for which no ¹H chemical shift is reported were measured from the pd exchange peaks in 2D or 3D methyl Cz-EXSY spectra, whichever gave better resolution. When neither ¹³C nor ¹H chemical shifts are indicated, the expected cross-peak could not be identified, either because of spectral overlap (e.g. in the case of vanishing PCS) or because of strong PRE.



Table S2.2 Number of correctly assigned methyl groups of Met, Thr, and Ala residues of cz-ε186 using the program Possum a

a Calculations were performed using the experimental data of Table S2.1 and simulated data,<br />

where the paramagnetic chemical shifts of Table S2.1 were replaced by chemical shifts back-<br />

calculated from the crystal structure of ε186 and the tensors used in the present study. Two additional sets of simulated data were generated by addition of structural noise to the PDB coordinates of ε186. The structural noise followed a Gaussian distribution of 0.25 and 0.5 Å

standard deviation, resulting in a Maxwell-Boltzmann distribution of atomic displacements with<br />

maxima at 0.35 and 0.7 Å, respectively. The columns marked “Dy max”, “Yb max”, and “La max”<br />

report the number of methyl groups for which data in the paramagnetic state were available to the<br />

program. (Additional peaks observed in the diamagnetic state remained unassigned.) The results<br />

are reported for calculations where the diamagnetic chemical shifts were supplemented only with<br />

data from Dy³⁺ (light yellow), Yb³⁺ (light blue) or both (grey). The rows marked with the % symbol



display the percentage of correctly assigned methyl groups for all three residues. The program<br />

Possum is available from http://compbio.chemistry.uq.edu.au/bmmg/christophe.



Table S2.3 Number of correctly assigned methyl groups of Val, Leu, and Ile residues of cz-ε186 using the program Possum with methyl connectivity information in the Yb³⁺ complex a



a Calculations were performed using the experimental data of Table S2.1 and simulated data as<br />

described in the footnote of Table S2.2. As each Val, Leu and Ile residue contains two methyl<br />

groups, methyl specificity and methyl connectivity information can be used as additional<br />

information to support the resonance assignment. (Methyl specificity information refers to<br />

stereospecific assignments of the methyl groups of Val and Leu and the a priori distinction of γ2 and δ1 methyl groups of Ile. Methyl connectivity information refers to the knowledge of which peaks

arise from the same residue.) The results of four different combinations are shown, with and<br />

without methyl specificity information in the paramagnetic complexes, and with and without methyl<br />

specificity information in the diamagnetic complex. It was assumed that no methyl connectivity<br />

information can be established for the Dy³⁺ complex because of strong PRE. The data are

presented in the same format as in Table S2.2. Assignments were counted as correct whenever a<br />

methyl cross-peak was assigned to the correct residue, disregarding the stereospecific correctness<br />

of the assignment. Note that the maximum number of assignable methyl groups reported in the<br />

column marked “La max” can vary when both Dy³⁺ and Yb³⁺ data are used, because Possum has the freedom not to assign every HSQC cross-peak observed for the Dy³⁺ complex to a peak observed for the Yb³⁺ complex. This results in a small variation of the number of residues for which

the program has paramagnetic information available and can attempt an assignment of the<br />

diamagnetic data.



Table S2.4 Number of correctly assigned methyl groups of valine, leucine, and isoleucine residues of cz-ε186 using the program Possum without methyl connectivity information in the Yb³⁺ complex a

a The data are presented as in Table S2.3.


Chapter 3<br />

Numbat: new user-friendly method built for automatic Δχ-tensor determination



3.1 Abstract<br />

Pseudocontact shift (PCS) effects induced by a paramagnetic lanthanide bound to a protein<br />

have become increasingly popular in NMR spectroscopy as they yield a complementary set of

orientational and long-range structural restraints. PCS are a manifestation of the χ-tensor<br />

anisotropy, the Δχ-tensor, which in turn can be determined from the PCS. Once the Δχ-tensor has<br />

been determined, PCS become powerful long-range restraints for the study of protein structure and<br />

protein-ligand complexes. Here we present the newly developed package Numbat (New User-<br />

friendly Method Built for Automatic Δχ-Tensor determination). With a Graphical User Interface<br />

(GUI) that allows a high degree of interactivity, Numbat is specifically designed for the<br />

computation of the complete set of Δχ-tensor parameters (including shape, location and orientation<br />

with respect to the protein) from a set of experimentally measured PCS and the protein structure<br />

coordinates. Use of the program is illustrated by building a model of the complex between the E.<br />

coli DNA polymerase III subunits ε186 and θ using PCS.<br />

3.2 Keywords<br />

paramagnetic NMR · pseudocontact shift · magnetic susceptibility tensor · software ·

program · unique tensor representation<br />

3.3 Abbreviations<br />

α Subunit α of the E. coli polymerase III<br />

ε186 N-terminal 185 residues of the E. coli polymerase III subunit ε<br />

θ Subunit θ of the E. coli polymerase III<br />

CSA Chemical shielding anisotropy<br />

GUI Graphical user interface<br />

HOT The bacteriophage P1-encoded homolog of θ<br />

PCS Pseudocontact shift<br />

RACS Residual anisotropic chemical shift<br />

RDC Residual dipolar coupling<br />

UTR Unique Δχ-tensor representation


3.4 Introduction<br />


Paramagnetic lanthanide ions bound to the natural metal-binding site of a metalloprotein or<br />

introduced via a lanthanide tag provide a number of paramagnetic effects that can be distance<br />

dependent (i.e. paramagnetic relaxation enhancement), orientation dependent (i.e. residual dipolar<br />

couplings, RDC), or a combination of both, like cross-correlated relaxation effects and<br />

pseudocontact shifts (PCS; Bertini et al., 2002, Pintacuda et al., 2004). PCS present particularly

valuable structural restraints, as they are easy to measure and provide long-range information that<br />

would be difficult to obtain by other techniques. PCS originate from unpaired electron spins which<br />

lead to an anisotropic magnetic susceptibility tensor (χ-tensor). PCS restraints induced by<br />

lanthanide ions have been used to investigate structural and dynamical properties of proteins<br />

(Allegrozzi et al., 2000, Bertini et al., 2001, Bertini et al., 2004, Gaponenko et al., 2004, Jensen et<br />

al., 2006, Eichmüller et al., 2007, Wang et al., 2007) and protein-ligand complexes (John et al.,<br />

2006, Pintacuda et al., 2007).<br />

In order to apply PCS restraints, eight variables have to be determined. These comprise the<br />

lanthanide position (three Cartesian coordinates), three angles (e.g. Euler angles) that relate the<br />

molecular frame to the χ-tensor frame, and the axial and rhombic anisotropy parameters of the χ-<br />

tensor. (Since PCS depend only on the χ-tensor anisotropy Δχ rather than the absolute magnitude of<br />

the χ-tensor, it is sufficient to determine the anisotropy parameters represented by the Δχ-tensor.)<br />

Several integrated software tools are available for the determination and study of the alignment<br />

tensor using RDCs (Dosset et al., 2000, Zweckstetter et al., 2000, Valafar et al., 2004, Wei et al.,<br />

2006). For the situation where the 3D structure of the protein is known a priori, corresponding<br />

tools for the determination of the Δχ-tensor from PCS have been developed but are more limited in<br />

scope. The program Fantasia (Banci et al., 1996) and its extension Fantasian (Banci et al., 1997)<br />

can fit the magnitude and Euler angles of the Δχ-tensor using a set of experimental PCS but<br />

requires prior knowledge of the metal coordinates. The program Platypus (Pintacuda et al., 2004)<br />

can simultaneously fit the Δχ-tensor and assign the signals of ¹⁵N-HSQC spectra of samples containing diamagnetic and paramagnetic lanthanides, but assumes that the ¹⁵N-HSQC peaks are sufficiently well resolved such that the paramagnetic peaks can be unambiguously associated with their diamagnetic partners. The program Echidna (Schmitz et al., 2006) uses assigned diamagnetic ¹⁵N-HSQC cross-peaks of a uniformly ¹⁵N-labelled protein to determine the magnitude and Euler angles of the Δχ-tensor and, simultaneously, the assignment of the paramagnetic ¹⁵N-HSQC cross-peaks. It also requires prior knowledge of the approximate metal ion position. In principle, the

structure refinement packages Xplor-NIH (Schwieters et al., 2003, Schwieters et al., 2006) with the



module PARArestraint for Xplor-NIH (Banci et al., 2004), GROMACS (Van der Spoel et al.,<br />

2005) with an implementation of orientation restraints (Hess et al., 2003) or DYANA (Güntert et<br />

al., 1997) with the module PSEUDYANA (Banci et al., 1998) could be used for Δχ-tensor<br />

determination from PCS but the protocols would be cumbersome. Considering that simultaneous<br />

determination of the Δχ-tensor and metal ion position relative to a known protein structure is a<br />

commonly required task, we set out to design a tool to achieve this in an easier and user-friendly<br />

way.<br />

While the metal coordinates of metalloproteins can be accurately determined by<br />

crystallography, the metal position must be fitted when no crystal structure is available, e.g., when<br />

the lanthanide is introduced via a lanthanide tag. None of the reported tools addresses this issue.<br />

Here we present the newly developed program Numbat (New User-friendly Method Built for<br />

Automatic Δχ-Tensor determination), which can simultaneously fit the Δχ-tensor and lanthanide<br />

coordinates using experimental PCS values and the coordinates of the protein. Furthermore, the<br />

program encompasses a number of useful tools for multiple data sets recorded with different<br />

paramagnetic lanthanides, for rigid-body docking using PCS, and for analysis and visualization of<br />

the results. Following a description of the algorithm on which the program builds and a<br />

presentation of the graphical user interface, we illustrate the use of Numbat for building the model<br />

of a complex in a rigid-body docking approach using PCS.<br />

3.5 Algorithm<br />

The Δχ-tensor can be determined and refined by the comparison between experimentally<br />

determined PCS values and PCS values back-calculated from the atomic coordinates of the<br />

molecular structure (Sherry et al., 1977, Lee et al., 1983, Emerson et al., 1990, Veitch et al., 1990,<br />

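Banci et al., 1992, Capozzi et al., 1993). The pseudocontact shift of a nuclear spin i, $\mathrm{PCS}_i^{\mathrm{calc}}$, is given by (Bertini et al., 2002):

$$\mathrm{PCS}_i^{\mathrm{calc}} = \frac{1}{12\pi r_i^5}\left[\Delta\chi_{\mathrm{ax}}\left(2\tilde{z}_i^2 - \tilde{x}_i^2 - \tilde{y}_i^2\right) + \frac{3}{2}\,\Delta\chi_{\mathrm{rh}}\left(\tilde{x}_i^2 - \tilde{y}_i^2\right)\right] \tag{3.1}$$

where $\tilde{x}_i$, $\tilde{y}_i$, $\tilde{z}_i$ are the Cartesian coordinates of the nuclear spin i in the Δχ-tensor frame, $r_i$ is the distance between the spin i and the paramagnetic centre, and Δχax and Δχrh are the axial and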



rhombic components of the Δχ-tensor. The orientation of the Δχ-tensor frame with respect to the<br />

protein frame can be specified, e.g., by three Euler angles α, β and γ.⁵
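As an illustration of equation (3.1), the following minimal sketch back-calculates a PCS from coordinates that are already expressed in the Δχ-tensor frame. It assumes the standard axial/rhombic form of the PCS equation (Bertini et al., 2002) in SI units; the function name `pcs_ppm` and the tensor magnitudes are hypothetical and are not taken from Numbat.

```python
import math

def pcs_ppm(coords_tensor_frame, dchi_ax, dchi_rh):
    """Back-calculate the PCS (in ppm) of one nuclear spin.

    coords_tensor_frame: (x, y, z) of the spin in metres, relative to the
        metal ion and expressed in the principal frame of the tensor.
    dchi_ax, dchi_rh: axial and rhombic tensor components in m^3.
    """
    x, y, z = coords_tensor_frame
    r = math.sqrt(x * x + y * y + z * z)
    axial = dchi_ax * (2.0 * z * z - x * x - y * y)
    rhombic = 1.5 * dchi_rh * (x * x - y * y)
    # The expression is dimensionless; multiply by 1e6 to obtain ppm.
    return 1e6 * (axial + rhombic) / (12.0 * math.pi * r ** 5)

# Hypothetical tensor (30e-32 m^3 axial, 10e-32 m^3 rhombic), spins 10 A away.
on_z = pcs_ppm((0.0, 0.0, 10e-10), 30e-32, 10e-32)
on_x = pcs_ppm((10e-10, 0.0, 0.0), 30e-32, 10e-32)
```

A spin on the z axis of the tensor frame feels only the axial term, whereas a spin on the x axis feels both terms with opposite signs, so the two shifts differ in sign as well as in size.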

To quantify the difference between experimental and back-calculated PCS values we define<br />

a quadratic cost c:<br />

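$$c = \sum_i \max\left(\left|\mathrm{PCS}_i^{\mathrm{calc}} - \mathrm{PCS}_i^{\mathrm{exp}}\right| - tol_i,\; 0\right)^2 \tag{3.2}$$

where $\mathrm{PCS}_i^{\mathrm{exp}}$ is the experimental PCS for the spin i, and $tol_i$ is its associated tolerance. The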

tolerance values can be used to reflect different uncertainties in the measurement of different PCS.<br />

When the lanthanide position is known, only five Δχ-tensor parameters have to be optimized. In<br />

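this case, the least-squares fitting problem is linear, as can be seen from an alternate formulation of the PCS (Bertini et al., 2002):

$$\mathrm{PCS}_i^{\mathrm{calc}} = \frac{1}{4\pi r_i^5}\left[x_i^2\,\Delta\chi_{xx} + y_i^2\,\Delta\chi_{yy} + z_i^2\,\Delta\chi_{zz} + 2x_iy_i\,\Delta\chi_{xy} + 2x_iz_i\,\Delta\chi_{xz} + 2y_iz_i\,\Delta\chi_{yz}\right] \tag{3.3}$$

where $x_i$, $y_i$, $z_i$ are the Cartesian coordinates of the spin i in an arbitrary frame f (with the metal ion at the origin) and Δχxx, Δχyy, Δχzz, Δχxy, Δχxz, Δχyz are the Δχ-tensor components in this frame, of which only five are independent because the Δχ-tensor is traceless. The Singular Value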

Decomposition (SVD) algorithm, which is commonly used to determine an alignment tensor from a<br />

set of experimental RDC (Valafar et al., 2004, Wei et al., 2006), would be a good candidate to<br />

minimize the cost c. Least-squares fitting and the Simplex algorithm (Nelder et al., 1965) have been applied in previous work (Emerson et al., 1990, Capozzi et al., 1993). However, the most general problem one has to solve is non-linear, since the metal ion position may be unknown. For the non-linear least-squares fitting procedure in Numbat we consequently chose the Levenberg-Marquardt algorithm (Marquardt, 1963) as implemented in the GNU Scientific Library (Galassi et al., 2006).
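The linear case can be sketched as follows. This is a minimal illustration, not Numbat's implementation: it assumes the metal ion sits at the origin, writes the PCS as a linear function of five independent tensor components (Δχzz is eliminated through the traceless condition), and solves the normal equations by Gaussian elimination rather than by SVD or Levenberg-Marquardt. All function names, coordinates and tensor values are hypothetical, and units are arbitrary.

```python
import math

def design_row(x, y, z):
    """Coefficients of (xx, yy, xy, xz, yz) for a spin at (x, y, z) relative
    to the metal ion; zz is eliminated via zz = -(xx + yy)."""
    r5 = (x * x + y * y + z * z) ** 2.5
    k = 1.0 / (4.0 * math.pi * r5)
    return [k * (x * x - z * z), k * (y * y - z * z),
            k * 2 * x * y, k * 2 * x * z, k * 2 * y * z]

def solve(a, b):
    """Solve the square system a t = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    t = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * t[j] for j in range(i + 1, n))
        t[i] = (m[i][n] - s) / m[i][i]
    return t

def fit_tensor(coords, pcs):
    """Least-squares estimate of (xx, yy, xy, xz, yz) from the normal equations."""
    rows = [design_row(*c) for c in coords]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] * p for r, p in zip(rows, pcs)) for i in range(5)]
    return solve(ata, atb)

# Round trip on synthetic, noise-free data (hypothetical coordinates and tensor).
coords = [(10, 0, 0), (0, 10, 0), (0, 0, 10), (7, 7, 0),
          (7, 0, 7), (0, 7, 7), (5, -6, 8), (-4, 9, 3)]
true_tensor = [1.0, -2.5, 0.3, -0.7, 1.2]
pcs = [sum(a * t for a, t in zip(design_row(*c), true_tensor)) for c in coords]
fitted = fit_tensor(coords, pcs)
```

Once the metal position becomes a free variable, the coefficients in `design_row` themselves depend on the unknowns, which is why the general problem requires a non-linear minimiser.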

⁵ The parameters that are fitted by the software Numbat are: the three Cartesian coordinates of the metal ion, Δχax, Δχrh, α, β, and γ.



3.6 Program Features<br />

3.6.1 GUI<br />

The graphical user interface (GUI) of Numbat was built with the GTK+ library (Krause,<br />

2007) that is part of standard installations of recent Linux systems. Figure 3.1 shows two<br />

screenshots of the main interface of Numbat illustrating the intuitive and flexible user interface.<br />

Figure 3.1 Screenshots of Numbat main windows. (a) Graphical User Interface for the<br />

selection Structure and Data. Four PCS data sets can be loaded simultaneously under<br />

the tabs PCS1 to PCS4. The list of all atoms is displayed in the main frame and can be<br />

filtered with the Display tab to show only the atom or residue types of interest. The<br />

experimental PCS and the tolerance can be directly modified, and only atoms that are<br />

selected (see the column labelled “use?”) are taken into account in the calculations.<br />

The distance between the respective atom to the metal ion, the calculated PCS and the<br />

deviation between experimental and predicted PCS are calculated and displayed after



each fitting procedure. (b) Graphical User Interface for Tensor Calculation. A Δχ-<br />

tensor can be fitted for each of the data sets PCS1 to PCS4. An additional tab<br />

(Multiple PCS) is for fitting different data sets that share the same metal-ion centre.<br />

The frame PDB selector allows the choice of the model(s) to be used from a family of<br />

conformers loaded. The Tensor search restraints frame allows the individual selection<br />

of each of the eight variables to be free, fixed or constrained between two values. The<br />

computed Δχ-tensor values are displayed with error estimates from the GSL<br />

implementation of the Levenberg-Marquardt algorithm and the corresponding unique<br />

tensor representation (“UTR”) is reported.<br />

3.6.2 Input files<br />

Numbat reads atomic coordinates from protein data bank (PDB; (Berman et al., 2000)) files.<br />

In the case of NMR structures, the entire ensemble of conformers is loaded and any subset can be

selected for subsequent calculations. When optimizing the Δχ-tensor, PCS are back-calculated for<br />

each selected structure and averaged for the computation of the cost function c (equation (3.2)).<br />

PCS data can be read either in the Xplor-NIH format or in a format specific to Numbat. For test<br />

purposes, Numbat also allows the generation of PCS data (optionally with addition of Gaussian<br />

noise) for a user-specified Δχ-tensor.<br />

3.6.3 Methyl group definition<br />

The 1 H chemical shift of a rotating methyl group can be described as the average of the<br />

chemical shifts of the three ¹H spins. The selection “methyl association” in the GUI allows

definition of pseudoatom names for any methyl group for which the experimental PCS value is to<br />

be treated as the average of the PCS of the three 1 H nuclei. The pseudoatom names can be used to<br />

identify the experimental PCS values of methyl groups in the input file. Alternatively, the PCS<br />

values of methyl groups can be interactively entered via the user-interface.<br />
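The averaging described above can be sketched in a few lines. The example assumes the standard axial/rhombic PCS expression with coordinates already given in the Δχ-tensor frame; the function names, proton coordinates and tensor values are hypothetical, units are arbitrary, and the code is not taken from Numbat.

```python
import math

def pcs(x, y, z, dchi_ax, dchi_rh):
    """PCS of a single spin at (x, y, z) in the tensor frame (arbitrary units)."""
    r5 = (x * x + y * y + z * z) ** 2.5
    return (dchi_ax * (2 * z * z - x * x - y * y)
            + 1.5 * dchi_rh * (x * x - y * y)) / (12 * math.pi * r5)

def methyl_pcs(protons, dchi_ax, dchi_rh):
    """PCS of a methyl pseudoatom: the mean over its three proton positions."""
    values = [pcs(x, y, z, dchi_ax, dchi_rh) for x, y, z in protons]
    return sum(values) / len(values)

# Hypothetical proton positions of one methyl group.
protons = [(8.0, 1.0, 3.0), (8.9, 0.2, 4.1), (7.4, -0.5, 4.4)]
individual = [pcs(x, y, z, 30.0, 10.0) for x, y, z in protons]
average = methyl_pcs(protons, 30.0, 10.0)
```

Averaging the three back-calculated values is not the same as evaluating the PCS at the averaged position, because the PCS depends non-linearly on the coordinates.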

3.6.4 Optimization of the tensor parameters<br />

In order to give the user a maximum of flexibility, any subset of the eight Δχ-tensor<br />

variables can be optimized with the remaining ones fixed to user-specified values. Such a situation<br />

occurs, for example, when a protein-ligand complex is studied where the protein is tagged with a<br />

lanthanide. First, the Δχ-tensor can be determined using the PCS measured for the protein. Fitting<br />

of the position and orientation of the Δχ-tensor with respect to the ligand can subsequently be



performed with a minimal number of adjustable parameters by keeping the axial and rhombic<br />

components of the Δχ-tensor fixed at the values determined for the protein. The Δχ-tensors<br />

determined for the protein and the ligand can finally be superimposed to derive a model of the<br />

protein-ligand complex (Pintacuda et al., 2007).<br />

Numbat also offers the option of restricting the Δχ-tensor variables within user-defined<br />

boundaries. This is useful if the magnitude, position and/or orientation of the Δχ-tensor is<br />

approximately known from previous studies (Su et al., 2008). Depending on the quality and<br />

quantity of PCS measurements available, the Δχ-tensor variables (especially the lanthanide<br />

coordinates) may only reach a local minimum during the optimization procedure. Therefore the<br />

starting values of all Δχ-tensor variables used to initialize the minimiser can be changed<br />

interactively within Numbat.<br />

3.6.5 Residual Anisotropic Chemical Shifts (RACS)<br />

Paramagnetic lanthanides bound to the protein weakly align the molecule in the magnetic<br />

field resulting in an incomplete averaging of the anisotropic chemical shifts. This can affect the<br />

PCS by a shift of up to 0.2 ppm for backbone ¹⁵N and ¹³C’ spins at a magnetic field of 18.8 T (John et al., 2005). The RACS correction term $\Delta\delta^{\mathrm{RACS}}$ for ¹Hᴺ, backbone ¹⁵N and ¹³C’ spins can be

calculated given the Δχ-tensor and the chemical shielding anisotropic tensor (CSA-tensor) using<br />

(John et al., 2005):<br />

$$\Delta\delta^{\mathrm{RACS}} = \frac{B_0^2}{15\mu_0 kT}\sum_{i,j\in\{x,y,z\}} \sigma_{ii}^{\mathrm{CSA}}\cos^2\theta_{ij}\,\Delta\chi_{jj} \tag{3.4}$$

where $B_0$ is the magnetic field, $\mu_0$ the induction constant, k the Boltzmann constant, T the temperature, $\sigma_{ii}^{\mathrm{CSA}}$ the principal components of the CSA-tensor, $\cos\theta_{ij}$ the nine direction cosines between pairs of the principal axes of the Δχ-tensor and the CSA-tensor, and $\Delta\chi_{jj}$ the principal components of the Δχ-tensor. Numbat optionally uses the RACS correction term when generating PCS data and fitting Δχ-tensors. The orientations of the principal component axes of the nuclear CSA-tensors and the $\sigma_{ii}^{\mathrm{CSA}}$ values for ¹Hᴺ, backbone ¹⁵N and ¹³C’ are taken from (Cornilescu et al.,

2000).<br />

3.6.6 Multiple PCS data sets



A new PCS data set can be obtained by replacing one paramagnetic lanthanide with another<br />

paramagnetic lanthanide. Multiple PCS data sets obtained in this way share a conserved lanthanide<br />

position, but different orientations and magnitudes of the Δχ-tensors must be fitted to each<br />

individual PCS data set. Numbat can perform a simultaneous fit of the Δχ-tensors and the shared<br />

lanthanide position. This feature is of particular interest when only a limited number of PCS can be<br />

measured for each lanthanide ion, as fewer variables in the Δχ-tensor fit will facilitate the<br />

determination of accurate Δχ-tensor parameters. For example, a limited set of unambiguously<br />

measured PCS can be used to determine initial Δχ-tensor parameters from which the PCS of<br />

unassigned paramagnetic cross-peaks can be back-calculated, leading to assignments of additional<br />

paramagnetic cross-peaks and improved Δχ-tensor parameters. Similarly, applications to small<br />

ligand molecules with a small number of NMR signals are aided by limiting the number of

adjustable variables to a minimum.<br />
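The saving in adjustable variables is simple arithmetic: each individual fit needs eight variables, whereas a simultaneous fit needs three shared coordinates plus five tensor parameters per data set. The helper below is a hypothetical illustration of that count, not part of Numbat.

```python
def free_parameters(n_datasets, shared_position):
    """Number of fitted variables for n PCS data sets.

    Each tensor contributes 5 variables (axial and rhombic components plus
    three Euler angles); the metal position contributes 3 coordinates,
    counted once if all data sets share the same metal site.
    """
    per_tensor = 5
    position = 3
    if shared_position:
        return position + per_tensor * n_datasets
    return (position + per_tensor) * n_datasets

separate = free_parameters(4, shared_position=False)
joint = free_parameters(4, shared_position=True)
```

For the four data sets that the GUI can hold, the joint fit uses 23 variables instead of 32, and the measured PCS of all lanthanides jointly determine the three position coordinates.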

3.6.7 PCS modification<br />

Once an initial Δχ-tensor has been fitted, Numbat computes and displays PCS values for all<br />

atoms. Doubtful assignments can easily be detected at this stage by inspection of the deviation<br />

between experimental and calculated values. Numbat allows interactive modification of PCSi exp and<br />

toli as well as the input of additional PCS data.<br />

3.6.8 PCS selection<br />

The experimental PCS values to be used for the Δχ-tensor fit can be selected according to<br />

three criteria: A list of (i) residue types or (ii) atom types can be provided by the user. This is<br />

convenient in the case of selectively isotope-labelled proteins and allows a quick assessment of the<br />

amount of information necessary in order to retrieve a robust Δχ-tensor. (iii) Each individual PCS<br />

can be selected or deselected interactively via the GUI. This is particularly convenient if,

after initial optimisation of the Δχ-tensor, some of the back-calculated PCS consistently show large<br />

deviations with respect to the experimental values, which may be due to erroneous assignments or<br />

discrepancies between the atomic coordinates of the PDB file and the actual structure of the<br />

protein, as is often the case for flexible polypeptide segments. Deselecting the corresponding atoms<br />

is likely to improve the Δχ-tensor fit in the next iteration.<br />

3.6.9 Conventions



Different conventions have been used in the literature to report Δχ-tensor parameters,<br />

including different definitions of Euler angles, choice of principal and secondary axis of the Δχ-<br />

tensor, and units of Δχ-tensor magnitudes. Numbat can report the Δχ-tensor parameters in many<br />

different conventions but uses as a default the following conventions: (i) The axes of the Δχ-tensor<br />

frame are labelled such that |Δχzz| ≥ |Δχyy| ≥ |Δχxx| in analogy to alignment tensor conventions<br />

(Clore et al., 1998). This ensures that axial and rhombic components are always of the same sign.<br />

(ii) The Euler angles α, β and γ are expressed in the “ZYZ” convention, i.e., the first rotation of

angle α is around the z axis of the protein frame, the second rotation of angle β is around the new y’<br />

axis and the last rotation of angle γ is around the new z’’ axis (Figure 3.2). While for an<br />

asymmetric object the Euler angles are uniquely defined if the angles α, β and γ are taken in the<br />

intervals [0, 2π[, [0, π[, [0, 2π[, respectively, ambiguities arise for symmetric objects. Therefore, we<br />

chose the interval [0, π[ for all three angles, eliminating the potential ambiguities arising from the<br />

four symmetry-related Δχ-tensors that generate the same PCS values. In the case of β = 0, an<br />

infinite number of combinations of α and γ would produce the same overall rotation. In this case,

we set γ = 0. These two rules ensure that any Δχ-tensor is unambiguously reported as a single set of<br />

parameters which is referred to in the GUI as UTR (Unique Δχ-Tensor Representation).<br />
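Rule (i) can be illustrated with a short sketch. It assumes the usual definitions Δχax = Δχzz − (Δχxx + Δχyy)/2 and Δχrh = Δχxx − Δχyy, which are not spelled out in the text above; the function name and the numerical values are hypothetical.

```python
def order_principal_components(components):
    """Label three principal components so that |zz| >= |yy| >= |xx|.

    Returns the relabelled components together with the axial and rhombic
    components computed from them (assumed definitions, see lead-in).
    """
    xx, yy, zz = sorted(components, key=abs)
    dchi_ax = zz - (xx + yy) / 2.0
    dchi_rh = xx - yy
    return (xx, yy, zz), dchi_ax, dchi_rh

# Two hypothetical traceless tensors, given in arbitrary axis order.
labels_a, ax_a, rh_a = order_principal_components((4.0, -10.0, 6.0))
labels_b, ax_b, rh_b = order_principal_components((6.0, -1.0, -5.0))
```

In both cases the axial and rhombic components come out with the same sign, which is the property that this labelling convention guarantees for a traceless tensor.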

Figure 3.2 Euler angle definitions used by Numbat. The relative orientation of the Δχ-<br />

tensor frame with respect to the protein frame is defined by Euler rotations of angle α, β<br />

and γ in the ZYZ convention. (a) A right-handed rotation of angle α around the z axis is<br />

applied to the protein frame xyz to give the frame x’y’z’. (b) A second rotation of angle<br />

β around the new axis y’ is applied to the frame x’y’z’ to give x’’y’’z’’. (c) The last

rotation of angle γ around the z’’ axis gives the Δχ-tensor frame.<br />

3.6.10 Error analysis<br />

The Levenberg-Marquardt algorithm is used to minimize the cost c (equation (3.2)), but the<br />

quality of the fit cannot be assessed without further error analysis. Therefore, in addition to the<br />

uncertainty values provided by the GSL implementation of the minimiser, Numbat embeds a Monte



Carlo protocol with random Gaussian noise added either to the atomic coordinates of the molecule<br />

or to the experimental PCS values. The robustness of the Δχ-tensor fit with respect to the PCS data<br />

set can also be tested by random subset selection of the PCS values used. Resulting Δχ-tensor<br />

orientations are displayed in a Sanson-Flamsteed projection (Bugayevskiy et al., 1995) using the<br />

plotting utility gnuplot.<br />
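The Sanson-Flamsteed (sinusoidal) projection maps a direction with longitude λ and latitude φ to the plane point (λ cos φ, φ). The sketch below applies it to the direction of a tensor axis; the function name is hypothetical, and the code only illustrates the projection itself, not the way Numbat prepares its gnuplot input.

```python
import math

def sanson_flamsteed(vx, vy, vz):
    """Project the direction of vector (vx, vy, vz) onto the plane.

    Returns (x, y) with x = longitude * cos(latitude) and y = latitude,
    both in radians.
    """
    norm = math.sqrt(vx * vx + vy * vy + vz * vz)
    # Clamp to guard against rounding slightly outside [-1, 1].
    latitude = math.asin(max(-1.0, min(1.0, vz / norm)))
    longitude = math.atan2(vy, vx)
    return longitude * math.cos(latitude), latitude

equator_point = sanson_flamsteed(0.0, 1.0, 0.0)
pole_point = sanson_flamsteed(0.0, 0.0, 2.0)
```

Axis directions obtained from repeated Monte Carlo fits then appear as clusters of points, and the spread of each cluster gives a visual impression of the orientational uncertainty.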

3.6.11 Visualization<br />

Graphical visualization of the Δχ-tensor frame and isosurfaces of PCS values in the<br />

structure of the molecule presents a convenient way to assess the similarity of the principal axes of<br />

multiple Δχ-tensors and the similarity of their respective isosurfaces. To this end Numbat interfaces<br />

with the molecular viewers MOLMOL (Koradi et al., 1996) and PyMOL (DeLano, 2002) by<br />

generating suitable macro files and displaying the Δχ-tensor frame and corresponding PCS<br />

isosurfaces in superimposition with the protein studied, as illustrated in Figure 3.3. The files of the<br />

macros, PCS potential and PDB file containing the coordinates of the protein together with<br />

coordinates of the metal ion and Δχ-tensor axes can also be saved for later use.<br />

Figure 3.3 Visualisation of the Δχ-tensor in MOLMOL and PyMOL, and display of<br />

its orientational uncertainty in a Sanson-Flamsteed projection plot. Numbat can<br />

directly call MOLMOL and PyMOL to display the axes of the fitted Δχ-tensor and<br />

PCS isosurfaces at user-defined contour levels. The orientational uncertainty of the<br />

Δχ-tensor frame can be evaluated by a Monte-Carlo protocol with random additions<br />

of noise to the structure coordinates and/or PCS data, with optional random



selection of subsets of data. Numbat calls gnuplot to display the results in a Sanson-<br />

Flamsteed projection plot.<br />

3.6.12 Output<br />

The list of PCS can be saved in Xplor-NIH format and in a Numbat-specific format. The<br />

weak molecular alignment in the magnetic field resulting from a non-vanishing Δχ-tensor can be<br />

described by an alignment tensor with principal axes parallel to those of the Δχ-tensor and axial and<br />

rhombic components that are directly proportional to Δχax and Δχrh, respectively (Tolman et al.,<br />

1995). Numbat calculates the RDC between two spins A and B for the situation of a completely<br />

rigid molecule, using (Bertini et al., 2002)<br />

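$$\mathrm{RDC}_{AB} = -\frac{S\,B_0^2\,\gamma_A\gamma_B\hbar}{120\pi^2 kT\,r_{AB}^5}\left[\Delta\chi_{\mathrm{ax}}\left(2\tilde{z}_{AB}^2 - \tilde{x}_{AB}^2 - \tilde{y}_{AB}^2\right) + \frac{3}{2}\,\Delta\chi_{\mathrm{rh}}\left(\tilde{x}_{AB}^2 - \tilde{y}_{AB}^2\right)\right] \tag{3.5}$$

where $\gamma_A$ and $\gamma_B$ are the magnetogyric ratios of spins A and B, respectively, ħ the Planck constant divided by 2π, S the order parameter, $r_{AB}$ the internuclear distance, and $\tilde{x}_{AB}$, $\tilde{y}_{AB}$, $\tilde{z}_{AB}$ the coordinates of the vector AB expressed in the Δχ-tensor frame. The RDC values are reported in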

Xplor-NIH (Schwieters et al., 2003, Schwieters et al., 2006) and Pales (Zweckstetter et al., 2000)<br />

format.<br />

Finally, Numbat can generate PDB files where the Δχ-tensor is reported in a format ready<br />

for use with MOLMOL or PyMOL for rigid-body docking alignment, or for further refinement by<br />

Xplor-NIH.<br />

3.7 Study case<br />

The proteins ε and θ are subunits of the complex of proteins constituting E. coli DNA<br />

polymerase III. The complex between the N-terminal domain of ε (ε186) and θ has been<br />

extensively studied using PCS data (Pintacuda et al., 2006, Pintacuda et al., 2007). In light of the<br />

recent crystal structure of the complex between ε186 and the θ homolog HOT (Kirby et al., 2006),<br />

we illustrate in the following the features of Numbat by revisiting the NMR structure of the<br />

complex between ε186 and θ, which was derived from PCS induced by Dy³⁺ and Er³⁺ ions bound to<br />

the natural metal-binding site of ε186 (Pintacuda et al., 2006).


The coordinates of the A chain in the PDB deposition 2IDO (Kirby et al., 2006) were used as<br />

the structural model for ε186. The structural model of θ was conformer 10 of the NMR structure of<br />

θ in complex with ε186 (PDB accession code 2AXD; Keniry et al., 2006). This conformer was<br />

chosen because it has the lowest backbone RMSD to the HOT protein (2.1 Å) for residues 9-66 (the<br />

structurally defined region for which meaningful PCS could be measured). The experimentally<br />

determined PCS values of ε186 have been reported previously (Schmitz et al., 2006) and the PCS<br />

values of θ are provided in the Supporting Information. All Δχ-tensor optimizations were performed<br />

using Numbat, including the RACS correction term and a tolerance value tolᵢ of zero for all spins.<br />

3.7.1 Subunit ε186<br />

Table 3.1 presents the results of the Δχ-tensor fit to the PCS measured for ε186. Initially,<br />

individual eight-variable Δχ-tensor optimizations were performed using the PCS data of each<br />

lanthanide (Table 3.1, columns 1 and 2). Next, the Numbat GUI was updated to display the<br />

deviations between the experimental and back-calculated PCS for the Δχ-tensors found. Several<br />

atoms showed deviations > 0.15 ppm between the experimental and back-calculated PCS (15 out of<br />

199 and 8 out of 255 atoms in the case of Dy³⁺ and Er³⁺, respectively; without the RACS<br />

correction, deviations > 0.15 ppm were observed for 36 and 7 atoms, respectively). Assuming that<br />

these outliers were due to problematic measurements or inaccuracies of the 3D structure, these PCS<br />

were removed interactively using the GUI. Re-calculation of the Δχ-tensor was found not to change<br />

the fitted Δχ-tensor parameters significantly for any of the lanthanide ions (results not shown). This<br />

can be explained by the high quality and large number of experimental PCS data available for each<br />

lanthanide (backbone ¹³C′, ¹⁵N and ¹Hᴺ spins), resulting in robust fits of the Δχ-tensors.<br />
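The outlier screening described above can be sketched as follows: the PCS is back-calculated from the spin coordinates in the Δχ-tensor frame, and spins whose deviation from experiment exceeds 0.15 ppm are flagged. This is a minimal illustration only. The function names, the spin names, the coordinates and the experimental values are invented for the example, and neither the RACS correction nor the tolerance term is included.

```python
import math

def pcs_ppm(xyz, dchi_ax, dchi_rh):
    """Back-calculated PCS (ppm) for a spin at xyz (Å, tensor frame).

    dchi_ax and dchi_rh are given in units of 1e-32 m^3.
    """
    x, y, z = xyz
    r2 = x * x + y * y + z * z
    geom = dchi_ax * (2 * z * z - x * x - y * y) + 1.5 * dchi_rh * (x * x - y * y)
    # 1e4 collects the ppm factor (1e6) and the Å / 1e-32 m^3 unit conversion.
    return 1e4 * geom / (12.0 * math.pi * r2 ** 2.5)

def flag_outliers(spins, dchi_ax, dchi_rh, cutoff=0.15):
    """Return (name, deviation) for spins with |exp - calc| above cutoff (ppm)."""
    flagged = []
    for name, xyz, pcs_exp in spins:
        deviation = pcs_exp - pcs_ppm(xyz, dchi_ax, dchi_rh)
        if abs(deviation) > cutoff:
            flagged.append((name, deviation))
    return flagged

# Hypothetical data: (spin name, tensor-frame coordinates in Å, measured PCS in ppm)
spins = [("H1", (0.0, 0.0, 20.0), 2.80),
         ("H2", (20.0, 0.0, 0.0), -0.90),
         ("H3", (0.0, 20.0, 0.0), -1.67)]
outliers = flag_outliers(spins, 42.3, 5.3)
```

In this toy data set only the second spin deviates by more than the cutoff; in an interactive session such spins would be inspected and, if judged unreliable, excluded before refitting.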

Table 3.1 Δχ-tensors determined by Numbat in the frames of the ε186 and θ molecules<br />

Column | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10<br />
Molecule | ε186 (a) | ε186 (a) | ε186 (a) | ε186 (a) | θ (b) | θ (b) | θ (b) | θ (b) | θ (b) | θ (b)<br />
Optimization | Individual (c) | Individual (c) | Combined (d) | Combined (d) | Individual (c) | Individual (c) | Combined (d) | Combined (d) | Fixed (e) | Fixed (e)<br />
Lanthanide | Dy³⁺ | Er³⁺ | Dy³⁺ | Er³⁺ | Dy³⁺ | Er³⁺ | Dy³⁺ | Er³⁺ | Dy³⁺ | Er³⁺<br />
Δχax (f) | 42.3 | -10.6 | 42.3 | -10.7 | 40.1 | -13.0 | 40.2 | -10.0 | 42.3 | -10.7<br />
Δχrh (f) | 5.3 | -5.1 | 5.3 | -5.1 | 14.8 | -6.5 | 14.9 | -4.8 | 5.3 | -5.1<br />
α (g) | 169.5 | 144.2 | 169.5 | 143.9 | 27.7 | 23.7 | 27.7 | 19.9 | 42.2 | 34.9<br />
β (g) | 30.2 | 29.1 | 30.2 | 29.2 | 114.6 | 108.8 | 113.9 | 118.2 | 119.2 | 121.5<br />
γ (g) | 134.6 | 126.9 | 134.7 | 126.8 | 28.4 | 170.6 | 27.3 | 177.7 | 44.7 | 177.4<br />
mx (h) | 29.4 | 29.3 | 29.4 | 29.4 | 6.2 | 9.5 | 6.4 | 6.4 | 4.3 | 4.3<br />
my (h) | 31.9 | 32.0 | 31.9 | 31.9 | -7.5 | -7.2 | -7.5 | -7.5 | -5.5 | -5.5<br />
mz (h) | 26.7 | 26.7 | 26.7 | 26.7 | -18.9 | -19.0 | -18.8 | -18.8 | -19.8 | -19.8<br />

(a) Δχ-tensor parameters determined relative to chain A in the PDB coordinate set 2IDO<br />

(b) Δχ-tensor parameters determined relative to model 10 in the PDB data set 2AXD<br />

(c) Δχ-tensors determined from PCS induced by Dy³⁺ or PCS induced by Er³⁺ (individual optimization)<br />

(d) Δχ-tensors determined by using the PCS data of Dy³⁺ and Er³⁺ simultaneously and optimizing for<br />
a single metal ion position (combined optimization)<br />

(e) Δχ-tensors determined by using the PCS data of Dy³⁺ and Er³⁺ simultaneously, optimizing for a<br />
single metal ion position and fixing Δχax and Δχrh at the values determined from the PCS data of<br />
ε186 (fixed optimization)<br />

(f) In units of 10⁻³² m³<br />

(g) Euler rotations in the ZYZ convention (degrees)<br />

(h) Metal ion coordinate (Å) relative to chain A in the PDB coordinate set 2IDO<br />

Since the coordinates of the Dy³⁺ and Er³⁺ ions found in the individual fits were very similar<br />

(Table 3.1, columns 1 and 2), we subsequently assumed that the Δχ-tensors induced by each<br />

lanthanide are centered at the same position relative to ε186. The results obtained by<br />

simultaneously fitting the distinct Δχ-tensors while restraining their metal coordinate to a common<br />

centre (Table 3.1, columns 3 and 4) show little difference to the Δχ-tensor parameters found when<br />

performing the individual optimizations.<br />

For comprehensive error analysis, we introduced a random error into the structure<br />

coordinates of ε186, where the atomic coordinates were varied according to a Gaussian distribution<br />

with a standard deviation σ of 0.5 Å, resulting in a mean atom displacement of 0.8 Å. The resulting<br />

uncertainty in Δχ-tensor parameters was approximately equivalent to the uncertainty introduced by<br />

a random variation added to the measured PCS data sampled from a Gaussian distribution with a<br />

standard deviation σ of 0.15 ppm. The Δχ-tensor parameters of ε186 were well defined, as the<br />

values of all eight Δχ-tensor variables determined by 1000 randomized pseudo-replicates of the<br />

structure were in good agreement with the Δχ-tensors fitted to the original structure (Table 3.2,<br />

column 1). To eliminate the possibility that the quality of the Δχ-tensor fit was significantly<br />

affected by the number of PCS measured, the error analysis for the Δχ-tensors fitted to ε186 was<br />

recalculated with random selection of only 20% of the measured PCS. The results (Table 3.2,<br />

column 2) show that the Δχ-tensor parameters of ε186 were still well defined. Figure 1.14.a



illustrates how well the Δχ-tensor axes are defined, even when randomly disregarding 50% of the<br />

data.<br />
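The relation between the 0.5 Å standard deviation and the 0.8 Å mean displacement quoted above can be checked numerically. If each Cartesian coordinate is perturbed independently with a Gaussian of standard deviation 0.5 Å (an assumption about how the noise is applied), the displacement length follows a Maxwell distribution with mean 0.5 Å × √(8/π) ≈ 0.80 Å. The following sketch uses an illustrative function name and an arbitrary random seed.

```python
import math
import random

def mean_displacement(sigma, n_atoms, seed=0):
    """Average length of random 3D displacements with per-axis Gaussian noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_atoms):
        dx, dy, dz = (rng.gauss(0.0, sigma) for _ in range(3))
        total += math.sqrt(dx * dx + dy * dy + dz * dz)
    return total / n_atoms

simulated = mean_displacement(0.5, 20000)   # Monte-Carlo estimate (Å)
analytical = 0.5 * math.sqrt(8.0 / math.pi) # Maxwell-distribution mean (Å)
```

In a full error analysis, each such randomized structure (or each random subset of PCS data) would be passed through the Δχ-tensor fit, and the spread of the fitted parameters reported as in Table 3.2.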

Table 3.2 Error analysis for the Dy³⁺ Δχ-tensors fitted to PCS of ε186 and θ (a)<br />

Column | 1 | 2 | 3 | 4<br />
Molecule | ε186 | ε186 | θ | θ<br />
Analysis | Structure variation | Subset of PCS | Structure variation | Subset of PCS<br />
Δχax (b) | 42.0 (0.8) | 42.4 (1.1) | 41.9 (4.3) | 40.3 (3.1)<br />
Δχrh (b) | 5.3 (0.5) | 5.4 (0.8) | 15.0 (4.5) | 15.3 (2.8)<br />
α (c) | 169.5 (0.7) | 169.7 (0.9) | 29.3 (6.1) | 27.6 (3.2)<br />
β (c) | 30.2 (0.3) | 30.2 (0.5) | 114.5 (4.3) | 114.4 (3.3)<br />
γ (c) | 134.0 (2.6) | 134.7 (4.0) | 29.2 (10.6) | 28.9 (7.9)<br />
mx (d) | 29.4 (0.1) | 29.4 (0.2) | 6.1 (1.3) | 6.2 (0.9)<br />
my (d) | 31.9 (0.1) | 31.9 (0.2) | -7.4 (1.0) | -7.6 (0.7)<br />
mz (d) | 26.7 (0.1) | 26.7 (0.1) | -19.1 (0.8) | -18.9 (0.4)<br />

(a) The average values of the Δχ-tensors and their standard deviations (in brackets) are reported.<br />
Average values and standard deviations were calculated from 1000 sets of randomised atom<br />
coordinates (where the extent of randomisation followed a Gaussian distribution with a standard<br />
deviation σ of 0.5 Å) or from randomly picked subsets of the PCS data (20% in the case of ε186 and<br />
80% in the case of θ, where far fewer PCS were available)<br />

(b) In units of 10⁻³² m³<br />

(c) Euler rotations in the ZYZ convention (degrees)<br />

(d) Metal ion coordinate (Å) in the protein frame (A chain of the PDB coordinates 2IDO and model<br />
10 in the PDB data set 2AXD, respectively)<br />

3.7.2 Subunit θ<br />

The results of the Δχ-tensor determination in the molecular frame of θ are presented in<br />

Table 3.1. There was only a small number of spins for which the back-calculated PCS deviated<br />

from the experimental PCS by more than 0.15 ppm (4 out of 50 in the case of Dy³⁺, 0 out of 41 for<br />

Er³⁺). As for ε186, removal of these PCS from the optimization did not significantly change the<br />

parameters of the fitted Δχ-tensors. While the Δχax and Δχrh values of Er³⁺ determined from the PCS<br />

observed for θ and ε186 were very similar, the Δχrh value of the Dy³⁺ tensor found for θ was almost<br />

three times larger than that found for ε186⁶. We subsequently performed an error analysis for θ as<br />

for the ε186 subunit, introducing either random variations into the atomic positions of θ according<br />

to a Gaussian distribution with a standard deviation σ of 0.5 Å or using a random selection of only<br />

80% of the measured PCS. In either case, the Δχ-tensor parameters of θ proved to be less well<br />

defined than those of ε186 (Table 3.2 and Figure 1.14.b). As θ samples a relatively small and<br />

remote volume of the Δχ-tensors due to its spatial separation from the metal ion, one would expect<br />

a less accurate determination of the Δχ-tensors from the θ data. The effect could be exacerbated by<br />

inaccuracies of the NMR structure.<br />

In order to compensate for the smaller number of experimentally determined PCS available<br />

for θ (only ¹Hᴺ spins) and the poorer quality of the Δχ-tensors fitted, we performed another fit with<br />

Δχax and Δχrh fixed to the values determined for ε186 (Table 3.1, columns 9 and 10). Analysis of<br />

the experimental versus back-calculated PCS, both for the eight- and six-variable fits of the Δχ-<br />

tensor to θ, showed that the PCS deviations were similar in magnitude and trends. Therefore,<br />

constraining Δχax and Δχrh did not significantly deteriorate the quality of the fit, despite considerable<br />

changes of the Δχ-tensor parameters (Table 3.1). The variability of the Δχ-tensor parameters over<br />

all the 12 deposited θ conformers in 2AXD using the fixed optimisation scheme is provided in the<br />

Supporting Information.<br />

3.7.3 Modelling the complex between ε186 and θ<br />

Numbat facilitates the modelling of protein-protein complexes by listing coordinates of the<br />

Δχ-tensor axes together with the protein coordinates in files in PDB format. Superimposition of the<br />

Δχ-tensors fitted to ε186 and θ for each lanthanide ion yields the three-dimensional structure of the<br />

ε186/θ complex by straightforward rigid-body docking. Standard PyMOL or MOLMOL commands<br />

can be used to align the Δχ-tensors. Numbat reports the coordinate system of the Δχ-tensor in such<br />

a way that all four degenerate solutions arising from the symmetry of the Δχ-tensor about the x, y<br />

and z axes (Figure 3.4) can easily be visualised. Identification of the correct solution requires<br />

additional information, such as proper steric interactions, chemical shift perturbation data or<br />

knowledge of the biological function of the complex. The most objective way, however, is by<br />

simultaneous evaluation of the Δχ-tensors of different lanthanides (Pintacuda et al., 2006).<br />

⁶ The discrepancy of the rhombic component would not necessarily affect the rigid-body docking<br />

of the complex, as only the orientation of the Δχ-tensors and the coordinates of the paramagnetic<br />

center are used.<br />

In the case of the complex between ε186 and θ, the Δχ-tensor frames of Dy³⁺ and Er³⁺ share<br />

a common origin for both proteins. Seven coordinates are necessary to define two Δχ-tensor frames<br />

sharing the same origin. Because of the second Δχ-tensor, the degeneracy of Figure 3.4 is broken.<br />

There are exactly 16 possibilities to align two pairs of Δχ-tensors. The lowest RMSD value resulting<br />

from all 16 possible 7-coordinate alignments between the two combined Δχ-tensors identified a<br />

single relative orientation of the two proteins as the best solution. The position of θ relative to ε186<br />

derived from PCS data in this way was also the correct solution: it agreed with a model of the<br />

complex obtained by superimposition of θ onto HOT in the ε186/HOT complex, with a backbone<br />

RMSD of 4.4 Å. Similarly, for the Δχ-tensor of θ calculated with fixed Δχax and Δχrh values, a<br />

backbone RMSD of 4.3 Å was calculated relative to HOT. When PCS data from only Dy³⁺ or Er³⁺<br />

were used, the backbone RMSD values were 4.2 Å and 4.4 Å, respectively, for the best fit to the<br />

ε186/HOT complex. The model of the ε186/θ complex derived from the fixed Dy³⁺ and Er³⁺ data<br />

sets is displayed in Figure 3.5.
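The four-fold degeneracy and the count of 16 alignments can be illustrated with a short sketch. The PCS depends only on the squares of the tensor-frame coordinates, so the identity and the three 180° rotations about x, y and z give identical PCS. Interpreting the 16 possibilities as the 4 × 4 combinations of equivalent frames for the two lanthanide tensors is an assumption made here for illustration, and all names in the code are hypothetical.

```python
import itertools
import math

# Identity plus the 180-degree rotations about x, y and z, as sign patterns.
C2_OPERATIONS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def pcs_ppm(xyz, dchi_ax, dchi_rh):
    """PCS (ppm); xyz in Å in the tensor frame, Δχ in units of 1e-32 m^3."""
    x, y, z = xyz
    r2 = x * x + y * y + z * z
    geom = dchi_ax * (2 * z * z - x * x - y * y) + 1.5 * dchi_rh * (x * x - y * y)
    return 1e4 * geom / (12.0 * math.pi * r2 ** 2.5)

def rotate(xyz, operation):
    """Apply one of the sign-pattern rotations to a coordinate triple."""
    return tuple(sign * coord for sign, coord in zip(operation, xyz))

spin = (5.0, -8.0, 12.0)  # hypothetical spin position (Å)
pcs_values = [pcs_ppm(rotate(spin, op), 42.3, 5.3) for op in C2_OPERATIONS]

# One equivalent frame per lanthanide tensor: 4 x 4 candidate alignments,
# each of which would be scored by the RMSD of the frame superposition.
frame_pairings = list(itertools.product(C2_OPERATIONS, repeat=2))
```

All entries of `pcs_values` coincide, which is why a single tensor cannot discriminate between the four docking solutions, whereas scoring the candidate pairings of two tensors can.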


Figure 3.4 The four degenerate solutions arising from the symmetry of the<br />

Δχ-tensor around the x, y and z axes. All four possibilities result in the same<br />

calculated PCS, hence in the same isosurfaces.<br />

Figure 3.5 The complex between ε186 and θ determined by superimposition<br />

of Δχ-tensors. The ε186/HOT complex (PDB accession code 2IDO) is shown<br />

for reference, with ε186 coloured in silver and HOT (residues 9-66) in<br />

orange. The isosurfaces correspond to the PCS induced by the Dy³⁺ ion<br />

(from individual optimisation) contoured at ±1.5 ppm and ±0.5 ppm.<br />

Blue and red isosurfaces represent regions with positive and negative PCS,<br />

respectively. Residues 9-66 of θ are shown as a thin ribbon in the position<br />

defined by the fixed Dy³⁺ and Er³⁺ data (brown).<br />

3.8 Conclusion<br />

The program Numbat is the first software package for fitting Δχ-tensors from PCS data with<br />

a user-friendly graphical user interface (GUI). Numbat calculations are fast, as the program was written<br />

in C using open-source Linux routines. While the main task of Numbat is the fit of the eight Δχ-tensor<br />

variables, the intuitive GUI combined with convenient data handling, including Monte-Carlo error<br />

analysis and links to the molecular viewers MOLMOL and PyMOL, offers high flexibility of use.<br />

The study case of the complex formed between the subunits ε186 and θ of E. coli DNA polymerase<br />

III illustrates the simplicity of use of Numbat.<br />

The program is freely available under the GNU General Public License (GPL) upon request<br />

(see also http://compbio.chemistry.uq.edu.au/bmmg/christophe/numbat.html).<br />

3.9 Acknowledgment<br />

Financial support from the Australian Research Council for project grants to G.O. and T.H.<br />

is gratefully acknowledged.<br />

3.10 References<br />

Allegrozzi M, Bertini I, Janik MBL, Lee YM, Lin GH and Luchinat C (2000) Lanthanide-induced<br />

pseudocontact shifts for solution structure refinements of macromolecules in shells up to 40<br />

Å from the metal ion. J Am Chem Soc 122:4154-4161<br />

Banci L, Bertini I, Bren KL, Cremonini MA, Gray HB, Luchinat C and Turano P (1996) The use of<br />

pseudocontact shifts to refine solution structures of paramagnetic metalloproteins:<br />

Met80Ala cyano-cytochrome c as an example. J Biol Inorg Chem 1:117-126<br />

Banci L, Bertini I, Cavallaro G, Giachetti A, Luchinat C and Parigi G (2004) Paramagnetism-based<br />

restraints for Xplor-NIH. J Biomol NMR 28:249-261<br />

Banci L, Bertini I, Cremonini MA, Savellini GG, Luchinat C, Wüthrich K and Güntert P (1998)<br />

PSEUDYANA for NMR structure calculation of paramagnetic metalloproteins using<br />

torsion angle molecular dynamics. J Biomol NMR 12:553-557<br />

Banci L, Bertini I, Savellini GG, Romagnoli A, Turano P, Cremonini MA, Luchinat C and Gray<br />

HB (1997) Pseudocontact shifts as constraints for energy minimization and molecular<br />

dynamics calculations on solution structures of paramagnetic metalloproteins. Proteins<br />

29:68-76



Banci L, Dugad LB, La Mar GN, Keating KA, Luchinat C and Pierattelli R (1992) ¹H nuclear<br />

magnetic resonance investigation of cobalt(II) substituted carbonic anhydrase. Biophys J<br />

63:530-543<br />

Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN and Bourne<br />

PE (2000) The protein data bank. Nucleic Acids Res 28:235-242<br />

Bertini I, Del Bianco C, Gelis I, Katsaros N, Luchinat C, Parigi G, Peana M, Provenzani A and<br />

Zoroddu MA (2004) Experimentally exploring the conformational space sampled by<br />

domain reorientation in calmodulin. Proc Natl Acad Sci U S A 101:6841-6846<br />

Bertini I, Donaire A, Jiménez B, Luchinat C, Parigi G, Piccioli M and Poggi L (2001)<br />

Paramagnetism-based versus classical constraints: An analysis of the solution structure of<br />

Ca Ln calbindin D9k. J Biomol NMR 21:85-98<br />

Bertini I, Luchinat C and Parigi G (2002) Magnetic susceptibility in paramagnetic NMR. Prog<br />

NMR Spectrosc 40:249-273<br />

Bugayevskiy LM and Snyder JP (1995). Map projections: A reference manual. Taylor & Francis,<br />

London.<br />

Capozzi F, Cremonini MA, Luchinat C and Sola M (1993) Assignment of pseudo-contact-shifted<br />

¹H NMR resonances in the EF site of Yb³⁺-substituted rabbit parvalbumin through a<br />

combination of 2D techniques and magnetic susceptibility tensor determination. Magn<br />

Reson Chem 31:S118-S127<br />

Clore GM, Gronenborn AM and Bax A (1998) A robust method for determining the magnitude of<br />

the fully asymmetric alignment tensor of oriented macromolecules in the absence of<br />

structural information. J Magn Reson 133:216-221<br />

Cornilescu G and Bax A (2000) Measurement of proton, nitrogen, and carbonyl chemical shielding<br />

anisotropies in a protein dissolved in a dilute liquid crystalline phase. J Am Chem Soc<br />

122:10143-10154<br />

DeLano WL (2002) The PyMOL molecular graphics system. Palo Alto, CA, USA.<br />

Dosset P, Hus JC, Blackledge M and Marion D (2000) Efficient analysis of macromolecular<br />

rotational diffusion from heteronuclear relaxation data. J Biomol NMR 16:23-28<br />

Eichmüller C and Skrynnikov NR (2007) Observation of μs time-scale protein dynamics in the<br />

presence of Ln³⁺ ions: Application to the N-terminal domain of cardiac troponin C. J<br />

Biomol NMR 37:79-95<br />

Emerson SD and La Mar GN (1990) NMR determination of the orientation of the<br />

magnetic-susceptibility tensor in cyanometmyoglobin: A new probe of steric tilt of bound ligand.<br />

Biochemistry 29:1556-1566


Galassi M, Davies J, Theiler B, Gough G, Jungman M, Booth M and Rossi F (2006). GNU<br />

scientific library reference manual. Network Theory Ltd, Bristol.<br />

Gaponenko V, Sarma SP, Altieri AS, Horita DA, Li J and Byrd RA (2004) Improving the accuracy<br />

of NMR structures of large proteins using pseudocontact shifts as long-range restraints. J<br />

Biomol NMR 28:205-212<br />

Güntert P, Mumenthaler C and Wüthrich K (1997) Torsion angle dynamics for NMR structure<br />

calculation with the new program DYANA. J Mol Biol 273:283-298<br />

Hess B and Scheek RM (2003) Orientation restraints in molecular dynamics simulations using time<br />

and ensemble averaging. J Magn Reson 164:19-27<br />

Jensen MR, Hansen DF, Ayna U, Dagil R, Hass MAS, Christensen HEM and Led JJ (2006) On the<br />

use of pseudocontact shifts in the structure determination of metalloproteins. Magn Reson<br />

Chem 44:294-301<br />

John M, Park AY, Pintacuda G, Dixon NE and Otting G (2005) Weak alignment of paramagnetic<br />

proteins warrants correction for residual CSA effects in measurements of pseudocontact<br />

shifts. J Am Chem Soc 127:17190-17191<br />

John M, Pintacuda G, Park AY, Dixon NE and Otting G (2006) Structure determination of protein-<br />

ligand complexes by transferred paramagnetic shifts. J Am Chem Soc 128:12910-12916<br />

Keniry MA, Park AY, Owen EA, Hamdan SM, Pintacuda G, Otting G and Dixon NE (2006)<br />

Structure of the θ subunit of Escherichia coli DNA polymerase III in complex with the ε<br />

subunit. J Bacteriol 188:4464-4473<br />

Kirby TW, Harvey S, DeRose EF, Chalov S, Chikova AK, Perrino FW, Schaaper RM, London RE<br />

and Pedersen LC (2006) Structure of the Escherichia coli DNA polymerase III ε-HOT<br />

proofreading complex. J Biol Chem 281:38466-38471<br />

Koradi R, Billeter M and Wüthrich K (1996) MOLMOL: A program for display and analysis of<br />

macromolecular structures. J Mol Graphics 14:51-55<br />

Krause A (2007). Foundations of GTK+ development. Apress, Berkeley, CA, USA.<br />

Lee L and Sykes BD (1983) Use of lanthanide-induced nuclear magnetic-resonance shifts for<br />

determination of protein structure in solution: EF calcium binding site of carp parvalbumin.<br />

Biochemistry 22:4366-4373<br />

Marquardt DW (1963) An algorithm for least-squares estimation of nonlinear parameters. J Soc Ind<br />

Appl Math 11:431-441<br />

Nelder JA and Mead R (1965) A simplex method for function minimization. Comput J 7:308-313<br />

Pintacuda G, John M, Su XC and Otting G (2007) NMR structure determination of protein-ligand<br />

complexes by lanthanide labeling. Acc Chem Res 40:206-212



Pintacuda G, Keniry MA, Huber T, Park AY, Dixon NE and Otting G (2004) Fast structure-based<br />

assignment of ¹⁵N HSQC spectra of selectively ¹⁵N-labeled paramagnetic proteins. J Am<br />

Chem Soc 126:2963-2970<br />

Pintacuda G, Park AY, Keniry MA, Dixon NE and Otting G (2006) Lanthanide labeling offers fast<br />

NMR approach to 3D structure determinations of protein-protein complexes. J Am Chem<br />

Soc 128:3696-3702<br />

Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient<br />

χ-tensor determination and NH assignment of paramagnetic proteins. J Biomol NMR 35:79-87<br />

Schwieters CD, Kuszewski JJ and Clore GM (2006) Using Xplor-NIH for NMR molecular<br />

structure determination. Prog NMR Spectrosc 48:47-62<br />

Schwieters CD, Kuszewski JJ, Tjandra N and Clore GM (2003) The Xplor-NIH NMR molecular<br />

structure determination package. J Magn Reson 160:65-73<br />

Sherry AD and Pascual E (1977) Proton and carbon lanthanide-induced shifts in aqueous alanine.<br />

Evidence for structural changes along lanthanide series. J Am Chem Soc 99:5871-5876<br />

Su XC, McAndrew K, Huber T and Otting G (2008) Lanthanide-binding peptides for NMR<br />

measurements of residual dipolar couplings and paramagnetic effects from multiple angles.<br />

J Am Chem Soc 130:1681-1687<br />

Tolman JR, Flanagan JM, Kennedy MA and Prestegard JH (1995) Nuclear magnetic dipole<br />

interactions in field-oriented proteins: Information for structure determination in solution.<br />

Proc Natl Acad Sci U S A 92:9279-9283<br />

Valafar H and Prestegard JH (2004) REDCAT: A residual dipolar coupling analysis tool. J Magn<br />

Reson 167:228-241<br />

Van der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE and Berendsen HJC (2005)<br />

GROMACS: Fast, flexible, and free. J Comput Chem 26:1701-1718<br />

Veitch NC, Whitford D and Williams RJP (1990) An analysis of pseudocontact shifts and their<br />

relationship to structural features of the redox states of cytochrome b5. FEBS Lett 269:297-<br />

304<br />

Wang X, Srisailam S, Yee AA, Lemak A, Arrowsmith C, Prestegard JH and Tian F (2007)<br />

Domain-domain motions in proteins from time-modulated pseudocontact shifts. J Biomol<br />

NMR 39:53-61<br />

Wei Y and Werner MH (2006) iDC: A comprehensive toolkit for the analysis of residual dipolar<br />

couplings for macromolecular structure determination. J Biomol NMR 35:17-25<br />

Zweckstetter M and Bax A (2000) Prediction of sterically induced alignment in a dilute liquid<br />

crystalline phase: Aid to protein structure determination by NMR. J Am Chem Soc<br />

122:3791-3792<br />

3.11 Supporting information<br />

Table S3.1 Experimentally determined ¹Hᴺ PCS for θ in complex with ε186 at pH 7.0 and 25 °C (a)<br />

Residue PCS Dy³⁺ (ppm) PCS Er³⁺ (ppm)<br />

ASP 9 -1.28 0.31<br />

GLN 10 -1.19 0.3<br />

THR 11 -1.11 0.25<br />

GLU 12 -1.24 0.27<br />

MET 13 -1.84 0.38<br />

ASP 14 -2 0.32<br />

LYS 15 -1.5<br />

VAL 16 -1.97 0.13<br />

VAL 18 -1.91 0.14<br />

ASP 19 -1.32 -0.03<br />

LEU 20 -1.29 -0.03<br />

ALA 21 -1.37 -0.14<br />

ALA 22 -0.57 -0.18<br />

ALA 23 0.15 -0.38<br />

GLY 24 0.37 -0.46<br />

VAL 25 0.29 -0.26<br />

ALA 26 0.54 -0.3<br />

PHE 27 1.21<br />

LYS 28 0.85 -0.31<br />

GLU 29 0.68 -0.19<br />

ARG 30 0.81 -0.23<br />

ASN 32 0.72


MET 33 0.6<br />

VAL 35 0.02<br />

ILE 36 -0.29 0.08<br />

ALA 37 -0.12 0<br />

GLU 38 -0.22 0.04<br />

ALA 39 -0.4<br />

VAL 40 -0.55<br />

GLU 41 -0.51 -0.06<br />

ARG 42 -0.56 0.09<br />

GLU 43 -0.84 0.12<br />

GLU 46 -0.66 0.11<br />

LEU 48 -0.71 -0.01<br />

ARG 49 -0.58 0.08<br />

SER 50 -0.45 0.05<br />

TRP 51 -0.5<br />

PHE 52 -0.58<br />

ARG 53 -0.38<br />

GLU 54 -0.26 -0.01<br />

ARG 55 -0.23 -0.07<br />

LEU 56 -0.15 -0.09<br />

ILE 57 0.02 -0.09<br />

ALA 58 0.14 -0.11<br />

HIS 59 0.3 -0.18<br />

ARG 60 0.39 -0.17<br />

LEU 61 0.42 -0.15<br />

SER 63 0.8 -0.27<br />

VAL 64 0.71 -0.19<br />

ASN 65 0.8 -0.21<br />

LEU 66 -0.26<br />

(a) Experimental conditions as described in Pintacuda et al. (2006) J. Am. Chem. Soc. 128, 3696-3702<br />

Table S3.2 Comparison of θ Δχ-tensor parameters when using only conformer 10 (a) or all<br />

conformers (b) of the NMR structure of θ.<br />

Optimization | Fixed (a) | Fixed (a) | Fixed (Family) (b) | Fixed (Family) (b)<br />
Lanthanide | Dy³⁺ | Er³⁺ | Dy³⁺ | Er³⁺<br />
Δχax (c) | 42.3 | -10.7 | 42.3 | -10.7<br />
Δχrh (c) | 5.3 | -5.1 | 5.3 | -5.1<br />
α (d) | 42.2 | 34.9 | 40.5 | 34.5<br />
β (d) | 119.2 | 121.5 | 118.9 | 121.2<br />
γ (d) | 44.7 | 177.4 | 38.1 | 174.8<br />
mx (e) | 4.3 | 4.3 | 4.7 | 4.7<br />
my (e) | -5.5 | -5.5 | -5.8 | -5.8<br />
mz (e) | -19.8 | -19.8 | -19.7 | -19.7<br />

(a) The Δχ-tensor determined using the fixed optimisation scheme relative to θ conformer 10<br />

(b) The Δχ-tensor determined using the fixed optimisation scheme relative to all 12 deposited<br />
θ conformers simultaneously<br />

(c) In units of 10⁻³² m³<br />

(d) Euler rotations in the ZYZ convention (degrees)<br />

(e) Metal ion coordinate (Å) in the protein frame (PDB data set 2AXD)


4. Protein Structure Determination from Pseudocontact Shifts using ROSETTA<br />

Christophe Schmitz a , Robert Vernon b , Gottfried Otting c , David Baker b and Thomas Huber a<br />

a School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4072,<br />

Australia<br />

b Department of Biochemistry, University of Washington, Seattle, WA 98195<br />

c Research School of Chemistry, Australian National University, Canberra, ACT 0200, Australia<br />

Manuscript submitted to the Proceedings of the National Academy of Sciences of the United States<br />

of America.


4.1 Abstract<br />

Pseudocontact shifts (PCS) arise from paramagnetic metal ions bound to proteins and are<br />

manifested as large changes in chemical shifts detected in nuclear magnetic resonance (NMR)<br />

spectra. PCS data constitute long-range restraints on the positions of nuclear spins relative to the<br />

coordinate system of the magnetic susceptibility anisotropy tensor (Δχ-tensor) of the metal ion.<br />

Protein structure determination using PCS data only, however, is hampered by the difficulty of<br />

determining the Δχ-tensor and metal position without knowledge of the protein structure. We have<br />

circumvented this problem in the program PCS-ROSETTA by using the structure prediction<br />

program ROSETTA to generate the models required for fitting of the Δχ-tensor parameters. PCS<br />

restraints implemented in the fragment assembly step of PCS-ROSETTA proved highly efficient in<br />

biasing the sampling of the conformational space towards the correct target structure. The results<br />

show that using a combination of chemical shift and PCS data, ROSETTA can determine structures<br />

accurately for proteins of up to 150 residues. Lanthanides can be incorporated into proteins quite<br />

generally through metal binding tags, and the combination of these data with the PCS-ROSETTA<br />

method provides a powerful new approach to protein structure determination.<br />

4.2 Introduction<br />

The three-dimensional (3D) structure of proteins is a prerequisite for understanding protein<br />

function, protein-ligand interactions and rational drug design. Protein structures can be readily<br />

determined by NMR spectroscopy. The most difficult part of an NMR structure determination<br />

typically is the assignment of sidechain chemical shifts and NOESY peaks. This bottleneck can<br />

potentially be avoided if methods for computing high accuracy structures from backbone-only<br />

NMR experiments can be developed.<br />

PCSs are a potentially rich source of structural information. They are manifested as large<br />

changes in chemical shifts in the NMR spectrum caused by a non-vanishing magnetic susceptibility<br />

anisotropy tensor (Δχ-tensor) of a paramagnetic metal ion. The PCS (in ppm) of a nuclear spin i<br />

depends on the polar coordinates ri, θi, and φi of the nuclear spin with respect to the Δχ-tensor<br />

frame of the metal ion and the axial and rhombic components of the Δχ-tensor:<br />

$$\delta_i^{PCS} = \frac{1}{12\pi r_i^3}\left[\Delta\chi_{ax}\left(3\cos^2\theta_i - 1\right) + \frac{3}{2}\Delta\chi_{rh}\sin^2\theta_i\cos 2\varphi_i\right] \qquad (4.1)$$<br />

The Δχ-tensor defines a coordinate system in the molecule that is centered on the metal ion<br />

and is fully described by eight parameters (Δχax, Δχrh, three Euler angles relating the orientation of<br />

the Δχ-tensor to the protein frame, and the coordinates of the metal ion). Therefore, the Δχ-tensor<br />

can be determined using PCS data from at least eight nuclear spins, provided the coordinates of the<br />

spins are known.<br />
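To make the role of the eight parameters concrete, the sketch below back-calculates a PCS from a spin position in the protein frame: the metal position is subtracted, the vector is rotated into the tensor frame, and the polar-coordinate PCS expression is evaluated. The rotation order R = Rz(γ)·Ry(β)·Rz(α) is one possible reading of a ZYZ Euler convention and is an assumption here, as are the function names and the example spin coordinate; other programs may define the rotation differently.

```python
import math

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def pcs_ppm(spin, metal, euler_deg, dchi_ax, dchi_rh):
    """PCS (ppm) from eight tensor parameters.

    spin, metal: coordinates in Å (protein frame); euler_deg: (alpha, beta, gamma)
    in degrees; dchi_ax, dchi_rh in units of 1e-32 m^3.
    """
    alpha, beta, gamma = (math.radians(e) for e in euler_deg)
    # Assumed ZYZ convention: protein frame -> tensor frame.
    rot = matmul(rot_z(gamma), matmul(rot_y(beta), rot_z(alpha)))
    d = [s - m for s, m in zip(spin, metal)]
    x, y, z = (sum(rot[i][k] * d[k] for k in range(3)) for i in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / r)))
    phi = math.atan2(y, x)
    angular = (dchi_ax * (3.0 * math.cos(theta) ** 2 - 1.0)
               + 1.5 * dchi_rh * math.sin(theta) ** 2 * math.cos(2.0 * phi))
    # 1e4 collects the ppm factor (1e6) and the Å / 1e-32 m^3 unit conversion.
    return 1e4 * angular / (12.0 * math.pi * r ** 3)

# Hypothetical spin position combined with an illustrative parameter set.
example = pcs_ppm((35.0, 30.0, 20.0), (29.4, 31.9, 26.7),
                  (169.5, 30.2, 134.6), 42.3, 5.3)
```

Fitting the tensor amounts to adjusting the eight arguments after `spin` until the back-calculated values match the measured PCS, which is why at least eight spins with known coordinates are needed.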

As PCSs can be measured for nuclear spins 40 Å away from the metal, they present long-<br />

range structure restraints exquisitely suited to characterize the global structural arrangement of a<br />

protein. PCSs have thus been used very successfully to refine protein structures (Bertini et al.,<br />

2001, Gaponenko et al., 2004, Arnesano et al., 2005), dock protein molecules of known 3D<br />

structures (Ubbink et al., 1998, Pintacuda et al., 2006) and determine the structure of small<br />

molecules bound to a protein of known 3D structure (John et al., 2006, Pintacuda et al., 2007,<br />

Zhuang et al., 2008). The need for atom coordinates to determine the Δχ-tensor parameters,<br />

however, makes it more difficult to use PCSs in de novo determinations of protein 3D structures.<br />

All presently available protein structure determination software that uses PCS data to supplement<br />

conventional NMR restraints requires estimates of Δχax and Δχrh as input parameters (Banci et al.,<br />

1998, Banci et al., 2004). These are often difficult to estimate accurately, as they depend on the<br />

chemical environment of the metal ion and the mobility of the paramagnetic center with respect to<br />

the protein.<br />

The ROSETTA structure prediction methodology (Simons et al., 1997) is well suited for<br />

taking advantage of the rich source of information inherent in PCSs. ROSETTA de novo structure<br />

prediction has two stages: first, a low resolution phase in which conformational space is searched<br />

broadly using a coarse grained energy function, and second, a high resolution phase in which<br />

models generated in the first phase are refined in a physically realistic all atom force field. The<br />

bottleneck in structure prediction using ROSETTA is conformational sampling; close to native<br />

structures almost always have lower energies than non native structures. For small proteins ( < 100<br />

residues), ROSETTA has produced models with atomic level accuracy in blind prediction<br />

challenges (Raman et al., 2009). For larger proteins, however, structures close enough to the native<br />

structure to fall into the deep native energy minimum are generated seldom or not at all. This<br />

sampling problem can be overcome if even very limited experimental data is available to guide the<br />

initial low resolution search. For example, CS-ROSETTA uses NMR chemical shifts to guide


104 Chapter 4. Protein Structure Determination from Pseudocontact Shifts using ROSETTA.<br />

fragment selection and constrain backbone torsion angles, greatly improving the final yield of<br />

correctly folded protein models (Shen et al., 2008). As ROSETTA in favorable cases is capable of<br />

generating protein structures very close to experimentally determined structures from sequence<br />

information alone (Bradley et al., 2005), it is of great interest to combine ROSETTA with readily<br />

accessible experimental data to determine protein structures.<br />

In this paper we describe the incorporation of PCS data into ROSETTA. We show that this<br />

new PCS-ROSETTA method can generate accurate structures for proteins of up to 150 amino acids<br />

in length even from quite limited data sets.<br />

4.3 Results<br />

4.3.1 Test set<br />

We tested the new PCS-ROSETTA method (see Methods) on a benchmark of nine proteins<br />

for which chemical shifts and PCSs have been published. ArgN repressor was determined twice<br />

with PCS data measured from paramagnetic metal ions at two different sites. The proteins ranged<br />

from 56 to 186 amino acid residues in size, had different folds and had between 82 and 1169<br />

PCSs measured from one to eleven different metal ions located at a single metal binding site [Table<br />

4.1 and supporting information (SI) Table S4.1]. Fragments for each protein were selected with CS-<br />

ROSETTA using available chemical shift data and were used for all calculations. Structures of<br />

proteins with significant sequence similarity to the target proteins were explicitly excluded from the<br />

CS-ROSETTA database.<br />

Table 4.1 Protein structures used to evaluate the performance of PCS-ROSETTA<br />

                                                PCS-ROSETTA run^d     CS-ROSETTA run^e
Target            PDB ID  Nres^a  NM^b  Npcs^c  rmsd^f  conv^g  Q^h   rmsd^f  conv^g  Refcs^i                                              Refpcs^j
protein G (A)     3GB1    56      3     158     0.61    0.92    0.06  0.80    0.88    (Wilton et al., 2008)                                (Saio et al., 2009)
calbindin (B)     1KQV    75      11    1169    1.46    2.04    0.16  4.96    4.37    (Balayssac et al., 2006)                             (Bertini et al., 2001)
θ subunit (C)     2AE9    76      2     91      1.65    4.35    0.07  8.90    8.75    (Mueller et al., 2005)                               (Schmitz et al., 2008)
ArgN^k (D)        1AOY    78      3     222     0.98    2.38    0.08  6.93    5.32    (Su et al., 2008)                                    (Su et al., 2008)
ArgN^l (E)        1AOY    78      2     82      1.03    2.25    0.09  8.01    6.64    (Su et al., 2008)                                    (Su et al., 2009a)
N-calmodulin (F)  1SW8    79      2     125     2.34    1.85    0.09  4.69    3.68    (Bertini et al., 2004)                               (Bertini et al., 2004)
thioredoxin (G)   1XOA    108     1     90      2.58    2.64    0.23  4.98    6.06    (Lemaster et al., 1988, Chandrasekhar et al., 1991)  (Jensen et al., 2006)
parvalbumin (H)   1RJV    110     1     106     11.26   10.42   0.20  11.80   11.20   (Baig et al., 2004)                                  (Baig et al., 2004)
calmodulin (I)    2K61    146     4     408     2.80    2.12    0.14  6.35    5.55    (Bertini et al., 2009)                               (Bertini et al., 2009)
ε186^m (J)        1J54    186     3     738     20.57   17.54   0.36  15.46   17.23   (DeRose et al., 2002)                                (Schmitz et al., 2006)

a Number of residues.<br />

b Number of metal ions for which PCS data were measured.<br />

c Total number of PCSs measured.<br />

d The structures used to calculate the rmsds were identified using the combined PCS-score and<br />

ROSETTA full atom energy on the whole protein sequence.<br />

e The structures used to calculate the rmsds were identified by the ROSETTA full-atom energy on<br />

the whole protein sequence.<br />

f Cα rmsd (with respect to the native structure) of the structure of lowest score, in Å. All Cα rmsd<br />

values were calculated using the core residues defined in SI Table S4.2.<br />

g Average Cα rmsd calculated between the lowest score structure and the next four lowest scoring<br />

structures, in Å. The rmsd values were calculated on the whole protein sequence.<br />

h Quality factor Q = rms(PCSi calc - PCSi exp) / rms(PCSi exp) calculated on the structure of lowest<br />

PCS-ROSETTA score.<br />

i Reference for the experimental chemical shifts.<br />

j Reference for the experimental PCSs.<br />

k PCSs measured with a covalent tag attached to the N-terminal domain of the E. coli arginine<br />

repressor (ArgN).<br />

l PCSs measured with a non-covalent tag bound to ArgN.<br />

m N-terminal 186 residues of the ε subunit of the E. coli polymerase III.<br />

4.3.2 Capacity of the PCS Score to Identify Native-like Structures<br />

The PCS score describes a model’s agreement with observed PCS data by calculating the<br />

expected PCS data given the structure. To calculate this, a three dimensional grid search is used to<br />

determine the metal coordinates and Δχ-tensor components necessary for producing an optimal<br />

match between calculated and observed data (see Materials and Methods). The capacity of the PCS<br />

score to identify native like models was tested on sets of 3000 CS-ROSETTA structures for each of<br />

the nine test proteins. These test structures were produced using a reduced fragment set and



included native fragments to ensure that some of the models were similar to the target structure.<br />

The Cα rmsd of the decoy with the lowest PCS score was always small (below 2.3 Å) with respect<br />

to the target protein (Figure 4.1). In addition, for all target proteins for which PCSs were available<br />

from two or more paramagnetic metal ions, low Cα rmsd values correlated with low PCS scores.<br />

This indicates that the PCS score can be used not only to identify near-native structures, but also to<br />

bias conformational sampling towards the native structure during the fragment assembly.<br />

Comparisons between the ROSETTA low resolution energy function and PCS score are shown in<br />

SI Figure S4.1.<br />

Figure 4.1 Fold identification by pseudocontact shifts. 3000 decoys were generated using CS-<br />

ROSETTA. In order to ensure the presence of decoys with low rmsd values to the target structure,<br />

the starting set of peptide fragments was reduced and included fragments from the known target<br />

structures. PCS scores are plotted versus the Cα rmsd to the target structure. The targets are<br />

labeled A-J as in Table 4.1. The PCS score correlates strongly with the Cα rmsd.<br />

PCSs from eleven different lanthanides were available for calbindin. In order to explore the<br />

value of using PCSs from multiple lanthanides, we rescored the structures using PCSs from both<br />

individual and multiple lanthanides. Linear regressions of PCS score versus rmsd had slopes<br />

ranging from 0.03 to 5.17 (average 2.26) for single data sets. Pairwise combination of PCS sets<br />

resulted in increased regression slopes ranging from 0.15 to 7.42 (average 3.84). Using all PCS sets<br />

resulted in a slope greater than 11, showing that PCSs from multiple metal ions greatly facilitate<br />

identification of native-like protein folds.<br />

4.3.3 Comparison of PCS-ROSETTA with CS-ROSETTA



10000 decoys each were generated with CS-ROSETTA and PCS-ROSETTA. Both<br />

computations used the same fragment set, taking into account secondary structure information from<br />

chemical shift measurements. Figure 4.2 illustrates the ability of the PCS score to bias sampling<br />

towards the native structure. For seven out of the ten structure calculations, the PCSs dramatically<br />

increased the frequency with which decoys with low Cα rmsd to the reference structure were found.<br />

The effect was particularly pronounced for protein targets with larger PCS data sets. For example,<br />

more than a third of the decoys found for calmodulin had a Cα rmsd of less than 4 Å to the target<br />

structure, whereas fewer than 3% met this criterion in the absence of PCS data. Similar results were<br />

obtained for the θ subunit, protein G, and both ArgN repressor calculations. The PCS data did not<br />

significantly improve the results for thioredoxin and parvalbumin for which only PCS data from a<br />

single paramagnetic metal ion were available. No native-like structures were found for ε186, which<br />

may be attributed to its larger size (186 residues). To evaluate the influence of the PCS score during<br />

the fragment assembly, we performed an additional calculation with the PCS score as the only<br />

energy term (SI Text S4.1).<br />

The low resolution models were subjected to full atom relaxation refinement in the last step<br />

of the calculation, using the full atom ROSETTA force field (without inclusion of the PCS score).<br />

The additional minimization step did not significantly change the overall shape of the distributions,<br />

but tended to improve the Cα rmsd of native-like decoys (SI Figure S4.2) and, most importantly,<br />

allowed recognition of the best models based on their energies.<br />

Rescoring full atom relaxed structures with a weighted combination of the ROSETTA and<br />

PCS scores further improved the recognition of near-native structures as measured by the Cα rmsd<br />

of the lowest energy structure [Table 4.1-f, PCS-ROSETTA run; Figure 4.3], with PCS-ROSETTA<br />

identifying low Cα rmsd (< 3 Å) structures in eight out of ten cases. With the exception of target C,<br />

for all successful targets the five lowest energy structures converge to less than 3 Å,<br />

while the two failed targets do not improve beyond 10 Å [Table 4.1-g]. Convergence is a signal<br />

that the protocol has found a topology that reliably satisfies the combined score, which in the case<br />

of PCS-ROSETTA clearly identifies the failed models as unreliable, allowing for their rejection<br />

(Shen et al., 2008). In the case of target C large disordered termini prevent a clear identification of<br />

convergence, but convergence becomes apparent when only the core residues are considered (Table<br />

S4.2-g). Results with CS-ROSETTA and PCS-ROSETTA are compared in SI Figure S4.3.



Figure 4.2 Improved conformational sampling by PCS-ROSETTA. 10000 independent low<br />

resolution trajectories were carried out with (black) or without (red) PCS information. The plots<br />

show the density of Cα rmsd values to the target structure after the fragment assembly step. The<br />

targets are labeled as in Table 4.1. Corresponding plots of structures calculated with full atom<br />

relaxation for positioning the amino acid side chains are shown in SI Figure S4.2. The library used<br />

for fragment selection explicitly excluded any protein with sequence similarity to the target protein.<br />

The figure shows that PCS scores efficiently guide fragment assembly towards the correct target<br />

structure.<br />

Figure 4.3 Energy landscapes generated by PCS-ROSETTA. Combined ROSETTA energy and PCS<br />

score (using the weighting factor w(c)) are plotted versus the Cα rmsd to the target structure for<br />

structures calculated using PCS-ROSETTA. The lowest energy structures are indicated in red. The<br />

targets are labeled as in Table 4.1. The results show that PCS-ROSETTA is likely to generate and<br />

identify the correct fold.



Agreement of the structures with the experimental data can also be directly assessed by the<br />

quality factor Q = rms(PCSi calc - PCSi exp) / rms(PCSi exp), where PCSi calc and PCSi exp are the calculated and experimental PCS<br />

values for the nuclear spin i (see footnote 7). A quality factor above 25% indicates failure to find a correct structure<br />

and a quality factor below 20% indicates that the computed structure is in good agreement with the<br />

experimental PCSs (Table 4.1), as in other definitions of quality factors (Cornilescu et al., 1998).<br />

The low quality factor of the θ subunit (7%) establishes the success of the calculation despite the<br />

lack of clear convergence.<br />
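The quality factor is simple to compute once calculated and experimental PCSs are paired spin by spin. The sketch below follows the definition Q = rms(PCS calc - PCS exp) / rms(PCS exp) given in the text; the function names `rms` and `quality_factor` are illustrative and do not come from the PCS-ROSETTA code.

```python
import math


def rms(values):
    """Root mean square of a non-empty sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def quality_factor(pcs_calc, pcs_exp):
    """Q = rms(calc - exp) / rms(exp) for paired PCS lists (same spin order)."""
    if not pcs_exp or len(pcs_calc) != len(pcs_exp):
        raise ValueError("need two non-empty lists of equal length")
    residuals = [c - e for c, e in zip(pcs_calc, pcs_exp)]
    return rms(residuals) / rms(pcs_exp)
```

A perfect back-calculation gives Q = 0, and predicting zero shift for every spin gives Q = 1, so the thresholds of 20% and 25% quoted above sit close to the well-fitted end of the scale.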

4.3.4 Successes and Limits of PCS-ROSETTA Calculations<br />

The results of PCS-ROSETTA calculations are summarized in Table 4.1. The structures of<br />

small proteins (< 80 residues, targets A to F) are easily solved by PCS-ROSETTA: the structures of lowest<br />

PCS-ROSETTA energy are consistently below 2.4 Å in Cα rmsd relative to the native structure and have<br />

quality factors below 16%. For these proteins, the generation of 10000 models was ample (Figure<br />

4.2 A to F). The same number of decoys calculated with CS-ROSETTA did not lead to satisfactory<br />

convergence for targets B-C-D-E-F (Table 4.1-g), though targets C and D partially recover if<br />

flexible termini are removed at the full atom rescoring step (SI Text S4.2). The tag used to<br />

paramagnetically label ArgN (D) produced Δχ-tensor axes of significantly different orientation with<br />

different lanthanides (Su et al., 2008) which may explain why the PCS-ROSETTA calculations<br />

performed particularly well with these data.<br />

PCS-ROSETTA succeeded in calculating the structure of a protein with 146 residues and<br />

PCSs from multiple lanthanides (target I). More than 62% of calculated structures had a Cα rmsd<br />

below 5 Å, while only 6.2% met that criterion for the CS-ROSETTA calculation (Figure 4.2 I). This<br />

indicates that the PCS data score will effectively guide the sampling towards the correct fold also<br />

for larger proteins. While calculations on target J (186 residues) did not converge despite a large<br />

PCS data set, this can be attributed to a sampling problem associated with large proteins of<br />

complex topology (Bradley et al., 2005) which may be overcome with a modified protocol.<br />

Importantly, the success of a calculation can be ascertained from calculating the quality factor Q.<br />

Combined with the convergence criterion (Shen et al., 2008), the quality factor is an effective way<br />

to assess the success of a calculation (SI Figure S4.4). For each of the eight targets for which the<br />

PCS-ROSETTA calculations converged, the structure with the lowest energy is shown<br />

superimposed with the native structure in Figure 4.4.<br />

7 Rms stands for root mean square, not to be confused with rmsd (root mean square deviation).<br />

Figure 4.4 Superimpositions of ribbon representations of the backbones of the lowest energy<br />

structures calculated with PCS-ROSETTA (blue) onto the corresponding target structures (red).<br />

The protein targets are (A) protein G, (B) calbindin, (C) the θ subunit of E. coli DNA polymerase<br />

III, (D) the N-terminal domain of the E. coli arginine repressor (ArgN; with covalent lanthanide<br />

tag), (E) ArgN with non-covalent lanthanide tag, (F) the N-terminal domain of calmodulin, (G)<br />

thioredoxin, (H) parvalbumin, (I) calmodulin and (J) the globular domain of the ε subunit of E. coli<br />

DNA polymerase III. Flexible termini were omitted as described in SI Table S4.1. Only the target<br />

structure is shown for parvalbumin (H) and the ε subunit (J), as the calculations could not<br />

reproduce the correct fold for these proteins.<br />

4.4 Discussion<br />

The structural information content of the PCS effect has long been recognized, but initial<br />

attempts to determine the 3D structures of biomolecules by the use of PCSs were hampered by the<br />

difficulty of determining the Δχ-tensor and the structure simultaneously (Barry et al., 1971). Subsequently,<br />

the first 3D structure determinations of proteins relied on nuclear Overhauser effect data (Wüthrich,<br />

1986). Full structure determination of proteins from PCS data alone continues to be regarded as<br />

difficult (Bertini et al., 2002a). Owing to its modeling capabilities, PCS-ROSETTA makes it



possible, for the first time, to determine 3D structures using PCSs as the only restraints while<br />

simultaneously determining all Δχ-tensor parameters and integrating PCSs from different metal<br />

ions. In addition, a PCS quality factor can be calculated that is highly indicative of the correctness<br />

of the final structure. The effect of the PCSs on improving convergence of the calculations towards<br />

the correct target structures is particularly remarkable if one considers that PCS data were mostly<br />

available only for backbone amides.<br />

The success of PCS-ROSETTA is based on the fact that, in contrast to scoring functions<br />

using chemical shift data, the PCS score is much more sensitive to global than local structure.<br />

Therefore, PCS data can guide the search in the low resolution fragment assembly step, greatly<br />

increasing the yield of near-native structures compared to CS-ROSETTA. PCSs thus present an<br />

ideal complement to chemical shift information that is most important in the preceding fragment<br />

selection step. The improved convergence alleviates the need to compute large numbers of decoys.<br />

It would be possible to accelerate the computations further by using the PCS score to select decoys<br />

with low rmsd values to the target structure prior to the computationally expensive refinement of<br />

amino acid side chain conformations.<br />

Many protein specific factors, including fold complexity, number and quality of PCS data,<br />

and metal site play roles in the success of PCS-ROSETTA fragment assembly and their relative<br />

importance is difficult to disentangle. In general, PCS data from two or more lanthanides are<br />

expected to assist identification of decoys with low rmsd to the target structure. While the structure<br />

of calmodulin, a protein with 146 residues, was successfully determined by PCS-ROSETTA, the<br />

structure of ε186 (186 residues) was not found by the program despite the availability of many<br />

PCSs overall (Table 4.1). The scarcity of PCS values for residues near the lanthanide binding site<br />

may have contributed to this effect. As the PCS-ROSETTA protocol did not sample structures<br />

below 10 Å rmsd (Figure 4.3 J) and the PCS scores of native-like ε186 structures only show a<br />

funnel-like energy landscape below 10 Å rmsd (Figure 4.1 J), this could also be a case where<br />

structures explored by the basic ROSETTA sampling protocol do not form enough native features<br />

for the PCS score to discriminate between them. An alternative sampling protocol, such as broken<br />

chain sampling (Bradley et al., 2006) or iterative refinement (Qian et al., 2007), may be the key to<br />

accurately modeling the structure of ε186 using PCS data.<br />

The present calculations were performed with proteins containing single metal binding sites.<br />

Clearly, data from multiple metal ions using different metal binding sites will greatly enhance the<br />

information content of PCS data. In particular, lanthanide ions display very different paramagnetic<br />

properties while their chemical similarity allows all lanthanides to bind at a given lanthanide



binding site. Several metal binding tags have recently been developed to tag proteins site-<br />

specifically with a paramagnetic lanthanide; for a recent review, see (Su et al., 2009b). We note that<br />

PCSs were as useful for targets devoid of natural metal binding sites (targets A, C, D and E) as for<br />

metalloproteins (Figure 4.2).<br />

We propose a new approach to protein structure determination in which PCS data are<br />

collected from natural or engineered metal binding sites, and then used to guide ROSETTA<br />

conformational search along with backbone chemical shift data. The accuracy and reliability of the<br />

lowest energy models is assessed based on the convergence of the calculation and the PCS quality<br />

factor. With multiple independent lanthanide datasets and improved conformational search<br />

methods, the approach should be extendable to proteins greater than 150 amino acids.<br />

4.5 Materials and Methods<br />

4.5.1 PCS-ROSETTA Score.<br />

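The PCS (in ppm) induced by a metal ion M on a nuclear spin i can be calculated as (Bertini et al., 2002b)<br />

\[
\mathrm{PCS}_i^{\mathrm{calc}} = \frac{1}{12\pi r_i^{5}} \left[ (3x_i^{2} - r_i^{2})\,\Delta\chi_{xx} + (3y_i^{2} - r_i^{2})\,\Delta\chi_{yy} + (3z_i^{2} - r_i^{2})\,\Delta\chi_{zz} + 6x_i y_i\,\Delta\chi_{xy} + 6x_i z_i\,\Delta\chi_{xz} + 6y_i z_i\,\Delta\chi_{yz} \right] \tag{4.2}
\]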

where ri is the distance between the spin i and the paramagnetic centre M, xi, yi, zi are the<br />

Cartesian coordinates of the vector between the metal ion and the spin i in an arbitrary frame f and<br />

Δχxx, Δχyy, Δχzz, Δχxy, Δχxz, Δχyz are the Δχ-tensor components in the frame f (as Δχzz = -Δχxx -Δχyy,<br />

there are only five independent parameters). The Δχ-tensor components and the metal coordinates<br />

are initially unknown and must be redetermined each time the PCS score c is evaluated. c is<br />

calculated over all metal ions Mj as<br />

(4.3)<br />

where PCSi calc (Mj) and PCSi exp (Mj) are the calculated and experimental PCS values of spin<br />

i induced by the metal ion Mj, respectively. The determination of the Δχ-tensor components and the



metal coordinates presents a non-linear least-squares fitting problem. In order to avoid local minima<br />

and speed up the calculation, we split the problem into its linear and non-linear parts. Equation (4.2)<br />

shows that PCSi calc is linear with respect to the five Δχ-tensor components. Using a three-<br />

dimensional grid search over the Cartesian coordinates xM, yM, zM of the paramagnetic centre,<br />

singular value decomposition optimizes the five Δχ-tensor parameters efficiently and without<br />

ambiguity for lowest residual score c at each node of the grid. The grid node with the lowest c score<br />

is then used as the starting point for optimization of the three metal coordinates along with the five<br />

Δχ-tensor components to reach the minimal cost c.<br />
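The split into a linear and a non-linear part can be sketched as follows. This is a simplified illustration, not the PCS-ROSETTA implementation, and it rests on several assumptions. The PCS is written in its traceless form, [(x² - z²)Δχxx + (y² - z²)Δχyy + 2xyΔχxy + 2xzΔχxz + 2yzΔχyz] / (4π r⁵), with coordinates in Å and tensor components in 10⁻³² m³. The linear step solves the normal equations by Gaussian elimination instead of the singular value decomposition used in the text, purely to stay dependency-free. The residual is the root of the summed squared deviations for a single metal ion. The final joint refinement of all eight parameters is omitted. All function names are hypothetical.

```python
import math


def design_row(spin, metal):
    """Coefficients multiplying (dxx, dyy, dxy, dxz, dyz) for one spin."""
    x, y, z = (s - m for s, m in zip(spin, metal))
    r2 = x * x + y * y + z * z
    k = 1.0e4 / (4.0 * math.pi * r2 ** 2.5)  # ZeroDivisionError if spin == metal
    return [k * (x * x - z * z), k * (y * y - z * z),
            2.0 * k * x * y, 2.0 * k * x * z, 2.0 * k * y * z]


def solve(a, b):
    """Solve the square system a.t = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular system: too few or degenerate spins")
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    t = [0.0] * n
    for i in reversed(range(n)):
        t[i] = (m[i][n] - sum(m[i][j] * t[j] for j in range(i + 1, n))) / m[i][i]
    return t


def fit_tensor(spins, pcs_exp, metal):
    """Linear step: best five tensor components for a fixed metal position."""
    rows = [design_row(s, metal) for s in spins]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] * p for r, p in zip(rows, pcs_exp)) for i in range(5)]
    tensor = solve(ata, atb)
    resid = [sum(c * t for c, t in zip(r, tensor)) - p
             for r, p in zip(rows, pcs_exp)]
    return tensor, math.sqrt(sum(d * d for d in resid))


def grid_search(spins, pcs_exp, center, step, n_steps):
    """Non-linear step: try every node of a cubic grid, keep the lowest residual."""
    best = None
    offsets = range(-n_steps, n_steps + 1)
    for i in offsets:
        for j in offsets:
            for k in offsets:
                node = (center[0] + i * step, center[1] + j * step,
                        center[2] + k * step)
                try:
                    tensor, cost = fit_tensor(spins, pcs_exp, node)
                except (ValueError, ZeroDivisionError):
                    continue  # skip nodes where the fit is not defined
                if best is None or cost < best[0]:
                    best = (cost, node, tensor)
    return best
```

Because the tensor fit at each node is a closed-form linear solve, only the three metal coordinates need to be scanned, which keeps the search cheap enough to repeat every time a model is scored.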

The PCS score was added to the ROSETTA low resolution energy function using a different<br />

weighting factor w(c) for each structure calculation. w(c) was determined by first generating 1000<br />

decoys with ROSETTA and calculating w(c) as<br />

(4.4)<br />

where ahigh and alow are the average of the highest and lowest 10% of the values of the<br />

ROSETTA ab initio score, and chigh and clow are the average of the highest and lowest 10% of the<br />

values of the PCS score c upon rescoring each of the 1000 decoys with the PCS. The weights used<br />

for the ten structure calculations performed in the present work are given in SI Table S4.1.<br />
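The weighting procedure can be illustrated with a short sketch. Equation (4.4) is not legible in this copy, so the code below assumes that w(c) is the ratio of the two score spreads, (ahigh - alow) / (chigh - clow), which puts the PCS score on the same scale as the ROSETTA score. That formula, the additive form of the combined score, and all function names are assumptions made for illustration only.

```python
def mean_of_extremes(values, fraction=0.10):
    """Return (mean of lowest fraction, mean of highest fraction) of the values."""
    ordered = sorted(values)
    n = max(1, int(round(len(ordered) * fraction)))
    low = sum(ordered[:n]) / n
    high = sum(ordered[-n:]) / n
    return low, high


def pcs_weight(rosetta_scores, pcs_scores):
    """Assumed form of w(c): ratio of the ROSETTA and PCS score spreads."""
    a_low, a_high = mean_of_extremes(rosetta_scores)
    c_low, c_high = mean_of_extremes(pcs_scores)
    if c_high == c_low:
        raise ValueError("PCS scores show no spread; weight is undefined")
    return (a_high - a_low) / (c_high - c_low)


def combined_score(rosetta_score, pcs_score, weight):
    """ROSETTA score plus the weighted PCS score (assumed additive form)."""
    return rosetta_score + weight * pcs_score
```

With this choice, a PCS score that spans twice the range of the ROSETTA score over the 1000 trial decoys receives a weight of 0.5, so neither term dominates the search by scale alone.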

4.5.2 PCS-ROSETTA Algorithm<br />

PCS-ROSETTA uses the ROSETTA de novo structure prediction methodology to build low<br />

resolution models, followed by all atom refinement using the ROSETTA high resolution Monte<br />

Carlo minimization protocol. The additions to the standard ROSETTA structure prediction methods<br />

are: the use of chemical shifts to guide fragment selection as in CS-ROSETTA, the use of PCS data<br />

to guide the initial low resolution search and the use of PCS data for final model selection. A flow<br />

diagram of the computational protocol of PCS-ROSETTA is shown in SI Figure S4.5.<br />

4.5.3 Input for PCS-ROSETTA<br />

The chemical shifts of all protein targets were taken from the literature or from the<br />

BioMagResBank. CS-ROSETTA was used for fragment selection. CS-ROSETTA reports the<br />

difference between experimental and expected chemical shifts. Chemical shifts with very large<br />

deviations from expectations (often attributable to errors in the deposited data) were removed from<br />

the input. CS-ROSETTA also suggests corrections in the chemical shift referencing. We only



corrected 13C chemical shifts, except for thioredoxin, where 15N chemical shifts were corrected (SI<br />

Table S4.1). CS-ROSETTA aims to generate 200 9-residue fragments and 200 3-residue fragments<br />

centered on each residue of the polypeptide chain for use in the ab initio fragment assembly<br />

protocol of ROSETTA. In cases where CS-ROSETTA failed to generate 200 fragments, we<br />

generated additional fragments using the conventional ROSETTA protocol in order to make 200<br />

fragments available. For each of the target proteins, we removed any protein with recognizable<br />

sequence similarity (BLAST E-value below 0.05) from the CS-ROSETTA protein database. (For<br />

example, the structure of ε186 is present in the CS-ROSETTA database, but was explicitly<br />

excluded when fragments were generated.) In order to accelerate the grid search for the metal<br />

position, PCS-ROSETTA allows a precise description of the space to be searched, including the<br />

center of the grid search (cg), the step size between two nodes (sg), an outer cutoff radius (co) to<br />

limit the search to a maximal distance from cg, and an inner cutoff radius (ci) to avoid a search too<br />

close to cg. A moderately large step size (sg) was chosen to speed up computations during low<br />

resolution sampling (Table S4.1), and reduced to 25% of its value during the final high resolution<br />

scoring step to ensure maximum accuracy. For each target, the grid parameters cg, co and ci were<br />

chosen in accordance with prior knowledge about the approximate metal binding site. For example,<br />

for a covalent tag attached to the protein, we used the known geometric information of the tag to set<br />

cg, co, and ci, whereas for proteins with a natural metal binding site, a highly conserved negatively<br />

charged residue was picked as a reference point for cg. In the absence of prior biochemical<br />

information, the nuclear spin with the largest absolute PCS value was chosen as the center of the<br />

grid. SI Table S4.1 summarizes the grid parameters used for the different protein targets. In order to<br />

assess the impact of the initial grid parameters on the structures calculated, a set of PCS-ROSETTA<br />

calculations was performed for each target, where cg was centered at the nuclear spin of the largest<br />

PCS observed and the cutoff radius co was set to 15 Å. No change in the quality of the results was<br />

observed but in most cases the calculations took longer.<br />
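The search space described by cg, sg, co and ci can be pictured as the nodes of a cubic lattice that fall inside a spherical shell. The following sketch enumerates such nodes. The cubic lattice layout and the function name `grid_nodes` are assumptions for illustration, and PCS-ROSETTA may define its grid differently.

```python
import math


def grid_nodes(cg, sg, co, ci=0.0):
    """List lattice nodes spaced by sg around cg with ci <= distance <= co.

    cg: (x, y, z) center of the search in angstrom; sg: step size;
    co: outer cutoff radius; ci: inner cutoff radius.
    """
    if sg <= 0.0 or co < ci or ci < 0.0:
        raise ValueError("need sg > 0 and 0 <= ci <= co")
    n = int(math.floor(co / sg))  # lattice steps needed to reach the outer cutoff
    nodes = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                dx, dy, dz = i * sg, j * sg, k * sg
                d2 = dx * dx + dy * dy + dz * dz
                if ci * ci <= d2 <= co * co:
                    nodes.append((cg[0] + dx, cg[1] + dy, cg[2] + dz))
    return nodes
```

The number of nodes grows with the cube of co / sg, which is consistent with the longer run times reported when the cutoff radius was widened to 15 Å.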

4.5.4 PCS-ROSETTA Protocol for Protein Structure Determination<br />

Chemical shifts of the proteins were prepared in Talos format (Cornilescu et al., 1999) and<br />

used by CS-ROSETTA for fragment selection. Chemical shift corrections, fragment selection, and<br />

determination of the weights w(c) were performed as described above. 10000 protein structures<br />

were computed with PCS-ROSETTA and subjected to the full atom relaxation protocol of<br />

ROSETTA to model the side chain conformations. The final structures were rescored using the<br />

ROSETTA full atom energy function combined with the PCS scores c, using the weighting factors<br />

w(c) (Equation (4.4)) with ahigh and alow calculated against the ROSETTA full atom energy, and



with a total weight multiplied by 2 to give a larger contribution to the PCS score than in the<br />

fragment assembly. The best scoring structures can be assessed by the PCS quality factor Q =<br />

rms(PCS calc - PCS exp) / rms(PCS exp). Computation of 10000 PCS-ROSETTA structures took on<br />

average 137 CPU days per target and was run on a local cluster. SI Figure S4.6 shows a posteriori<br />

that 1000 structures per target would have been enough for convergence of the protocol.<br />

4.5.5 Computation of Structures to Evaluate the Effects of PCS Scoring<br />

3000 decoys with a wide range of rmsd values to the target structure were generated by<br />

including the native fragment and limiting the number of alternative fragments in the fragment<br />

generation step of the ROSETTA calculations. 1000 decoys each were calculated using two, five<br />

and ten fragments per residue, respectively. The presence of the native fragments in a small pool of<br />

fragments ensured the generation of structures very similar to the target structure.<br />

4.6 Acknowledgments<br />

C.S. thanks the University of Queensland for a Graduate School Research Travel Grant to<br />

undertake this collaborative research project. T.H. thanks the Australian Research Council for a<br />

Future Fellowship. Financial support from the Australian Research Council for project grants to<br />

G.O. and T.H. is gratefully acknowledged. D.B. thanks the Howard Hughes Medical Institutes.<br />

4.7 References<br />

Arnesano F, Banci L and Piccioli M (2005) NMR structures of paramagnetic metalloproteins. Q.<br />

Rev. Biophys. 38:167-219<br />

Baig I, Bertini I, Del Bianco C, Gupta YK, Lee YM, Luchinat C and Quattrone A (2004)<br />

Paramagnetism-based refinement strategy for the solution structure of human α-<br />

parvalbumin. Biochemistry 43:5562-5573<br />

Balayssac S, Jiménez B and Piccioli M (2006) Assignment strategy for fast relaxing signals:<br />

complete aminoacid identification in thulium substituted Calbindin D9K. J. Biomol. NMR<br />

34:63-73


Banci L, Bertini I, Cavallaro G, Giachetti A, Luchinat C and Parigi G (2004) Paramagnetism-based<br />

restraints for Xplor-NIH. J. Biomol. NMR 28:249-261<br />

Banci L, Bertini I, Cremonini MA, Savellini GG, Luchinat C, Wüthrich K and Güntert P (1998)<br />

PSEUDYANA for NMR structure calculation of paramagnetic metalloproteins using<br />

torsion angle molecular dynamics. J. Biomol. NMR 12:553-557<br />

Barry CD, North ACT, Glasel JA, Williams RJP and Xavier AV (1971) Quantitative determination<br />

of mononucleotide conformations in solution using lanthanide ion shift and broadening<br />

NMR probes. Nature 232:236-245<br />

Bertini I, Del Bianco C, Gelis I, Katsaros N, Luchinat C, Parigi G, Peana M, Provenzani A and<br />

Zoroddu MA (2004) Experimentally exploring the conformational space sampled by<br />

domain reorientation in calmodulin. Proc. Natl. Acad. Sci. USA. 101:6841-6846<br />

Bertini I, Donaire A, Jiménez B, Luchinat C, Parigi G, Piccioli M and Poggi L (2001)<br />

Paramagnetism-based versus classical constraints: An analysis of the solution structure of<br />

Ca Ln calbindin D9k. J. Biomol. <strong>NMR</strong> 21:85-98<br />

Bertini I, Kursula P, Luchinat C, Parigi G, Vahokoski J, Wilmanns M and Yuan J (2009) Accurate<br />

solution structures of proteins from X-ray data and a minimal set of <strong>NMR</strong> data:<br />

Calmodulin-peptide complexes as examples. J. Am. Chem. Soc. 131:5134-5144<br />

Bertini I, Longinetti M, Luchinat C, Parigi G and Sgheri L (2002a) Efficiency of paramagnetism-<br />

based constraints to determine the spatial arrangement of α-helical secondary structure<br />

elements. J. Biomol. <strong>NMR</strong> 22:123-136<br />

Bertini I, Luchinat C and Parigi G (2002b) Magnetic susceptibility in paramagnetic <strong>NMR</strong>. Prog.<br />

<strong>NMR</strong> Spectr. 40:249-273<br />

Bradley P and Baker D (2006) Improved beta-protein structure prediction by multilevel<br />

optimization of NonLocal strand pairings and local backbone conformation. Proteins<br />

65:922-929<br />

Bradley P, Misura KMS and Baker D (2005) Toward high-resolution de novo structure prediction<br />

for small proteins. Science 309:1868-1871<br />

Chandrasekhar K, Krause G, Holmgren A and Dyson HJ (1991) Assignment of the 15 N <strong>NMR</strong><br />

spectra of reduced and oxidized Escherichia Coli thioredoxin. FEBS Lett. 284:178-183<br />

Cornilescu G, Delaglio F and Bax A (1999) Protein backbone angle restraints from searching a<br />

database for chemical shift and sequence homology. J. Biomol. <strong>NMR</strong> 13:289-302<br />

Cornilescu G, Marquardt JL, Ottiger M and Bax A (1998) Validation of protein structure from<br />

anisotropic carbonyl chemical shifts in a dilute liquid crystalline phase. J. Am. Chem. Soc.<br />

120:6836-6837


4.7 References. 117<br />

DeRose EF, Li DW, Darden T, Harvey S, Perrino FW, Schaaper RM and London RE (2002) Model<br />

for the catalytic domain of the proofreading epsilon subunit of Escherichia coli DNA<br />

polymerase III based on <strong>NMR</strong> structural data. Biochemistry 41:94-110<br />

Gaponenko V, Sarma SP, Altieri AS, Horita DA, Li J and Byrd RA (2004) Improving the accuracy<br />

of <strong>NMR</strong> structures of large proteins using pseudocontact shifts as long-range restraints. J.<br />

Biomol. <strong>NMR</strong> 28:205-212<br />

Jensen MR and Led JJ (2006) Metal-protein interactions: Structure information from Ni 2+ -induced<br />

pseudocontact shifts in a native nonmetalloprotein. Biochemistry 45:8782-8787<br />

John M, Pintacuda G, Park AY, Dixon NE and Otting G (2006) Structure determination of protein-<br />

ligand complexes by transferred paramagnetic shifts. J. Am. Chem. Soc. 128:12910-12916<br />

Lemaster DM and Richards FM (1988) <strong>NMR</strong> sequential assignment of Escherichia Coli<br />

thioredoxin utilizing random fractional deuteriation. Biochemistry 27:142-150<br />

Mueller GA, Kirby TW, DeRose EF, Li D, Schaaper RM and London RE (2005) Nuclear magnetic<br />

resonance solution structure of the Escherichia coli DNA polymerase III θ subunit. J.<br />

Bacteriol. 187:7081-7089<br />

Pintacuda G, John M, Su XC and Otting G (2007) <strong>NMR</strong> structure determination of protein-ligand<br />

complexes by lanthanide labeling. Acc. Chem. Res. 40:206-212<br />

Pintacuda G, Park AY, Keniry MA, Dixon NE and Otting G (2006) Lanthanide labeling offers fast<br />

<strong>NMR</strong> approach to 3D structure determinations of protein-protein complexes. J. Am. Chem.<br />

Soc. 128:3696-3702<br />

Qian B, Raman S, Das R, Bradley P, McCoy AJ, Read RJ and Baker D (2007) High-resolution<br />

structure prediction and the crystallographic phase problem. Nature 450:259-264<br />

Raman S, Vernon R, Thompson J, Tyka M, Sadreyev R, Pei J, Kim D, Kellogg E, DiMaio F, Lange<br />

O, Kinch L, Sheffler W, Kim BH, Das R, Grishin NV and Baker D (2009) Structure<br />

prediction for CASP8 with all-atom refinement using Rosetta. Proteins online:<br />

Saio T, Ogura K, Yokochi M, Kobashigawa Y and Inagaki F (2009) Two-point anchoring of a<br />

lanthanide-binding peptide to a target protein enhances the paramagnetic anisotropic effect.<br />

J. Biomol. <strong>NMR</strong> 44:157-166<br />

Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient χ-<br />

tensor determination and NH assignment of paramagnetic proteins. J. Biomol. <strong>NMR</strong> 35:79-<br />

87<br />

Schmitz C, Stanton-Cook MJ, Su XC, Otting G and Huber T (2008) Numbat: an interactive<br />

software tool for fitting Δχ-tensors to molecular coordinates using pseudocontact shifts. J.<br />

Biomol. <strong>NMR</strong> 41:179-189


118 Chapter 4. Protein Structure Determination from Pseudocontact Shifts using ROSETTA.<br />

Shen Y, Lange O, Delaglio F, Rossi P, Aramini JM, Liu GH, Eletsky A, Wu Y, Singarapu KK,<br />

Lemak A, Ignatchenko A, Arrowsmith CH, Szyperski T, Montelione GT, Baker D and Bax<br />

A (2008) Consistent blind protein structure generation from <strong>NMR</strong> chemical shift data. Proc.<br />

Natl. Acad. Sci. USA. 105:4685-4690<br />

Simons KT, Kooperberg C, Huang E and Baker D (1997) Assembly of protein tertiary structures<br />

from fragments with similar local sequences using simulated annealing and bayesian<br />

scoring functions. J. Mol. Biol. 268:209-225<br />

Su XC, Liang H, Loscha KV and Otting G (2009a) [Ln(DPA)3] 3- Is a convenient paramagnetic shift<br />

reagent for protein <strong>NMR</strong> studies. J. Am. Chem. Soc. 131:10352-10353<br />

Su XC, Man B, Beeren S, Liang H, Simonsen S, Schmitz C, Huber T, Messerle BA and Otting G<br />

(2008) A dipicolinic acid tag for rigid lanthanide tagging of proteins and paramagnetic<br />

<strong>NMR</strong> spectroscopy. J. Am. Chem. Soc. 130:10486-10487<br />

Su XC and Otting G (2009b) Paramagnetic labelling of proteins and oligonucleotides. J. Biomol.<br />

<strong>NMR</strong> in press:<br />

Ubbink M, Ejdebäck M, Karlsson BG and Bendall DS (1998) The structure of the complex of<br />

plastocyanin and cytochrome f, determined by paramagnetic <strong>NMR</strong> and restrained rigid-<br />

body molecular dynamics. Structure 6:323-335<br />

Wilton DJ, Tunnicliffe RB, Kamatari YO, Akasaka K and Williamson MP (2008) Pressure-induced<br />

changes in the solution structure of the GB1 domain of protein G. Proteins 71:1432-1440<br />

Wüthrich K (1986). <strong>NMR</strong> of proteins and nucleic acids. Wiley, New York.<br />

Zhuang T, Lee HS, Imperiali B and Prestegard JH (2008) Structure determination of a Galectin-3-<br />

carbohydrate complex using paramagnetism-based <strong>NMR</strong> constraints. Protein Sci. 17:1220-<br />

1231<br />

4.8 Supporting information



Figure S4.1 Fold identification by pseudocontact shift score and ROSETTA energy. 3000 decoys were generated using CS-ROSETTA. In order to ensure that some decoys with small rmsd to the target structure were obtained, the starting set of peptide fragments was reduced and included the fragments from the known target structures. A to J: ROSETTA energies plotted versus the Cα rmsd to the target structure. A’ to J’: PCS scores plotted versus the Cα rmsd to the target structure. The targets are labeled A-J as in Table 4.1.



Figure S4.2 Improved fragment assembly by PCS-ROSETTA. Fragments were assembled in 10000 different runs of CS-ROSETTA (red), 10000 different runs of PCS-ROSETTA (black), and 10000 different runs using exclusively the PCS score of PCS-ROSETTA (blue). The plots show the frequency with which structures of different Cα rmsd values to the target structure were found. The red and black solid lines reproduce the data of Figure 4.2. The dashed lines show the corresponding data obtained in independent calculations that included the full atom refinement step. The same colors were used for calculations with and without the full atom refinement step. The full atom refinement step does not significantly change the Cα rmsd of the structures produced in the fragment assembly step with respect to the target structure. The targets are labeled A-J as in Table 4.1.

Figure S4.3 Energy landscape generated by CS-ROSETTA and PCS-ROSETTA, with full atom ROSETTA energies and Cα rmsd values being calculated using only the core residues as defined in Table S4.1. A to J: full atom ROSETTA energies plotted versus the Cα rmsd to the target structure for structures calculated using CS-ROSETTA. A’ to J’: Combined ROSETTA energy and PCS score plotted versus the Cα rmsd to the target structure for structures calculated using PCS-ROSETTA. The lowest energy structures are indicated in red. The targets are labeled A-J as in Table 4.1.



Figure S4.4 Identification of successful calculations with PCS-ROSETTA. The quality factor Q reports on the agreement between the experimental and calculated PCS. A value below 20% usually indicates that the calculated structure satisfies the PCS restraints. Above 25%, the quality of the structure is poor. The y axis shows the average Cα rmsd between the lowest-scoring structure and the next four lowest-scoring structures. Rmsd values below 3 Å indicate convergence of the protocol. The convergence criterion and the quality factor can be combined to further ascertain the success of the calculations for targets A, B, C, D, E, F, G and I, and to reject targets H and J. The targets are labeled A-J as in Table 4.1. The values are those of Table 4.1-h for the x axis and Table 4.1-g for the y axis.
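
The two acceptance criteria in this caption can be sketched as follows. This is a minimal illustration rather than the code used in the thesis. The definition of Q is an assumption here (the common ratio of the rms deviation to the rms of the experimental PCS), and the function names are hypothetical. Only the thresholds of 20% and 3 Å are taken from the caption.

```python
import math

def q_factor(pcs_exp, pcs_calc):
    """Quality factor, ASSUMED here to be rms(exp - calc) / rms(exp)."""
    num = sum((e - c) ** 2 for e, c in zip(pcs_exp, pcs_calc))
    den = sum(e ** 2 for e in pcs_exp)
    return math.sqrt(num / den)

def convergence(rmsds_to_best):
    """Average C-alpha rmsd (in Angstrom) between the lowest-scoring
    structure and the next lowest-scoring structures (four in the caption)."""
    return sum(rmsds_to_best) / len(rmsds_to_best)

def accept(q, conv, q_max=0.20, conv_max=3.0):
    """Accept a calculation only if both criteria are satisfied."""
    return q < q_max and conv < conv_max
```

With hypothetical values, `accept(0.12, convergence([0.8, 1.1, 0.9, 1.0]))` accepts the calculation, whereas a Q of 0.30 rejects it regardless of convergence.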



Figure S4.5 Flow diagram of PCS-ROSETTA. (a) Fragments are selected by their chemical shifts<br />

using CS-ROSETTA. (b) The PCS weight is calculated using equation (4.4) on 1000 decoys<br />

generated with CS-ROSETTA. (c) Structures are produced by the classical fragment assembly of<br />

ROSETTA with addition of the PCS-score. (d) Side chains are added to the structures and<br />

subjected to a full atom minimization. (e) Resulting structures are rescored using a combination of<br />

the ROSETTA full atom energy score and the PCS score. (f) Best structures are selected by their<br />

lowest score.



Figure S4.6 Expected Cα rmsd of the lowest energy structure calculated with PCS-ROSETTA. A given number n of structures (x axis) was randomly chosen 5000 times from the total of 10000 generated structures, and the Cα rmsd of the lowest energy structure, averaged over the 5000 trials, is plotted. The curves show a posteriori that 1000 structures calculated for each target would have been ample to ensure convergence of the PCS-ROSETTA calculations. The targets are labeled A-J as in Table 4.1. The curves for the targets parvalbumin (H) and ε (J) are not shown.
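
The resampling procedure described in this caption can be sketched as below. It is a minimal illustration, assuming that each generated structure is represented by a score and a Cα rmsd stored at the same index of two lists. The function name and the fixed random seed are hypothetical choices.

```python
import random

def expected_rmsd_of_best(scores, rmsds, n, trials=5000, seed=0):
    """Average rmsd of the lowest-score structure over random subsets of size n.

    scores[i] and rmsds[i] describe the same structure i.
    """
    rng = random.Random(seed)
    indices = range(len(scores))
    total = 0.0
    for _ in range(trials):
        subset = rng.sample(indices, n)             # draw n structures without replacement
        best = min(subset, key=lambda i: scores[i])  # lowest score within the subset
        total += rmsds[best]
    return total / trials
```

Evaluating the function for increasing n gives one curve of the figure. The value of n beyond which the curve flattens indicates how many structures would have been enough.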


Table S4.1 PCS data information and grid search parameters used.

Protein name      Residues [a]  Metal ions used                                         Atom types   cs corr [b]  w(c)   cg [c]  sg [d]  co [d]  ci [d]
protein G (A)     1-56          Tb3+ Tm3+ Er3+                                          HN           0.53         15.5   E19 CA  6       17      7
calbindin (B)     2-75          Ce3+ Dy3+ Er3+ Eu3+ Ho3+ Nd3+ Pr3+ Sm3+ Tb3+ Tm3+ Yb3+  HN, N, C’    2.72         48.9   D54 CA  6       8       4
θ subunit (C)     10-64         Dy3+ Er3+                                               HN           -0.16        7.1    D14 CA  6       25      15
ArgN (D)          8-70          Tb3+ Tm3+ Yb3+                                          HN, N        2.09         13.5   C68 CB  6       10      4
ArgN (E)          8-70          Tb3+ Tm3+                                               HN           2.09         48.9   K12 CB  6       15      0
N-calmodulin (F)  3-79          Tb3+ Tm3+                                               HN, CA, CB   0.00         4.7    D60 CA  6       8       4
thioredoxin (G)   2-108         Ni2+                                                    HN           1.23         106.3  S1 N    3.8     4       0
parvalbumin (H)   2-109         Dy3+                                                    HN, N        2.65         2.86   D93 CA  6       8       4
calmodulin (I)    3-146         Tb3+ Tm3+ Yb3+ Dy3+                                     HN           0.59         5.1    D60 CA  6       8       4
ε186 (J)          7-180         Tb3+ Dy3+ Er3+                                          HN, N, C’    0.53         8.2    D12 CA  6       8       4

[a] Ordered residues.

[b] Uniform offset used for 13C chemical shifts (in ppm) compared to published values. In the case of thioredoxin, the offset was applied to 15N chemical shifts.

[c] Residue and atom name defining the center of the grid search to position the paramagnetic center.

[d] In Ångstrom.



Table S4.2 Protein structures used to evaluate the performance of PCS-ROSETTA.

                   PCS-ROSETTA run [a]          CS-ROSETTA run [b]
Target             rmsd [c]  convergence [d]    rmsd [c]  convergence [d]
protein G (A)      0.61      0.92               0.80      0.88
calbindin (B)      1.46      2.09               4.96      4.72
θ subunit (C)      1.30      0.55               1.56      2.25
ArgN [e] (D)       1.00      0.77               1.31      2.21
ArgN [f] (E)       0.83      0.94               1.65      5.43
N-calmodulin (F)   1.74      1.49               4.69      4.49
thioredoxin (G)    2.58      2.44               4.61      5.55
parvalbumin (H)    11.26     10.25              11.80     11.30
calmodulin (I)     2.80      2.12               6.35      2.94
ε186 [g] (J)       20.57     18.03              17.07     17.74

[a] The structures used to calculate the rmsds were identified using the combined PCS score and ROSETTA full atom energy across only the core residues defined in SI Table S4.1.

[b] The structures used to calculate the rmsds were identified by the ROSETTA full atom energy across only the core residues defined in SI Table S4.1.

[c] Cα rmsd (with respect to the native structure) of the structure of lowest score, in Å. All Cα rmsd values were calculated using the core residues defined in SI Table S4.1.

[d] Average Cα rmsd calculated between the lowest-scoring structure and the next four lowest-scoring structures, in Å. The rmsd values were calculated using the core residues defined in SI Table S4.1.

[e] PCSs measured with a covalent tag attached to the N-terminal domain of the E. coli arginine repressor (ArgN).

[f] PCSs measured with a non-covalent tag bound to ArgN.

[g] N-terminal 186 residues of the ε subunit of the E. coli polymerase III.

Text S4.1 Fragment Assembly Using PCSs Only.<br />

In order to gain a better understanding of the merit of PCS data, we generated 10000 decoys per protein with all ROSETTA force field components turned off except for the PCS score. In seven of the ten protein structure calculations, the PCS score alone produced decoys with a Cα rmsd of less than 2.5 Å to the target structure (Figure S4.2, solid blue line). Control calculations without any scoring function produced not a single useful decoy. This highlights the power of PCS data to define the overall topology of a protein at the fragment assembly stage. The effect was particularly pronounced for the target proteins θ and ArgN (Figure S4.2 C and D).

The second set of PCS data of ArgN (Table 4.1; structure E) yielded worse decoys in the PCS-only computations with PCS-ROSETTA than CS-ROSETTA did. Remarkably, however, using the PCS score in combination with the ROSETTA force field yielded much better structures than using either separately (Figure S4.3 E). This shows that the PCS score adds information that is not captured by the ROSETTA energy score alone.

Text S4.2 Scoring over Core Residues.<br />

Disordered residues can add noise to the ROSETTA energy, and this noise can prevent<br />

identification of low rmsd structures. Notably, three of the targets that succeeded under the PCS-<br />

ROSETTA protocol and failed under the CS-ROSETTA protocol have disordered termini<br />

accounting for ten or more residues each (Table S4.2-d: Targets C, D & E). In practice it is possible<br />

to experimentally determine the disordered character of a residue, so to compare the effect of<br />

disorder on the two protocols we produced an additional set of structures by removing disordered<br />

residues during the final rescoring step (cores defined in Table S4.1). When the core residues are<br />

perfectly defined, in this case by observation of the solved structures, the CS-ROSETTA protocol<br />

identifies low rmsd structures in four of the ten cases (including C, D & E), and shows convergence<br />

to a low rmsd structure in three of the ten cases [Table S4.2-c, d, CS-ROSETTA run]. In contrast,<br />

removing the disordered residues has little effect on PCS-ROSETTA’s rmsd values, suggesting that<br />

the combined PCS and ROSETTA score is less sensitive to disorder. The remaining targets had<br />

little disorder and removal of disordered terminal residues had little effect on the results.


Chapter 5<br />

Conclusion and perspectives<br />

5. Conclusion and perspectives



5.1 The use of PCS for structure determination<br />

Structure determination of proteins remains a major challenge of the post-genomic era. Conventional techniques such as X-ray crystallography and NMR spectroscopy are slow. Hence, the gap between the number of known protein sequences and the number of known protein structures remains large. Alternative experimental methods are required to speed up the process. Such methods have to offer an attractive compromise between the effort required to measure the data and the value of those data in assisting de novo structure determination of proteins.

In Chapter 4, it was demonstrated that PCS data are a potential candidate. Combining a molecular fragment approach with the PCS score led to the correct fold for proteins smaller than 146 residues. A benchmark of ten data sets was compiled for this work, and the PCS-ROSETTA approach succeeded in eight of the ten cases. The first case where PCS did not lead to the correct fold concerned a protein with only one PCS data set. The importance of measuring multiple data sets with different lanthanides was already shown in Chapter 2 (to increase the quality of the resonance assignment) and Chapter 3 (to increase the quality of the fitted Δχ-tensor). It is therefore not surprising that PCS-ROSETTA had difficulties when working with a single data set. The second case where the fold was incorrect was the largest protein of the benchmark, the ε subunit (186 residues). The size of the protein might be a limiting factor for the current approach. It is well known that for proteins larger than 150 residues, the size of the conformational space explodes in the molecular fragment replacement protocol of ROSETTA. Whether PCS-ROSETTA faces the same problem is a legitimate question. Some elements of an answer are presented in the following sections.

5.1.1 Folding of proteins using only pseudocontact shifts<br />

In order to gain a better understanding of the merit of PCS data, structure calculations with PCS-ROSETTA were performed with all energy terms switched off except the PCS score. The fragment assembly within ROSETTA is hence guided purely by the PCS. While turning off the normal force field of ROSETTA is of no practical interest, the results presented below remain interesting from a theoretical point of view.

The protocol generated and identified (by the lowest score) attractive folds for four of the ten structure calculations (Figure 5.1). The term "attractive fold" has to be defined in this context. Obviously, without energy terms such as van der Waals or hydrogen bonding terms, the resulting structures can exhibit steric clashes (Figure 5.1 a, c, d) or poor β-sheet pairing (Figure 5.1 a and c). However, the Cα rmsd to the native structure was found to be reasonably low (2.25 Å for protein G, 1.89 Å for θ, 2.39 Å for ArgN, 5.03 Å for calmodulin). This provides proof of principle that the PCS alone can direct the correct folding of a protein at the fragment assembly level. The conditions for success remain unclear and should be analyzed further. They could involve a combination of the complexity or size of the protein, the number and quality of the data sets, the relative orientation of the Δχ-tensors, and the location of the paramagnetic center.

Figure 5.1 Capacity of the PCS score, as the only energy term, to fold the<br />

protein. The lowest PCS energy structure (blue) is superimposed onto the<br />

native structure (white). (a) protein G, (b) θ subunit, (c) ArgN, (d) calmodulin.<br />

Figure 5.2 The intersection of isosurfaces defines the position and orientation of peptide<br />

fragments in the protein structure. (a) Three PCS of the spin (black) measured with three



different lanthanides can be depicted as three isosurfaces (red, blue and yellow) where the<br />

spin must be located. In order to fulfill all three PCS data simultaneously, the spin must be<br />

located at the intersection of the three isosurfaces, helping to define the orientation and<br />

position of the fragment (purple) in the protein structure (white). (b) The same principle<br />

holds, if PCS from different lanthanide binding sites are available, in which case the<br />

intersection of the three isosurfaces is even better defined.<br />

A direct consequence of these results is the theoretical demonstration that the PCS score can be applied in a "divide and conquer" approach to overcome the sampling problem typically encountered with proteins larger than 120 residues. If a large protein were composed of the four proteins shown in Figure 5.1, a protocol to obtain the fold of the large protein could be (i) to cut the protein into four pieces (Figure 5.1 a-d), (ii) to assemble each piece separately, and (iii) to reconstitute the whole protein by superimposing the four separately determined (sets of) Δχ-tensors. The proof of principle holds because no energy term that would report on interactions between the four parts of the protein is used. A more efficient way, however, could be to work with smaller overlapping fragments. The fragments would need to be large enough to make it possible to optimize the Δχ-tensor (larger than 20 residues), but small enough to avoid running into the sampling problem (smaller than 80 residues).

Clearly, the PCS-score would benefit favorably from additional energy terms, starting from<br />

the van der Waals term that would strongly penalize conformations with steric clashes, thus<br />

favoring the sampling of the correct fold. In particular, the symmetric shape of the isosurfaces<br />

implies the existence of symmetrical folds that could be discriminated with the help of the van der<br />

Waals term, as theoretically demonstrated in (Bertini et al., 2002).<br />

5.1.2 Uses of multiple lanthanide binding sites<br />

Once limited to metalloproteins, paramagnetic NMR has found increasingly wide application since the arrival of lanthanide binding tags. These tags can be attached to the protein via a disulfide bond or at the termini of the protein. A non-covalent tag can also be used, at the cost of losing control over whether (and where) the tag binds. Attaching metal tags site-specifically to a protein of interest is one of the current challenges of paramagnetic NMR. Several groups are developing new tags; for a recent review, see (Su et al., 2009). While those tags are engineered to simplify the process of tag attachment as much as possible, a consequence of this effort will hopefully be the possibility of attaching them at different positions on the surface of the target protein.

All work outlined in this thesis has greatly benefited from the availability of multiple data sets measured with different lanthanides. The magnitude of the paramagnetic dipole moment differs between paramagnetic metal ions, and the orientation of the Δχ-tensor varies too. These differences provide additional information that can be used to improve the quality of the fitted Δχ-tensor (Chapter 3) or the quality of the automated assignment (Chapter 2). PCS-ROSETTA calculations can take advantage of this as well. Notably, the calculations on calbindin improved greatly when all available lanthanide data were used, compared with test calculations using PCS from only one or two lanthanides. It can be expected that the use of tags attached at different locations on the protein will enhance the PCS-ROSETTA calculations further. In particular, the location and orientation of fragments with respect to the rest of the protein structure would be defined more accurately by isosurfaces intersecting at steeper angles (Figure 5.2). The current implementation of PCS-ROSETTA would make it straightforward to design such a protocol. The benefits of using two different lanthanide binding sites could already be explored using the arginine repressor as a test case, where PCS data measured for two different lanthanide binding sites are available.

5.1.3 Development of a new PCS-ROSETTA protocol<br />

The way structures are calculated by PCS-ROSETTA is similar to the standard protocol of ROSETTA. The only difference is the calculation of a PCS score during the fragment assembly stage (which requires fitting of a Δχ-tensor and metal position). The weight of the PCS score relative to the standard centroid score of ROSETTA is chosen so that both have equal influence. While ROSETTA benefits from additional experimental restraints such as PCSs, it is important to bear in mind that the original goal of ROSETTA is to generate a wide variety of protein-like structures in a first step (fragment assembly) and to identify the native one in a second step (full atom score). Considering that the PCS have proven to drive the sampling towards the native structure with great efficiency, it can be questioned whether it is necessary to enforce diversity in the generated structures. Several thousand structures (requiring a large amount of CPU time) are usually generated by ROSETTA to cover a wide range of possible structures. It may be more profitable to generate, at equal CPU time, a smaller number of structures and spend more time on the fragment assembly of each.



Clearly, a larger benchmark containing proteins of different topology and size is necessary<br />

to gain a better understanding of the merit of the pseudocontact shift. We are in the process of<br />

creating an artificial one, since PCS can easily be predicted. The parameters that would impact the<br />

success of calculation, and that we have to explore are: the level of noise within the data PCS-<br />

ROSETTA can tolerate, the influence of the position of the paramagnetic center relative to the<br />

protein, and the influence of the relative orientations of different Δχ-tensors.<br />

5.2 The use of PCS for chemical shift assignment<br />

The work of Chapter 2 provides proof of principle that PCS can be used for automatic<br />

chemical shift assignment when a 3D structure is available. Chemical shifts are sensitive to the<br />

local environment. Small variations in the surrounding electronic configuration can have a large<br />

impact on the chemical shift values. This makes it extremely difficult to predict chemical shifts<br />

accurately. In contrast, the PCS of a spin i is much less affected by similar variations, as the PCS<br />

only depends on the spherical coordinates of i with respect to the Δχ-tensor frame.<br />
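
For reference, the dependence on the spherical coordinates mentioned above can be written out explicitly. The block below gives the standard expression for the PCS in the principal frame of the Δχ-tensor as it appears in the general literature; the notation used elsewhere in this thesis may differ. Here r_i, θ_i and φ_i are the polar coordinates of spin i relative to the paramagnetic center, and Δχ_ax and Δχ_rh are the axial and rhombic components of the tensor.

```latex
\delta^{\mathrm{PCS}}_i = \frac{1}{12\pi r_i^{3}}
\left[ \Delta\chi_{\mathrm{ax}} \left( 3\cos^{2}\theta_i - 1 \right)
+ \frac{3}{2}\, \Delta\chi_{\mathrm{rh}} \sin^{2}\theta_i \cos 2\varphi_i \right]
```

Only geometric quantities and the tensor parameters enter the expression, which is why the PCS is far less sensitive to the local electronic environment than the chemical shift itself.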

While the program Possum presented in Chapter 2 is limited to methyl groups, the approach used to automatically assign the chemical shifts could be applied to any atom type for which PCS can be measured. The simulated annealing protocol used to solve the multi-dimensional assignment problem has proven to be highly efficient at sampling the possible assignment space and finding a solution of lower energy than manual assignments. The scoring scheme used to optimize the assignment is also efficient; only a few misassignments were present when multiple lanthanide data sets were available. Even more interesting may be to obtain the Δχ-tensor parameters directly from unassigned chemical shifts. The program Echidna (Schmitz et al., 2006) is capable of simultaneously determining the Δχ-tensor parameters and assigning the paramagnetic NMR spectrum, provided that the diamagnetic spectrum is already assigned. This raises the question whether both the diamagnetic and the paramagnetic spectrum can be assigned while determining the Δχ-tensor at the same time. A software package for this purpose is currently under development. The idea is to handle various kinds of input information (partial assignment for some residues in the diamagnetic or paramagnetic state, measurement of some unassigned pseudocontact shifts, selective isotope labeling of some amino acids) and to use this partial information in a simulated annealing protocol to optimize the assignment while determining the Δχ-tensor parameters. The challenge of this approach is to use the right combination of methods: simulated annealing for the assignment, grid search for the coordinates of the paramagnetic center, and singular value decomposition for the determination of the remaining Δχ-tensor parameters.
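
The combination of a grid search for the metal position with a linear fit of the remaining tensor parameters can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Echidna, Numbat or PCS-ROSETTA. It assumes that the assignment is already known, uses the standard fact that the PCS is linear in the five independent components of the traceless Δχ-tensor once the metal position is fixed, and relies on `numpy.linalg.lstsq`, which solves the least-squares problem through a singular value decomposition. All function names are hypothetical and units are left to the caller.

```python
import itertools
import numpy as np

def design_matrix(coords, metal):
    """Rows map (chi_xx, chi_yy, chi_xy, chi_xz, chi_yz) to one PCS value.

    Uses chi_zz = -chi_xx - chi_yy (traceless tensor). A grid point that
    coincides with an atom (r = 0) must be avoided by the caller.
    """
    d = coords - metal
    x, y, z = d[:, 0], d[:, 1], d[:, 2]
    r5 = np.sum(d * d, axis=1) ** 2.5
    cols = [x * x - z * z, y * y - z * z, 2 * x * y, 2 * x * z, 2 * y * z]
    return np.column_stack(cols) / (4.0 * np.pi * r5[:, None])

def fit_tensor(coords, pcs, metal):
    """Linear least-squares fit of the five tensor components (SVD-based)."""
    a = design_matrix(coords, metal)
    chi, *_ = np.linalg.lstsq(a, pcs, rcond=None)
    resid = float(np.sum((a @ chi - pcs) ** 2))
    return chi, resid

def grid_search(coords, pcs, center, half_width, step):
    """Try every metal position on a cubic grid and keep the best fit."""
    offsets = np.arange(-half_width, half_width + step / 2, step)
    best = None
    for dx, dy, dz in itertools.product(offsets, repeat=3):
        metal = center + np.array([dx, dy, dz])
        chi, resid = fit_tensor(coords, pcs, metal)
        if best is None or resid < best[2]:
            best = (metal, chi, resid)
    return best
```

Only the three metal coordinates are searched nonlinearly. The five tensor components follow from a linear solve at each grid point, which keeps the search cheap enough to be repeated inside an outer simulated annealing loop over assignments.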

5.3 The use of PCS for protein docking<br />

At present, structural biology groups are investing much effort in producing models of proteins by NMR or X-ray crystallography. Currently, more than 48000 crystal structures and almost 7000 NMR structures of proteins have been deposited in the Protein Data Bank. In contrast, the number of protein-protein complexes solved by either method remains low (fewer than 2500). For X-ray crystallography, co-crystallizing a complex is much more difficult than obtaining crystal structures of the individual proteins. For NMR spectroscopy, the larger molecular weight of complexes presents a problem, making it more difficult to obtain and analyze data. Additionally, the most useful intermolecular NOEs often involve amino acid side chains, the resonances of which are much harder to assign than the backbone resonances of the protein.

To gain better insight into the molecular basis of life, the challenge is to understand how<br />

individual macromolecules come together to fulfill their tasks in DNA replication, gene expression<br />

and regulation, etc. This requires the construction of models for protein-protein, protein-DNA and protein-ligand<br />

complexes. One way to tackle the problem is to use a docking program that<br />

predicts the binding mode of the complex given the structures of the individual components. The<br />

major difficulty of this approach is to comprehensively explore the 6-dimensional space (three rotational and three translational degrees of freedom) that<br />

describes the relative orientation and position of two rigid bodies. This presents a<br />

challenging sampling problem. The shortage of experimental information to support any<br />

generated model is another drawback of the docking approach. Techniques that<br />

provide additional experimental information are therefore valuable. Measurements of residual dipolar<br />

couplings are an efficient way to obtain information on the relative orientation of two macromolecules: the<br />

macromolecules are weakly aligned using an alignment medium. Independent determination of the<br />

alignment tensor with respect to the two macromolecules gives direct access to the relative<br />

orientation of the two rigid bodies. More interestingly, PCS measurements provide, in addition to<br />

orientational information, information on the distance between the two bodies. Determination of the<br />

Δχ-tensor with respect to the two molecules and superimposition of the two Δχ-tensor frames yields<br />

the relative orientation and position of the two macromolecules, as illustrated in Figure 1.8


(Chapter 1). Pseudocontact shifts therefore present particularly powerful experimental data to<br />

construct a model of the complex between two molecules by rigid body docking, and to shortcut<br />

the computationally expensive task of sampling the conformational space. PCS cannot, however,<br />

provide the high-resolution accuracy that crystallography achieves, and more detailed models will<br />

still require structural refinement software that optimizes the intermolecular packing and makes any other<br />

structural adjustments. Nevertheless, the computational effort of protein docking can focus on<br />

the atomic details when starting from a valid rigid body model.<br />
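The superimposition of the two Δχ-tensor frames can be sketched as follows. The sketch assumes that the same Δχ-tensor has been determined independently in the coordinate frame of each molecule, giving a symmetric 3×3 tensor and a metal position per frame, and that the tensor is fully rhombic (three distinct principal values). Because the principal axes of a symmetric tensor are only defined up to sign, the superimposition yields four candidate orientations that additional data would have to distinguish. All names are hypothetical and serve only as illustration.

```python
import numpy as np


def principal_frame(chi):
    """Return eigenvalues and a right-handed set of principal axes (as columns)."""
    values, axes = np.linalg.eigh(chi)  # eigenvalues in ascending order
    if np.linalg.det(axes) < 0:
        axes[:, 2] *= -1.0  # enforce a proper rotation
    return values, axes


def candidate_transforms(chi_a, metal_a, chi_b, metal_b):
    """Rigid transforms (rot, shift) with x_a = rot @ x_b + shift.

    Each candidate maps the tensor frame of molecule B onto that of molecule A.
    The four candidates differ by 180-degree rotations about the principal axes.
    """
    _, axes_a = principal_frame(chi_a)
    _, axes_b = principal_frame(chi_b)
    sign_sets = ((1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1))
    candidates = []
    for signs in sign_sets:
        flip = np.diag(np.array(signs, dtype=float))
        rot = axes_a @ flip @ axes_b.T
        shift = metal_a - rot @ metal_b  # metal positions must coincide
        candidates.append((rot, shift))
    return candidates


def apply_transform(coords_b, rot, shift):
    """Place the coordinates of molecule B (N x 3) in the frame of molecule A."""
    return coords_b @ rot.T + shift
```

Each candidate places the metal position of molecule B exactly on that of molecule A, which is how the distance information enters; a subsequent refinement step could then start from whichever candidate avoids steric clashes.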

Another challenge facing the modeling of a protein-protein complex arises when one of the<br />

proteins undergoes a large conformational change. Protein docking software can tackle the problem<br />

to some extent by adapting the conformations of the side chains and the backbone close to the<br />

complex interface. As this approach greatly increases the conformational space to search, it is<br />

computationally challenging. Paramagnetic <strong>NMR</strong> and PCS could assist the modeling of large<br />

conformational changes by directing the conformational alterations toward the correct<br />

conformation. A protein that undergoes a large conformational change due to motions about a<br />

hinge that separates two rigid domains would present an attractive example. In a situation where<br />

such a hinge motion occurs as a result of association with a binding partner, the PCS-based rigid<br />

body docking technique presented in this thesis could be particularly useful<br />

for modeling the complex.<br />

5.4 References<br />

Bertini I, Longinetti M, Luchinat C, Parigi G and Sgheri L (2002) Efficiency of<br />

paramagnetism-based constraints to determine the spatial arrangement of α-helical secondary structure<br />

elements. J Biomol <strong>NMR</strong> 22:123-136<br />

Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient<br />

χ-tensor determination and NH assignment of paramagnetic proteins. J Biomol <strong>NMR</strong> 35:79-87<br />

Su XC and Otting G (2009) Paramagnetic labelling of proteins and oligonucleotides. J Biomol<br />

<strong>NMR</strong> in press
