Thesis Title: Subtitle - NMR Spectroscopy Research Group
Computational study of proteins with paramagnetic <strong>NMR</strong>: Automatic<br />
assignments of spectral resonances, determination of protein-protein and<br />
protein-ligand complexes, and structure determination of proteins<br />
Christophe Schmitz<br />
A thesis submitted for the degree of Doctor of Philosophy at<br />
The University of Queensland in December 2009<br />
School of Chemistry and Molecular Biosciences
Declaration by author<br />
This thesis is composed of my original work, and contains no material previously published<br />
or written by another person except where due reference has been made in the text. I have clearly<br />
stated the contribution by others to jointly-authored works that I have included in my thesis.<br />
I have clearly stated the contribution of others to my thesis as a whole, including statistical<br />
assistance, survey design, data analysis, significant technical procedures, professional editorial<br />
advice, and any other original research work used or reported in my thesis. The content of my<br />
thesis is the result of work I have carried out since the commencement of my research higher<br />
degree candidature and does not include a substantial part of work that has been submitted to<br />
qualify for the award of any other degree or diploma in any university or other tertiary institution. I<br />
have clearly stated which parts of my thesis, if any, have been submitted to qualify for another<br />
award.<br />
I acknowledge that an electronic copy of my thesis must be lodged with the University<br />
Library and, subject to the General Award Rules of The University of Queensland, immediately<br />
made available for research and study in accordance with the Copyright Act 1968.<br />
I acknowledge that copyright of all material contained in my thesis resides with the<br />
copyright holder(s) of that material.<br />
Statement of Contributions to Jointly Authored Works Contained<br />
in the <strong>Thesis</strong><br />
John M, Schmitz C, Park AY, Dixon NE, Huber T and Otting G (2007) Sequence-specific and<br />
stereospecific assignment of methyl groups using paramagnetic lanthanides. J Am Chem<br />
Soc 129:13749-13757.<br />
John designed new <strong>NMR</strong> experiments, recorded and assigned the spectra, and wrote the<br />
corresponding paragraphs in the paper. Schmitz designed and implemented the software to<br />
automate the assignment procedure, ran the calculations, and wrote the paragraphs “The Program<br />
Possum” and “Automatic Assignments without EXSY Data”. Park made the protein samples.<br />
Dixon coordinated the protein sample preparation and corrected aspects of the paper. Huber was<br />
responsible for the computational aspects of the project and the writing of corresponding sections<br />
of the paper. Otting coordinated the overall project and was responsible for the writing of the paper.
Schmitz C, Stanton-Cook MJ, Su XC, Otting G and Huber T (2008) Numbat: an interactive<br />
software tool for fitting Δχ-tensors to molecular coordinates using pseudocontact shifts. J<br />
Biomol <strong>NMR</strong> 41:179-189<br />
Schmitz designed and implemented the software to automate the determination of the<br />
Δχ-tensor and wrote most of the paper except for the “study case” section. Stanton-Cook was<br />
responsible for the calculations, the protein-protein modelling, the writing of the “study case” section, and<br />
improvements to the paper. Su was responsible for improving the design of the software from an<br />
“end-user” perspective. Otting and Huber were responsible for the overall project and the writing of<br />
the manuscript.<br />
Schmitz C, Vernon R, Otting G, Baker D and Huber T Protein structure determination from<br />
pseudocontact shifts using ROSETTA. Proc Natl Acad Sci U S A submitted.<br />
Schmitz designed and implemented the PCS-score into the software ROSETTA, collected<br />
experimental data sets, performed computations, and was responsible for writing the manuscript.<br />
Vernon guided the implementation of the PCS-score, ran calculations, interpreted the results, and<br />
improved the manuscript. Otting gathered experimental data sets, set up the overall project, and<br />
corrected versions of the manuscript. Baker was responsible for guiding the overall project, and for<br />
the overall manuscript. Huber designed the PCS-score, guided the overall project, and improved the<br />
paper.<br />
Statement of Contributions by Others to the <strong>Thesis</strong> as a Whole<br />
No contributions by others.<br />
Statement of Parts of the <strong>Thesis</strong> Submitted to Qualify for the<br />
Award of Another Degree<br />
None.<br />
Published Works by the Author Incorporated into the <strong>Thesis</strong><br />
John M, Schmitz C, Park AY, Dixon NE, Huber T and Otting G (2007) Sequence-specific and<br />
stereospecific assignment of methyl groups using paramagnetic lanthanides. J Am Chem<br />
Soc 129:13749-13757. Incorporated as Chapter 2.
Schmitz C, Stanton-Cook MJ, Su XC, Otting G and Huber T (2008) Numbat: an interactive<br />
software tool for fitting Δχ-tensors to molecular coordinates using pseudocontact shifts. J<br />
Biomol <strong>NMR</strong> 41:179-189. Incorporated as Chapter 3.<br />
Schmitz C, Vernon R, Otting G, Baker D and Huber T Protein structure determination from<br />
pseudocontact shifts using ROSETTA. Proc Natl Acad Sci U S A submitted. Incorporated<br />
as Chapter 4.<br />
Additional Published Works by the Author Relevant to the <strong>Thesis</strong><br />
but not Forming Part of it<br />
Su XC, Man B, Beeren S, Liang H, Simonsen S, Schmitz C, Huber T, Messerle BA and Otting G<br />
(2008) A dipicolinic acid tag for rigid lanthanide tagging of proteins and paramagnetic<br />
<strong>NMR</strong> spectroscopy. J Am Chem Soc 130:10486-10487
Acknowledgements<br />
I would like to thank my advisors Dr Thomas and Prof. Gottfried for their scientific and<br />
moral support that made this thesis so enjoyable. In particular, I really appreciated their doors<br />
always being open for discussion; their infectious scientific enthusiasm; their exemplary - if<br />
not legendary - efficiency; their honesty and encouragement in whatever I tried to accomplish; and,<br />
of course, their constant and reliable good humor.<br />
Thanks to the people who put me on track to postgraduate studies, in particular Denis<br />
Barthou for his amazing teaching, Philippe Pucheral and Luc Bouganim for introducing me to<br />
research, and of course Prof. Guido for somehow bringing me into the field of structural biology.<br />
Thanks to the past and present members / visitors of the BMMG / MD group for those 3.5<br />
years of fun. I appreciated their company, whether it was for a discussion, “une boue”, a tea break, a<br />
beer or two, or three, a meal, a game of tennis / squash, or a dance. So thanks to Matt, Itamar,<br />
David, Mitchel, Ying, Zrinka, Daniela, Michael, Kim, Liz, Alpesh, Prof. Alan Mark and all the<br />
others.<br />
It has always been a pleasure to visit ANU in Canberra thanks to Michael, Xun-Cheng,<br />
Hiromasa, Kiyoshi, Karin and Laura.<br />
I would also like to thank Prof. David Baker and his lab for welcoming me for a couple of<br />
months for a fruitful collaboration, and many, many thanks to Robert for so much help with<br />
that project, and for being a great illustration of how friendly Canadian people are.<br />
My apologies to my family for being so far away for so long; I know you understood my<br />
decision. Thanks for all your support. Thank you Chantel for your patience, love, support and<br />
patience.
Abstract<br />
Understanding biological phenomena at atomic resolution is one of the keys to modern drug<br />
design. In particular, knowledge of the 3D structures of proteins and of their interactions with other<br />
macromolecules is necessary for designing chemical compounds that modify biological processes.<br />
Conventional methods for protein structure determination are X-ray crystallography and<br />
nuclear magnetic resonance (<strong>NMR</strong>) spectroscopy. These techniques can also determine the binding<br />
mode of chemical compounds. Both techniques can be slow and costly, making it highly relevant<br />
to explore alternative strategies. Paramagnetic <strong>NMR</strong> spectroscopy is emerging as such an<br />
alternative technique. To measure the paramagnetic effects, two <strong>NMR</strong> spectra are compared,<br />
one measured with and one without a bound paramagnetic metal ion. In particular,<br />
pseudocontact shifts (PCSs) of nuclear spins are easily measured as the difference (in ppm) of the<br />
chemical shifts between the two spectra. PCSs provide long-range and orientation-dependent<br />
restraints, allowing positioning of the spin with respect to the anisotropy of the magnetic<br />
susceptibility tensor (Δχ-tensor) of the metal ion.<br />
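As an illustration of the two statements above, the following minimal Python sketch computes a PCS as a chemical-shift difference and predicts a PCS from a Δχ-tensor. It is not code from Possum, Numbat or PCS-ROSETTA. The prediction uses the standard point-dipole expression from the paramagnetic NMR literature, PCS = [Δχ_ax(3cos²θ − 1) + (3/2)Δχ_rh sin²θ cos2φ] / (12πr³), and assumes that the spin coordinates are already given in the principal-axis frame of the tensor with the metal ion at the origin. The function names and the unit convention (Å for distances, 10⁻³² m³ for tensor components) are assumptions made for this example.

```python
import math


def measured_pcs(shift_para_ppm, shift_dia_ppm):
    """PCS (ppm) as the chemical-shift difference between the
    paramagnetic and the diamagnetic spectrum."""
    return shift_para_ppm - shift_dia_ppm


def predicted_pcs(spin_xyz, chi_ax, chi_rh):
    """Predicted PCS (ppm) from the point-dipole expression.

    spin_xyz : (x, y, z) of the nuclear spin in angstroms, expressed in the
               principal-axis frame of the tensor, metal ion at the origin
               (assumed convention for this sketch).
    chi_ax, chi_rh : axial and rhombic tensor components in 1e-32 m^3.
    """
    x, y, z = spin_xyz
    r2 = x * x + y * y + z * z
    if r2 == 0.0:
        raise ValueError("spin must not coincide with the metal ion")
    r = math.sqrt(r2)
    # Cartesian forms of the angular terms:
    #   3cos^2(theta) - 1        = (3z^2 - r^2) / r^2
    #   sin^2(theta) cos(2 phi)  = (x^2 - y^2) / r^2
    axial = chi_ax * (3.0 * z * z - r2) / r2
    rhombic = 1.5 * chi_rh * (x * x - y * y) / r2
    # Unit conversion: 1e-32 m^3 / (1e-10 m)^3 = 1e-2, times 1e6 for ppm.
    return 1e4 * (axial + rhombic) / (12.0 * math.pi * r ** 3)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(measured_pcs(1.25, 0.95))
    print(predicted_pcs((0.0, 0.0, 10.0), 10.0, 0.0))
```

The 1/r³ dependence in the last line of `predicted_pcs` is what makes the restraint long-range, and the angular terms carry the orientation dependence.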
In this thesis, I used the PCS effect to computationally extract information from <strong>NMR</strong><br />
spectra. I (i) developed a tool (called Possum) to automatically assign diamagnetic and<br />
paramagnetic spectra of the methyl groups of amino acid side chains, given structural information<br />
on the protein studied and prior knowledge of the Δχ-tensor; (ii) designed a comprehensive<br />
software package (called Numbat) to extract Δχ-tensor parameters from assigned PCS values and<br />
the available 3D structure; and (iii) incorporated PCS-based restraints into the protein structure<br />
prediction software CS-ROSETTA and demonstrated that this combination (PCS-ROSETTA)<br />
presents a significant improvement for de novo structure determination. The three projects serve<br />
different purposes at different stages of protein <strong>NMR</strong> studies. They could be combined in the<br />
following manner: Starting from assigned backbone PCSs, PCS-ROSETTA could be used to determine<br />
the 3D structure of the protein. Possum can then be used to automatically assign the <strong>NMR</strong><br />
resonances of the methyl groups using PCSs. Finally, Numbat can be used to fit improved<br />
Δχ-tensors to all the PCS data, analyze the quality of the Δχ-tensors and identify possibly incorrect<br />
assignments. Iterative repetition of this protocol would give a 3D structural model of the protein<br />
with a minimum of data. Alternatively, the Δχ-tensor parameters and PCSs could be used as input<br />
for a traditional software package such as Xplor-NIH to compute a 3D structure of the protein.
Keywords<br />
paramagnetic NMR, pseudocontact shift, lanthanide, magnetic susceptibility tensor, protein,<br />
structure determination, resonance assignment, protein folding<br />
Australian and New Zealand Standard <strong>Research</strong> Classifications<br />
(ANZSRC)<br />
060112 (40%), 080301 (30%), 030406 (30%)
Table of Contents<br />
Declaration by author ......................................................................................................................... ii<br />
Statement of Contributions to Jointly Authored Works Contained in the <strong>Thesis</strong> .............................. ii<br />
Statement of Contributions by Others to the <strong>Thesis</strong> as a Whole ....................................................... iii<br />
Statement of Parts of the <strong>Thesis</strong> Submitted to Qualify for the Award of Another Degree ............... iii<br />
Published Works by the Author Incorporated into the <strong>Thesis</strong> ........................................................... iii<br />
Additional Published Works by the Author Relevant to the <strong>Thesis</strong> but not Forming Part of it ........ iv<br />
Acknowledgements .............................................................................................................................. v<br />
Abstract .............................................................................................................................................. vi<br />
Keywords .......................................................................................................................................... vii<br />
Australian and New Zealand Standard <strong>Research</strong> Classifications (ANZSRC) .................................. vii<br />
Table of Contents ............................................................................................................................. viii<br />
List of Figures .................................................................................................................................... xi<br />
List of Tables ................................................................................................................................... xiv<br />
List of Abbreviations ......................................................................................................................... xv<br />
1. Introduction ................................................................................................................................ 1<br />
1.1 Liquid State Nuclear Magnetic Resonance .......................................................................... 1<br />
1.2 Paramagnetic <strong>NMR</strong> .............................................................................................................. 4<br />
1.2.1 The four paramagnetic effects in <strong>NMR</strong> ........................................................................ 4<br />
1.2.2 The pseudocontact shift as a restraint ........................................................................... 7<br />
1.3 Computational study of paramagnetic proteins .................................................................. 12<br />
1.3.1 The assignment problem ............................................................................................. 12<br />
1.3.2 The Δχ-tensor determination problem ........................................................................ 16<br />
1.3.3 De novo structure determination of proteins .............................................................. 19<br />
1.4 Scope of the thesis .............................................................................................................. 23<br />
1.5 References .......................................................................................................................... 24<br />
2. Possum: paramagnetically orchestrated spectral solver of unassigned methyls ...................... 27<br />
2.1 Abstract .............................................................................................................................. 28<br />
2.2 Introduction ........................................................................................................................ 28<br />
2.3 Experimental section .......................................................................................................... 30<br />
2.3.1 Sample preparation ..................................................................................................... 30<br />
2.3.2 <strong>NMR</strong> spectroscopy ..................................................................................................... 30
2.3.3 Manual resonance assignments from PCS ................................................................. 32<br />
2.3.4 The program Possum .................................................................................................. 32<br />
2.4 Results ................................................................................................................................ 36<br />
2.4.1 ¹³C-HSQC spectra of the ε186/θ/Ln³⁺ complexes ................................................. 36<br />
2.4.2 Methyl CZ-EXSY experiments ................................................................................... 38<br />
2.4.3 Resonance assignment of Met, Ala and Thr methyl groups ....................................... 39<br />
2.4.4 Assignments of Val, Leu, and Ile methyl groups ....................................................... 41<br />
2.4.5 Automatic assignments without EXSY data .............................................................. 44<br />
2.4.6 PCS and flexibility ..................................................................................................... 46<br />
2.5 Discussion .......................................................................................................................... 48<br />
2.6 Acknowledgement .............................................................................................................. 50<br />
2.7 Supporting Information Available ..................................................................................... 50<br />
2.8 References .......................................................................................................................... 51<br />
2.9 Supporting information ...................................................................................................... 56<br />
3. Numbat: new user-friendly method built for automatic Δχ-tensor determination ................... 75<br />
3.1 Abstract .............................................................................................................................. 76<br />
3.2 Keywords ........................................................................................................................... 76<br />
3.3 Abbreviations ..................................................................................................................... 76<br />
3.4 Introduction ........................................................................................................................ 77<br />
3.5 Algorithm ........................................................................................................................... 78<br />
3.6 Program Features ................................................................................................................ 80<br />
3.6.1 GUI ............................................................................................................................. 80<br />
3.6.2 Input files .................................................................................................................... 81<br />
3.6.3 Methyl group definition .............................................................................................. 81<br />
3.6.4 Optimization of the tensor parameters ....................................................................... 81<br />
3.6.5 Residual Anisotropic Chemical Shifts (RACS) ......................................................... 82<br />
3.6.6 Multiple PCS data sets ................................................................................................ 82<br />
3.6.7 PCS modification ........................................................................................................ 83<br />
3.6.8 PCS selection .............................................................................................................. 83<br />
3.6.9 Conventions ................................................................................................................ 83<br />
3.6.10 Error analysis ............................................................................................................ 84<br />
3.6.11 Visualization ............................................................................................................. 85<br />
3.6.12 Output ....................................................................................................................... 86<br />
3.7 Study case ........................................................................................................................... 86
3.7.1 Subunit ε186 ............................................................................................................... 87<br />
3.7.2 Subunit θ ..................................................................................................................... 89<br />
3.7.3 Modelling the complex between ε186 and θ .............................................................. 90<br />
3.8 Conclusion .......................................................................................................................... 92<br />
3.9 Acknowledgment ............................................................................................................... 93<br />
3.10 References ........................................................................................................................ 93<br />
3.11 Supporting information .................................................................................................... 97<br />
4. Protein Structure Determination from Pseudocontact Shifts using ROSETTA ..................... 101<br />
4.1 Abstract ............................................................................................................................ 102<br />
4.2 Introduction ...................................................................................................................... 102<br />
4.3 Results .............................................................................................................................. 104<br />
4.3.1 Test set ...................................................................................................................... 104<br />
4.3.2 Capacity of the PCS Score to Identify Native-like Structures ................................. 105<br />
4.3.3 Comparison of PCS-ROSETTA with CS-ROSETTA ............................................. 106<br />
4.3.4 Successes and Limits of PCS-ROSETTA Calculations ........................................... 109<br />
4.4 Discussion ........................................................................................................................ 110<br />
4.5 Materials and Methods ..................................................................................................... 112<br />
4.5.1 PCS-ROSETTA Score ............................................................................................. 112<br />
4.5.2 PCS-ROSETTA Algorithm ...................................................................................... 113<br />
4.5.3 Input for PCS-ROSETTA ......................................................................................... 113<br />
4.5.4 PCS-ROSETTA Protocol for Protein Structure Determination ............................... 114<br />
4.5.5 Computation of Structures to Evaluate the Effects of PCS Scoring ........................ 115<br />
4.6 Acknowledgments ............................................................................................................ 115<br />
4.7 References ........................................................................................................................ 115<br />
4.8 Supporting information .................................................................................................... 118<br />
5. Conclusion and perspectives .................................................................................................. 129<br />
5.1 The use of PCS for structure determination ..................................................................... 130<br />
5.1.1 Folding of proteins using only pseudocontact shifts ................................................ 130<br />
5.1.2 Uses of multiple lanthanide binding sites ................................................................. 132<br />
5.1.3 Development of a new PCS-ROSETTA protocol .................................................... 133<br />
5.2 The use of PCS for chemical shift assignment ................................................................. 134<br />
5.3 The use of PCS for protein docking ................................................................................. 135<br />
5.4 References ........................................................................................................................ 136
List of Figures<br />
Figure 1.1 <strong>NMR</strong> effects used for structure determination .................................................................. 3<br />
Figure 1.2 Representation of the distance and angular dependence of the four paramagnetic effects<br />
for the spin S, or system of spin S1-S2 (green) ...................................................................... 5<br />
Figure 1.3 Experimental measurement of the four paramagnetic effects with two 1D undecoupled<br />
spectra .................................................................................................................................... 5<br />
Figure 1.4 The PCS is less sensitive than RDC to small discrepancies between X-ray and solution<br />
structure ................................................................................................................................. 7<br />
Figure 1.4 The Δχ-tensor determination problem ............................................................................... 8<br />
Figure 1.5 Illustration of the three approaches of resonance assignment ........................................... 8<br />
Figure 1.6 Illustration of PCS restraints ........................................................................................... 11<br />
Figure 1.7 Protein complexes determined using PCSs ..................................................................... 11<br />
Figure 1.8 Flow-chart of the Echidna algorithm ............................................................................... 14<br />
Figure 1.9 Examples of the MAP problem ....................................................................................... 15<br />
Figure 1.10 Illustration of the task performed by the software Possum ........................................... 16<br />
Figure 1.11 Isosurface shapes calculated by equation (1.1) ............................................................. 16<br />
Figure 1.12 Sanson-Flamsteed projection for visualization of Δχ-tensor uncertainty ...................... 17<br />
Figure 1.13 Sanson-Flamsteed representations of Δχ-tensor axes orientation ................................. 19<br />
Figure 1.14 Illustration of the task performed by the software Numbat ........................................... 19<br />
Figure 1.15 Effect of the mobility of the tag on the PCS ................................................................. 22<br />
Figure 2.1 Methyl CZ-EXSY experiments ........................................................................................ 31<br />
Figure 2.2 Formulation of the assignment problem depending on the information available .......... 35<br />
Figure 2.3 Methyl region of constant-time ¹³C-HSQC spectra of the ε186/θ complex (containing<br />
¹³C/¹⁵N labeled ε186) in the presence of La³⁺ (blue) and a 1:1 mixture of (a) La³⁺/Dy³⁺<br />
and (b) La³⁺/Yb³⁺ (red) ........................................................................................................ 37<br />
Figure 2.4 Assignment of Met CH3 from PCS ................................................................................ 39<br />
Figure 2.5 PCS measurements in isopropyl groups of Val and Leu and use of PCS for<br />
stereospecific resonance assignments .................................................................................. 42<br />
Figure 2.6 Residues showing deviations between predicted and experimental PCS ........................ 47<br />
Figure S2.1 Pulse scheme of the 2D (H)C(C)H-TOCSY experiment used in this study ................. 56<br />
Figure S2.2 Assigned constant-time (28 ms) ¹³C-HSQC spectrum of the ε186/θ/La³⁺ complex<br />
(¹³C/¹⁵N labeled ε186) at pH 7.2 and 25 °C .................................................................... 57<br />
Figure S2.3 Assigned constant-time ¹³C-HSQC spectrum of the ε186/θ/La³⁺ complex, where<br />
ε186 was biosynthetically fractionally ¹³C-labeled using 20% uniformly ¹³C-labeled<br />
glucose ................................................................................................................................. 58<br />
Figure S2.4 Assigned constant-time ¹³C-HSQC spectrum of the ε186/θ/La³⁺ complex<br />
containing ¹³C/¹⁵N-Leu labeled ε186 (blue) superimposed onto a 2D<br />
(H)C(C)H-TOCSY spectrum of the same sample (red) ........................................................................ 59<br />
Figure S2.5 Comparisons of calculated and experimental PCS in the ε186/θ/Dy³⁺ complex for<br />
methyl groups of (a) Met, (b) Ala, (c) Thr, (d) Val, (e) Leu, and (f) Ile ............................. 60<br />
Figure S2.6 Comparisons of calculated and experimental ¹³C and ¹H PCS as in Figure S2.5 but for<br />
the ε186/θ/Yb³⁺ complex ............................................................................................... 62<br />
Figure 3.1 Screenshots of Numbat main windows ........................................................................... 80<br />
Figure 3.2 Euler angle definitions used by Numbat ......................................................................... 84<br />
Figure 3.3 Visualisation of the Δχ-tensor in MOLMOL and PyMOL, and display of its<br />
orientational uncertainty in a Sanson-Flamsteed projection plot ........................................ 85<br />
Figure 3.4 The four degenerate solutions arising from the symmetry of the Δχ-tensor around the x,<br />
y and z axes ......................................................................................................................... 92<br />
Figure 3.5 The complex between ε186 and θ determined by superimposition of Δχ-tensors .......... 92<br />
Figure 4.1 Fold identification by pseudocontact shifts ................................................................... 106<br />
Figure 4.2 Improved conformational sampling by PCS-ROSETTA .............................................. 108<br />
Figure 4.3 Energy landscapes generated by PCS-ROSETTA ........................................................ 108<br />
Figure 4.4 Superimpositions of ribbon representations of the backbones of the lowest energy<br />
structures calculated with PCS-ROSETTA (blue) onto the corresponding target structures<br />
(red) ................................................................................................................................... 110<br />
Figure S4.1 Fold identification by pseudocontact shift score and ROSETTA energy ................... 119<br />
Figure S4.2 Improved fragment assembly by PCS-ROSETTA ..................................................... 120<br />
Figure S4.3 Energy landscape generated by CS-ROSETTA and PCS-ROSETTA, with full atom<br />
ROSETTA energies and Cα rmsd values being calculated using only the core residues as<br />
defined in Table S4.1 ......................................................................................................... 121<br />
Figure S4.4 Identification of successful calculations with PCS-ROSETTA .................................. 122<br />
Figure S4.5 Flow diagram of PCS-ROSETTA ............................................................................... 123<br />
Figure S4.6 Expected Cα rmsd of the lowest energy structure calculated with PCS-ROSETTA .. 124<br />
Figure 5.1 Capacity of the PCS score, as the only energy term, to fold the protein ....................... 131<br />
Figure 5.2 The intersection of isosurfaces defines the position and orientation of peptide fragments<br />
in the protein structure ....................................................................................................... 131
List of Tables<br />
Table 2.1 Automatic assignment of methyl groups by the program Possum<br />
Table S2.1 ¹³C and ¹H chemical shifts (ppm) of methyl groups of ε186 in the ε186/θ/Ln³⁺ complexes used in this study<br />
Table S2.2 Number of correctly assigned methyl groups of Met, Thr, and Ala residues of ε186 using the program Possum<br />
Table S2.3 Number of correctly assigned methyl groups of Val, Leu, and Ile residues of ε186 using the program Possum with methyl connectivity information in the Yb³⁺ complex<br />
Table S2.4 Number of correctly assigned methyl groups of valine, leucine, and isoleucine residues of ε186 using the program Possum without methyl connectivity information in the Yb³⁺ complex<br />
Table 3.1 Δχ-tensors determined by Numbat in the frames of the ε186 and θ molecule<br />
Table 3.2 Error analysis for the Dy³⁺ Δχ-tensors fitted to PCS of ε186 and θ<br />
Table S3.1 Experimentally determined ¹Hᴺ PCS for θ in complex with ε186 at pH 7.0 and 25°C<br />
Table S3.2 Comparison of θ Δχ-tensor parameters when using only conformer 10 or all conformers of the <strong>NMR</strong> structure of θ<br />
Table 4.1 Protein structures used to evaluate the performance of PCS-ROSETTA<br />
Table S4.1 PCS data information and grid search parameters used<br />
Table S4.2 Protein structures used to evaluate the performance of PCS-ROSETTA<br />
List of Abbreviations<br />
α Subunit α of the E. coli polymerase III<br />
ε186 N-terminal 185 residues of the E. coli polymerase III subunit ε<br />
θ Subunit θ of the E. coli polymerase III<br />
CCR Cross Correlated Relaxation<br />
CSA Chemical Shielding Anisotropy<br />
GUI Graphical User Interface<br />
HOT The bacteriophage P1-encoded homolog of θ<br />
MAP Multi-dimensional Assignment Problem<br />
<strong>NMR</strong> Nuclear Magnetic Resonance<br />
NOE Nuclear Overhauser Effect<br />
PCS Pseudocontact Shift<br />
ppm parts per million<br />
PRE Paramagnetic Relaxation Enhancement<br />
RACS Residual Anisotropic Chemical Shift<br />
RDC Residual Dipolar Coupling<br />
RMSD Root Mean Square Deviation<br />
UTR Unique Δχ-Tensor Representation
Chapter 1<br />
Introduction<br />
1. Introduction<br />
1.1 Liquid State Nuclear Magnetic Resonance<br />
In the last few decades Nuclear Magnetic Resonance (<strong>NMR</strong>) has been used routinely to<br />
investigate chemical compounds, proteins and complexes. The method relies on intrinsic spin<br />
properties of nuclei. Spins are first exposed to a strong and constant magnetic field delivered by the<br />
spectrometer. Then, they are excited by a radiofrequency pulse sequence. The precession of the<br />
spins is recorded during the free induction decay of the <strong>NMR</strong> experiment and converted into a<br />
frequency spectrum after Fourier transformation. Several parameters can be read from the spectra<br />
which provide information about the structure of the molecule.<br />
The chemical shift describes the dependence of nuclear magnetic energy levels on the<br />
electronic environment in a molecule. The chemical shift depends on the nature of the nucleus. It<br />
also strongly depends on its local neighborhood (up to 5 Å) due to the influence of the electron<br />
configuration. Hence, in a protein, almost all nuclei have different chemical shifts. This allows them to be<br />
distinguished in the <strong>NMR</strong> spectrum by their specific resonance frequencies.<br />
The dipole-dipole coupling is the direct magnetic interaction between two close spins. The<br />
effect is intra- and intermolecular, since it acts through space. The interaction energy is minimal<br />
when the two spins are aligned. This spin interaction is responsible for the Nuclear Overhauser<br />
Effect (NOE).
The scalar J-coupling results from an indirect magnetic interaction of two nuclear spins<br />
via their surrounding electrons. The effect is exclusively intramolecular because it is propagated<br />
through the bonds between two nuclei. Typically, it can be measured for nuclei separated by up to<br />
three bonds. In this case, it is referred to as a ³J-coupling. The ³J-coupling constant yields angle<br />
information, as shown in Figure 1.1.b.<br />
<strong>NMR</strong> experiments measure the effects described above. They can be classified<br />
into two groups: those that yield structural information, and those that yield information to facilitate<br />
resonance assignment. The structural information comes mainly from direct dipole-dipole<br />
couplings, which provide short-range distance restraints between two spins, and from ³J-couplings, which<br />
offer dihedral angle restraints for the three bonds concerned. Some examples of <strong>NMR</strong><br />
experiments that offer structural information are:<br />
1D proton experiment: It provides the chemical shifts of the protons. Each ¹H has a<br />
different chemical shift, and each corresponding signal may be split into multiplets due to scalar<br />
couplings. The spectrum gets more complex as the number of spins increases. To reduce spectral<br />
overlap, two- or multi-dimensional <strong>NMR</strong> spectra can be recorded.<br />
1D carbon experiment: It is equivalent to the 1D proton experiment, but measured on carbon.<br />
Only about 1% of natural carbon is ¹³C, and the protein often has to be ¹³C labeled in order to observe<br />
carbon chemical shifts because the most abundant isotope of carbon (¹²C) has no nuclear spin.<br />
NOESY: This experiment correlates spins that are separated in space by a distance of up to<br />
6 Å. The NOE observed in the NOESY experiment is based on the direct dipole-dipole coupling<br />
and provides valuable inter-spin distance information for the structure determination of proteins.<br />
NOE restraints are measured from the peak intensity in the NOESY experiment and provide<br />
distance information (Figure 1.1.a). As the effect is through-space and independent of chemical<br />
bonds, it is also useful for investigations of protein-ligand and protein-protein interactions.<br />
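As a minimal sketch of how the 1/d⁶ dependence of the NOE intensity can be turned into a distance estimate, the following Python function calibrates an unknown distance against a reference proton pair of known separation. The function name, the single-reference calibration and the example numbers are illustrative assumptions and are not taken from the protocols used in this thesis.<br />

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate an interproton distance (in Å) from an NOE cross-peak intensity.

    Assumes that the intensity is proportional to 1/d**6 and that one
    reference pair with known distance and intensity is available
    (a hypothetical single-point calibration).
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


if __name__ == "__main__":
    # A peak 64 times weaker than the reference corresponds to twice the distance.
    print(round(noe_distance(1.0, 64.0, 2.5), 3))
```

Because of the sixth-root relationship, even large errors in the measured intensity translate into small errors in the estimated distance, which is why NOEs are usually applied as loose upper-bound restraints.<br />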
A major task and challenge of structure determination is to assign resonances to their<br />
corresponding atoms in order to apply experimental restraints to the correct set of atoms.<br />
Additional correlation <strong>NMR</strong> experiments are routinely recorded to assist in the chemical shift<br />
assignment task. These include:<br />
2D ¹⁵N-HSQC experiments correlate protons with nitrogens of a ¹⁵N labeled protein. These<br />
correlations simplify the analysis of a 1D spectrum since the additional dimension allows the
separation of the resonances into cross-peaks observed in a 2D plane. The recording time of the<br />
experiment is however longer.<br />
3D ¹³C-¹⁵N-correlation experiments are 3-dimensional heteronuclear experiments that<br />
correlate C, N and H atoms. The resulting spectrum is particularly beneficial for large proteins,<br />
because resonance overlap is reduced in 3D. The disadvantage of these experiments is, however,<br />
that they are less sensitive than the ¹⁵N-HSQC experiment and usually require that the protein is ¹³C<br />
and ¹⁵N double labeled.<br />
COSY experiments correlate ¹H spins via scalar couplings. They are used to identify groups<br />
of spins connected by fewer than four bonds (spin systems).<br />
TOCSY experiments correlate ¹H resonances that belong to the same spin system, where<br />
pairs of spins are separated by no more than three bonds. TOCSY spectra include the COSY<br />
information, and are used to identify connected spin systems. TOCSY spectra are useful for<br />
sequential resonance assignment.<br />
NOESY experiments can also be used in the assignment procedure. COSY and TOCSY<br />
experiments provide the amino acid type information of the resonances, whereas the NOESY<br />
experiment allows the sequential piecing together of the assignment by exploiting the distance<br />
dependence of the NOE.<br />
Figure 1.1 <strong>NMR</strong> effects used for structure determination. (a) The NOE<br />
provides distance information (up to 0.6 nm) between two protons. The intensity of<br />
the NOE signal is proportional to 1/d⁶, where d is the interproton distance. (b) The<br />
³J-coupling gives dihedral angle restraints. The relationship between the angle Φ<br />
and the ³J-coupling is given by the Karplus equation (Karplus, 1959), and the<br />
allowed values for Φ are illustrated by the plot ³J = f(Φ). Figures adapted from web<br />
resources.<br />
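The Karplus relationship mentioned in the caption above can be sketched in a few lines of Python. The general form ³J = A cos²θ + B cos θ + C is the usual way of writing the equation; the coefficient values and the −60° phase offset used below are illustrative assumptions of the size commonly quoted for the HN-Hα coupling and are not taken from this thesis.<br />

```python
import math


def karplus_3j(phi_deg, a=6.51, b=-1.76, c=1.60, offset_deg=-60.0):
    """Return a 3J coupling constant (Hz) for a backbone dihedral angle phi.

    Karplus form: 3J = A*cos^2(theta) + B*cos(theta) + C, with
    theta = phi + offset. The default coefficients and offset are
    illustrative assumptions, not values fitted in this work.
    """
    theta = math.radians(phi_deg + offset_deg)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


if __name__ == "__main__":
    # Scan a few dihedral angles to show that one coupling value can
    # correspond to several angles (the restraint is ambiguous).
    for phi in (-180, -120, -60, 0, 60, 120, 180):
        print(phi, round(karplus_3j(phi), 2))
```

The scan illustrates why a single ³J value generally restrains the dihedral angle to several allowed ranges rather than to a unique value.<br />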
In contrast to NOE or ³J-coupling effects, which are short-range (measurable for distances<br />
below 6 Å) and local (each measurement concerns an independent group of atoms), paramagnetic<br />
<strong>NMR</strong> introduces new effects that are long-range (measurable up to 40 Å) and global (i.e. their effect<br />
is described for all spins in a common frame centered on the paramagnetic lanthanide).<br />
1.2 Paramagnetic <strong>NMR</strong><br />
1.2.1 The four paramagnetic effects in <strong>NMR</strong><br />
When a paramagnetic centre with unpaired electrons, such as a lanthanide ion, is present in<br />
a protein, the observed <strong>NMR</strong> spectrum changes due to induced paramagnetic effects. By<br />
comparison of the diamagnetic and paramagnetic spectra, one can observe the following four<br />
paramagnetic effects:<br />
The pseudocontact shift (PCS): It is given by equation (1.1), where the spin of interest is<br />
described by its polar coordinates in an internal frame (the Δχ-tensor frame) centered on the<br />
paramagnetic center (Figure 1.2.a).<br />
$$\Delta\delta^{\mathrm{PCS}} = \frac{1}{12\pi r^{3}}\left[\Delta\chi_{ax}\left(3\cos^{2}\theta - 1\right) + \frac{3}{2}\,\Delta\chi_{rh}\,\sin^{2}\theta\,\cos 2\varphi\right] \qquad (1.1)$$<br />
Δχax = χz - (χx + χy)/2 and Δχrh = (χx - χy) are, respectively, the axial and rhombic components<br />
that describe the anisotropic effect; r, θ and φ are the polar coordinates of the spin in the Δχ-tensor<br />
frame (Figure 1.2.a). The PCS is a long-range effect (up to 40 Å) that decays with 1/r³ and is<br />
measured as the difference between the paramagnetic and diamagnetic chemical shifts (Figure 1.3).<br />
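To make the dependence of the PCS on r, θ and φ concrete, the following Python sketch evaluates equation (1.1) in its commonly used form, PCS = [Δχax(3cos²θ − 1) + (3/2)Δχrh sin²θ cos 2φ] / (12πr³). The function names and the choice of units (r in Å, Δχ in 10⁻³² m³, result in ppm) are assumptions made for this illustration.<br />

```python
import math


def delta_chi(chi_x, chi_y, chi_z):
    """Axial and rhombic components from the principal values of the chi-tensor."""
    return chi_z - (chi_x + chi_y) / 2.0, chi_x - chi_y


def pcs_ppm(r, theta, phi, dchi_ax, dchi_rh):
    """PCS in ppm for a spin at polar coordinates (r, theta, phi).

    Assumed units: r in Å, angles in radians, delta-chi in 1e-32 m^3.
    The factor 1e4 converts (1e-32 m^3 / Å^3) into ppm.
    """
    axial = dchi_ax * (3.0 * math.cos(theta) ** 2 - 1.0)
    rhombic = 1.5 * dchi_rh * math.sin(theta) ** 2 * math.cos(2.0 * phi)
    return 1e4 * (axial + rhombic) / (12.0 * math.pi * r ** 3)


if __name__ == "__main__":
    # Doubling the distance reduces the PCS by a factor of eight (1/r^3 decay).
    near = pcs_ppm(20.0, 0.0, 0.0, 40.0, 0.0)
    far = pcs_ppm(40.0, 0.0, 0.0, 40.0, 0.0)
    print(round(near, 3), round(far, 3), round(near / far, 1))
```

The 1/r³ decay is what makes the PCS measurable much further from the metal than effects that decay with 1/r⁶.<br />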
The residual dipolar coupling (RDC): With an attached paramagnetic lanthanide, a<br />
protein weakly aligns with respect to the magnetic field. RDCs are manifested as increases or<br />
decreases of the magnitudes of multiplet splittings that can be observed in undecoupled <strong>NMR</strong><br />
spectra (Figure 1.3). The RDC can also be back-calculated (equation (1.2)) provided that the<br />
orientation of the two spins with respect to the alignment tensor is known (Figure 1.2.b).<br />
with:<br />
(1.2)
(1.3)<br />
B0 is the magnetic field, γH and γN are the proton and nitrogen magnetogyric ratios, ħ is Planck's<br />
constant divided by 2π, S the order parameter of the molecular alignment, rNH the N-H distance, kB<br />
the Boltzmann constant, and T the absolute temperature.<br />
Figure 1.2 Representation of the distance and angular dependence of the four<br />
paramagnetic effects for the spin S, or the spin system S1-S2 (green). (a) PCSs and (b)<br />
RDCs are described in the χ-tensor frame centered on the lanthanide l (red). (c)<br />
PREs depend only on distance, whereas (d) CCRs also depend on angle.<br />
Adapted from (Pintacuda et al., 2004).<br />
Figure 1.3 Experimental measurement of the four paramagnetic effects with two 1D undecoupled<br />
spectra. The figure shows the diamagnetic and paramagnetic antiphase doublets. PCS is measured
as the chemical shift difference. RDC is measured as the difference in line splitting. PRE and CCR<br />
can be determined from the differential line broadening.<br />
The paramagnetic relaxation enhancement (PRE): The PRE yields distance information<br />
between the paramagnetic lanthanide and the spin of interest (Figure 1.2.c). It scales with 1/r⁶,<br />
where r is the distance between the paramagnetic center and the nuclear spin (equation (1.4)), and<br />
accounts for the difference in line broadening between the paramagnetic and diamagnetic<br />
resonances (Figure 1.3).<br />
with:<br />
(1.4)<br />
(1.5)<br />
where τr is the rotational correlation time, ωH the Larmor frequency of the proton, μ0 the<br />
vacuum permeability, gJ the g-factor, μB the Bohr magneton, and J the total angular momentum quantum number.<br />
The cross correlated relaxation (CCR): This effect is also measured by the observed line<br />
broadening; more precisely, by comparing the line widths of the two components of the antiphase<br />
doublet (Figure 1.3). This effect combines distance and angle dependence (equation (1.6) and<br />
Figure 1.2.d).<br />
with:<br />
(1.6)<br />
(1.7)<br />
All four paramagnetic effects can be used to study protein structure. Residual dipolar<br />
coupling has been widely used to help determine protein structures (Rohl et al., 2002), but when<br />
carefully comparing RDCs measured by <strong>NMR</strong> with RDCs predicted from a crystal structure, it was<br />
observed that small discrepancies between the N-H bond orientation in crystal and liquid state can
lead to large deviations between measurement and prediction. PCSs are less sensitive to the<br />
difference between the crystal model and the solution structure. The focus in my PhD has been on<br />
using PCS to study proteins and their interactions.<br />
Figure 1.4 The PCS is less sensitive than RDC to small discrepancies<br />
between X-ray and solution structure. A large change in the orientation<br />
of the N-H vector will considerably affect the calculation of the RDC<br />
(equation (1.2) and Figure 1.2.b). On the other hand, the PCS will be<br />
less affected as the relative position of the hydrogen with respect to the<br />
tensor frame is almost unchanged when PCSs are measurable (d > 10 Å).<br />
1.2.2 The pseudocontact shift as a restraint<br />
PCSs can be calculated using equation (1.1) if the magnitude (two parameters: Δχax and<br />
Δχrh, Figure 1.5.a), location (three Cartesian coordinates x, y and z, Figure 1.5.b) and orientation<br />
(three Euler angles α, β and γ, Figure 1.5.c) of the Δχ-tensor are known, and if a structure is<br />
available. The resonance of a spin located close to the paramagnetic center is<br />
broadened beyond detection due to PRE and, consequently, its PCS cannot be observed. The cutoff<br />
radius is typically about 10 Å.<br />
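The eight parameters listed above can be combined into a single prediction routine, sketched below in Python. The sketch assumes a ZYZ Euler angle convention, a rotation matrix whose columns are the Δχ-tensor axes expressed in the protein frame, coordinates in Å and Δχ values in 10⁻³² m³; these conventions, the function names and the default cutoff argument are choices made for the illustration only. Equation (1.1) is used in its equivalent Cartesian form.<br />

```python
import math


def _matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyz(alpha, beta, gamma):
    """Rotation matrix Rz(alpha) * Ry(beta) * Rz(gamma) (assumed convention)."""
    def rz(t):
        c, s = math.cos(t), math.sin(t)
        return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

    def ry(t):
        c, s = math.cos(t), math.sin(t)
        return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

    return _matmul(_matmul(rz(alpha), ry(beta)), rz(gamma))


def predict_pcs(atom, metal, angles, dchi_ax, dchi_rh, cutoff=10.0):
    """Predicted PCS (ppm) of one atom, or None inside the cutoff radius.

    atom, metal: Cartesian coordinates (Å) in the protein frame.
    angles: (alpha, beta, gamma) in radians; delta-chi in 1e-32 m^3.
    """
    rot = euler_zyz(*angles)
    d = [atom[i] - metal[i] for i in range(3)]
    # Project the metal-to-atom vector onto the tensor axes (columns of rot).
    x, y, z = (sum(rot[i][j] * d[i] for i in range(3)) for j in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    if r < cutoff:
        return None  # signal assumed to be broadened beyond detection
    geom = dchi_ax * (2 * z * z - x * x - y * y) + 1.5 * dchi_rh * (x * x - y * y)
    return 1e4 * geom / (12.0 * math.pi * r ** 5)


if __name__ == "__main__":
    print(predict_pcs((1.0, 2.0, 23.0), (1.0, 2.0, 3.0), (0.0, 0.0, 0.0), 40.0, 5.0))
    print(predict_pcs((1.0, 2.0, 8.0), (1.0, 2.0, 3.0), (0.0, 0.0, 0.0), 40.0, 5.0))
```

Returning None for atoms within the cutoff radius mirrors the fact that no PCS can be measured for spins too close to the paramagnetic center.<br />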
The <strong>NMR</strong> resonances must first be assigned in order to measure PCSs. Three kinds of<br />
assignment have to be distinguished (Figure 1.6):<br />
(i) The assignment of the diamagnetic <strong>NMR</strong> spectrum: it is routinely performed with<br />
conventional sequential assignment methods using one or a combination of COSY,<br />
TOCSY, NOESY and triple-resonance <strong>NMR</strong> experiments.<br />
Figure 1.5 The Δχ-tensor determination problem. The Δχ-tensor can be conveniently represented by<br />
isosurfaces. For a given ppm value p, all spins that have a PCS value equal to p would be located<br />
on a given isosurface (red for negative PCS, blue for positive PCS). (a) The Δχax and Δχrh<br />
parameters that have to be determined are responsible for the shape and size of the isosurfaces. (b)<br />
The location of the paramagnetic center is described by three Cartesian coordinates. (c) The three<br />
Euler angles α, β and γ relate the orientation of the Δχ-tensor to the protein frame.<br />
Figure 1.6 Illustration of the three kinds of resonance assignment. (a) The peaks of the<br />
diamagnetic spectrum (blue) are assigned to their corresponding amino acids. (b) The<br />
paramagnetic cross peaks (red) are assigned to their corresponding residues. (c) When the pairing
between diamagnetic and paramagnetic cross peak is already determined by a transfer experiment,<br />
the pairs of cross peaks can be assigned to their corresponding residues.<br />
(ii) The assignment of the paramagnetic <strong>NMR</strong> spectrum: sequential approaches are not<br />
suitable because the lanthanide induces PRE effects resulting in large line<br />
broadening for residues close to the paramagnetic center. Proposed experimental<br />
approaches use temperature dependence (Nguyen et al., 1999), magnetic field<br />
dependence (Bertini et al., 1998) or fast / slow exchange of the lanthanide (John et<br />
al., 2007) to transfer the chemical shift from the diamagnetic to the paramagnetic<br />
state.<br />
(iii) The assignment of the pseudocontact shift: One can pair the chemical shifts of<br />
diamagnetic and paramagnetic resonances when a transfer experiment can be<br />
performed, but the assignment to the individual atoms is still unknown. There is no<br />
direct method to assign experimental PCSs to the structure; the only way is to<br />
compare experimental and predicted PCSs to find the best match.<br />
Comparison between a calculated and measured PCS provides a restraint (Figure 1.7) that<br />
has been used for different purposes in the literature:<br />
Structure refinement: Allegrozzi et al. (Allegrozzi et al., 2000) showed how using PCS as<br />
restraints in structure refinement improved the quality of structures of calbindin. They compared<br />
the original <strong>NMR</strong> structures obtained from 1539 NOEs with structures refined using additional<br />
PCS restraints from three different lanthanides. The magnitude of the paramagnetic dipole moment<br />
differs between paramagnetic metal ions. Hence, both the cutoff radius and the maximum distance at<br />
which PCSs can still be observed are lanthanide-dependent. The three lanthanides chosen in this<br />
study (Ce³⁺, Yb³⁺ and Dy³⁺) cover different regions of 3D space, in shells of 5-15 Å for Ce³⁺,<br />
9-25 Å for Yb³⁺, and 13-40 Å for Dy³⁺. Each lanthanide used focuses on a different region of the<br />
protein, and provides additional independent information to the NOEs. The resulting ensemble of<br />
structures generated with each PCS data set separately shows better definition of the backbone in<br />
the area covered by the lanthanide used. In particular, the residues 56-59 have an RMSD above 1.5<br />
Å in the family of <strong>NMR</strong> structures. Inclusion of PCS restraints decreases the RMSD value to 0.75<br />
Å. What this study failed to show is the improvement in structure quality when all data (from all<br />
different lanthanides) are used simultaneously in the refinement procedure.
Protein-ligand interaction: John et al. (John et al., 2006) showed how PCS restraints can<br />
be used to determine the structure of protein-ligand complexes. The approach is of major interest<br />
for drug screening. In a first step, the Δχ-tensor is determined for the target protein. This is the most<br />
complicated task because the protein can be large, making the assignment of chemical shifts<br />
difficult. However, in the context of drug screening this step needs to be performed only once. Each<br />
ligand that is being screened is isotopically labeled, and a paramagnetic spectrum of the ligand in<br />
complex with the protein is recorded. The assignment of a ligand spectrum and the corresponding<br />
Δχ-tensor are swiftly and easily obtained. Because the Δχ-tensors determined from the ligand and from the protein are the<br />
same by definition, their superimposition yields the rigid-body structure of the<br />
complex. A molecular dynamics package is then used to locally refine the conformation of the<br />
protein on the contact surface. John and coworkers demonstrated the approach with the thymidine<br />
nucleotide as the ligand binding to the ε subunit of DNA polymerase III. The determined thymidine<br />
structure was found to have a very similar binding mode compared to the thymidine<br />
monophosphate present in the reference crystal structure.<br />
Protein-protein interaction: Pintacuda et al. (Pintacuda et al., 2006) described a protocol<br />
to solve the structure of a protein-protein complex using only PCSs and illustrated the protocol on<br />
the example of the N-terminal domain of the subunit ε and subunit θ of the E. coli DNA<br />
polymerase III. Again the method relies on the determination of the Δχ-tensors, first relative to one<br />
molecule (ε, Figure 1.8.a) and then relative to the second molecule (θ, Figure 1.8.b), followed by<br />
the superimposition of the Δχ-tensor frames (Figure 1.8.c). Such an approach is particularly<br />
relevant considering the difficulty of co-crystallizing proteins in complexes, compared to the<br />
crystallization of the components separately. An alternative has been to use RDCs (McCoy et al.,<br />
2002), but the relative orientation and location of the two rigid bodies obtained with PCSs are more<br />
accurate than those of a complex built from RDC data, since PCSs yield<br />
orientation and distance information simultaneously, while RDC data lack distance information and<br />
are sensitive to small fluctuations of the NH bond orientation. The resulting rigid-body docked<br />
complex could exhibit steric clashes that would need to be resolved with a molecular refinement<br />
package. The final result is valuable as input for docking refinement software considering that the<br />
most difficult part of docking two molecules in a complex is to obtain with confidence the<br />
approximate binding sites.
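The superimposition of two Δχ-tensor frames amounts to a rigid-body coordinate transformation, which can be sketched as follows in Python. Each frame is assumed to be given by the metal position and a 3x3 matrix whose columns are the tensor axes in the coordinate system of the respective molecule; the function and argument names are hypothetical. The sketch ignores the symmetry-related degenerate orientations of the Δχ-tensor and any subsequent refinement.<br />

```python
def _transpose(m):
    """Transpose of a 3x3 matrix stored as nested lists."""
    return [list(row) for row in zip(*m)]


def _matvec(m, v):
    """Product of a 3x3 matrix with a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def dock_by_tensor_frames(points_b, origin_a, axes_a, origin_b, axes_b):
    """Express the atoms of molecule B in the coordinate system of molecule A.

    origin_*: metal position found when fitting the tensor to each molecule.
    axes_*: 3x3 matrices whose columns are the tensor axes in each molecule's
    own coordinates (assumed convention).
    """
    axes_b_t = _transpose(axes_b)
    docked = []
    for p in points_b:
        # Coordinates of the atom in the shared tensor frame.
        local = _matvec(axes_b_t, [p[i] - origin_b[i] for i in range(3)])
        # Same point expressed in the coordinate system of molecule A.
        world = _matvec(axes_a, local)
        docked.append([world[i] + origin_a[i] for i in range(3)])
    return docked


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    quarter_turn = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    atoms_b = [[0.0, 5.0, 0.0], [0.0, 6.0, 0.0]]
    print(dock_by_tensor_frames(atoms_b, [10.0, 0.0, 0.0], identity,
                                [0.0, 5.0, 0.0], quarter_turn))
```

In the usage example the metal position of molecule B is mapped onto the metal position of molecule A, as expected when both tensors describe the same paramagnetic center.<br />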
Figure 1.7 Illustration of PCS restraints. If the Δχ-tensor parameters are fully determined, one can<br />
accurately predict PCS values. The assignment of both diamagnetic and paramagnetic spectra<br />
provides experimental PCSs. Direct comparison of both offers a PCS-based restraint.<br />
Figure 1.8 Protein complexes determined using PCSs. Paramagnetic <strong>NMR</strong><br />
experiments are performed on the complex, (a) firstly with only the first protein<br />
labeled, (b) secondly with only the second protein labeled. The two Δχ-tensors are<br />
fitted separately, according to the experimental PCSs. Because the two Δχ-tensors are<br />
theoretically the same, their superimposition provides the structure of the complex<br />
(c). Adapted from (Pintacuda et al., 2006).
1.3 Computational study of paramagnetic proteins<br />
1.3.1 The assignment problem<br />
Assigning the resonances of <strong>NMR</strong> spectra is a necessary step towards applying <strong>NMR</strong><br />
restraints for protein computation. Although several software packages are capable of predicting<br />
chemical shifts, they require high-resolution 3D structures and lack accuracy especially for the<br />
unstructured parts of a protein (Shen et al., 2007). Their algorithms are not based on pure<br />
calculation from the 3D structure, but rather use statistical information extracted from the Protein<br />
Data Bank (PDB) and from deposited chemical shifts.<br />
Pseudocontact shifts can be accurately predicted using equation (1.1). Consequently, it is<br />
possible to compare measured and calculated PCSs. The root mean square deviation between the<br />
calculated and the measured PCSs provides a score to minimize in order to yield the best possible<br />
assignment. This strategy has been applied to simultaneously assign measured PCSs and optimize<br />
the Δχ-tensor by a software package named "Platypus" (Pintacuda et al., 2004). In this work, the<br />
protein was selectively labeled by residue type in order to simplify spectra. This led to the<br />
measurement of unassigned PCSs by unambiguous identification of the connectivity between a<br />
diamagnetic cross peak and its paramagnetic partner shifted along the diagonal in a 2D ¹⁵N-HSQC<br />
spectrum. The diagonal shift is explained by the fact that, to a first approximation, the hydrogen and<br />
nitrogen of an NH group have similar PCS values. This is due to the short distance between N and<br />
H atoms (approximately 1 Å) compared to the large distance (at least 10 Å) separating the NH bond<br />
from the lanthanide inducing the observed PCS. As a result, the polar coordinates within the Δχ-<br />
tensor frame are similar for the N and H spins and hence, both spins experience similar<br />
pseudocontact shifts. The second step of the protocol consists of combining a grid search over the<br />
Δχ-tensor parameters with an optimal assignment algorithm called the Hungarian method (Kuhn,<br />
1955): The grid search covers a large ensemble of possible combinations for the Δχ-tensor<br />
parameters. At each node of the grid search, it becomes possible to use the Hungarian method to<br />
obtain the optimal assignment in polynomial time. A score can be calculated for each node to<br />
reflect the quality of the assignment, and compared to other nodes to extract the best assignment<br />
along with the best set of Δχ-tensor parameters.<br />
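The combination of a grid search with an optimal assignment step can be sketched as follows in Python. The code is not the Platypus implementation: it is a compact textbook version of the Hungarian (Kuhn-Munkres) method applied to a squared-deviation cost, and the function names, the dictionary of precomputed predictions per grid node and the example values are assumptions made for illustration.<br />

```python
import math


def hungarian(cost):
    """Minimum-cost assignment for a square cost matrix, O(n^3).

    Returns a list where entry i is the column assigned to row i.
    """
    n = len(cost)
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)   # row and column potentials
    p, way = [0] * (n + 1), [0] * (n + 1)      # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return match


def assign_pcs(measured, predicted):
    """Optimal pairing of measured and predicted PCSs and its RMSD score."""
    cost = [[(m - q) ** 2 for q in predicted] for m in measured]
    match = hungarian(cost)
    rmsd = math.sqrt(sum(cost[i][j] for i, j in enumerate(match)) / len(match))
    return match, rmsd


def best_grid_node(measured, predicted_by_node):
    """Return (node, assignment, rmsd) for the grid node with the lowest RMSD."""
    best = None
    for node, predicted in predicted_by_node.items():
        match, rmsd = assign_pcs(measured, predicted)
        if best is None or rmsd < best[2]:
            best = (node, match, rmsd)
    return best


if __name__ == "__main__":
    measured = [0.52, -1.10, 0.05]
    nodes = {"node A": [-1.00, 0.10, 0.50], "node B": [2.0, 2.5, -3.0]}
    print(best_grid_node(measured, nodes))
```

In this sketch each grid node is represented only by the list of PCSs it predicts; in practice those predictions would be computed from the trial Δχ-tensor parameters and the protein structure.<br />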
The "diagonal rule" used in (Pintacuda et al., 2004) to manually measure PCSs has also<br />
been exploited in (Schmitz et al., 2006) to automatically assign paramagnetic chemical shifts of a<br />
full ¹⁵N-HSQC spectrum, given a known 3D structure and the list of assigned diamagnetic<br />
resonances. The software Echidna was developed to overcome the difficulties of sequential<br />
assignment of cross peaks in a paramagnetic spectrum. It works as follows:<br />
Firstly, a small number n1 of paramagnetic peaks are paired with diamagnetic peaks by<br />
automatically screening unambiguous possibilities along the diagonal of a 2D ¹⁵N-HSQC<br />
spectrum.<br />
Secondly, a Δχ-tensor is calculated to minimize the root mean square deviations between the n1<br />
experimental and calculated PCSs.<br />
Thirdly, the Δχ-tensor is used to predict for each diamagnetic cross peak the area of the<br />
spectrum where the paramagnetic partner is expected. This area is centered on the back-<br />
calculated PCS value, and defines a much smaller zone than the diagonal strip used in the first<br />
step. More paramagnetic peaks are unambiguously assigned.<br />
The last two steps are iterated until convergence. A final assignment is performed in order to<br />
yield the overall best assignment of all cross peaks after convergence of the method. This<br />
assignment uses the Hungarian method, which finds in polynomial time the optimal<br />
assignment among the n! possibilities, with n being the number of peaks to assign. A complete<br />
flow chart of the method is given in Figure 1.9.
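The first step of the procedure above, the search for unambiguous pairs along the diagonal, can be sketched as follows in Python. This is not the Echidna code: the function name, the tolerance on the difference between the ¹H and ¹⁵N shifts and the maximum accepted shift are hypothetical parameters chosen for the illustration.<br />

```python
def diagonal_pairs(dia, para, tol=0.05, max_pcs=2.0):
    """Pair diamagnetic and paramagnetic peaks that obey the diagonal rule.

    dia, para: lists of (1H ppm, 15N ppm) peak positions.
    A paramagnetic peak is a candidate partner when its 1H and 15N
    displacements agree within tol (ppm) and do not exceed max_pcs (ppm).
    Only pairs that are unambiguous in both directions are returned,
    as a dict {diamagnetic index: paramagnetic index}.
    """
    candidates = {}
    for i, (h_d, n_d) in enumerate(dia):
        for j, (h_p, n_p) in enumerate(para):
            d_h, d_n = h_p - h_d, n_p - n_d
            if abs(d_h - d_n) <= tol and abs(d_h) <= max_pcs:
                candidates.setdefault(i, []).append(j)
    # Count how many diamagnetic peaks claim each paramagnetic peak.
    claims = {}
    for partners in candidates.values():
        for j in partners:
            claims[j] = claims.get(j, 0) + 1
    return {i: partners[0] for i, partners in candidates.items()
            if len(partners) == 1 and claims[partners[0]] == 1}


if __name__ == "__main__":
    dia = [(8.0, 120.0), (7.5, 115.0), (9.0, 125.0)]
    para = [(8.3, 120.3), (7.1, 114.6), (9.05, 125.9)]
    print(diagonal_pairs(dia, para))
```

The pairs found in this way would provide the initial PCSs for the first Δχ-tensor fit; peaks left unpaired are revisited in the later iterations with a narrower search area.<br />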
Figure 1.9 Flow-chart of the Echidna algorithm.<br />
Both of these automatic <strong>NMR</strong> assignment techniques require partial initial assignments. In<br />
the case of Echidna, the whole diamagnetic spectrum needs to be assigned, while Platypus requires<br />
the connectivity between each paramagnetic cross peak and its diamagnetic partner. In both cases, these<br />
prerequisites reduce the computational problem to a 2D Multi-dimensional Assignment Problem<br />
(MAP, Figure 1.10.a). A 2D-MAP is easily solved with the Hungarian method. A more challenging<br />
and attractive goal would be to bypass any initial manual assignment and start directly from the<br />
chemical shift lists of the diamagnetic and paramagnetic states. Such a method could be applied to<br />
automate the side chain assignment of a protein, once the preliminary and easier task of assigning<br />
the backbone chemical shifts and determining the Δχ-tensor has been done, for example with
Echidna. Computationally, the problem becomes a 3D-MAP (Figure 1.10.b), with the number of<br />
possibilities increasing to (n!)². This problem can no longer be solved in polynomial time, since a<br />
MAP of dimension greater than or equal to three is proven to be NP-hard¹ (Karp, 1972). Instead,<br />
a heuristic method has to be used to find a good assignment, but without any guarantee of reaching<br />
the optimal assignment. An approach that tries to cope with the computational challenges (by<br />
means of additional experimental information) led to the development of a program dubbed<br />
Possum (Figure 1.11), which is described in Chapter 2.<br />
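The growth in the number of possible assignments quoted above (n! for the 2D-MAP and (n!)² for the 3D-MAP) can be tabulated with a few lines of Python. The function name and the selected values of n are chosen for illustration only.<br />

```python
import math


def assignment_counts(n):
    """Number of candidate assignments for n peaks: (2D-MAP, 3D-MAP)."""
    pairings = math.factorial(n)
    return pairings, pairings ** 2


if __name__ == "__main__":
    for n in (5, 10, 12, 13):
        two_d, three_d = assignment_counts(n)
        print(f"n={n:2d}  2D-MAP: {two_d:.3e}  3D-MAP: {three_d:.3e}")
```

Exhaustive enumeration is therefore out of reach even for a modest number of peaks, which motivates the use of heuristics for the 3D problem.<br />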
Figure 1.10 Examples of the MAP problem. (a) When experimental and predicted PCSs have to be<br />
matched, the cost function c(i, j) can be defined as the squared deviation between the two values. The aim<br />
is to minimize the sum Q. The binary variables xi,j are defined to ensure that each element i and j is<br />
chosen exactly once. (b) The experimental PCSs are not directly available, but the paramagnetic<br />
and diamagnetic chemical shifts are measured. Their differences give possible experimental PCSs.<br />
¹ The time required to solve an NP-hard problem increases drastically with the size of the problem,<br />
which is, in this context, the number n of residues to assign. Some problems with a low value of n (< 12)<br />
could be solved in a few minutes, whereas the same problem with just one extra peak to assign<br />
remained unsolved after days of calculation. This illustrates that, independently of the algorithm<br />
used, an NP-hard problem abruptly becomes intractable in practice beyond a given size n.<br />
Figure 1.11 Illustration of the task performed by the software Possum.<br />
The software package requires the Δχ-tensor parameters to perform a<br />
structure-based automatic assignment of the <strong>NMR</strong> resonances.<br />
1.3.2 The Δχ-tensor determination problem<br />
The Δχ-tensor determination problem consists of obtaining the eight parameters<br />
characterizing the Δχ-tensor such that the discrepancy between the observed and calculated PCSs is<br />
minimal. This comprises the determination of the paramagnetic center location, of the Δχ-tensor<br />
orientation, and of the axial and rhombic components. The last two parameters characterize the<br />
shape of PCS effects that ranges between two extremes as illustrated in Figure 1.12.<br />
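The minimization underlying the Δχ-tensor determination can be illustrated with a deliberately reduced Python sketch: the location and orientation of the tensor are assumed to be known already (the atom coordinates are given directly in the Δχ-tensor frame), and only the axial and rhombic components are found by a coarse grid search over a sum-of-squares cost. The function names, the units (Å, 10⁻³² m³, ppm) and the grid values are assumptions; a full determination would search all eight parameters.<br />

```python
import math


def pcs_ppm(xyz, dchi_ax, dchi_rh):
    """PCS (ppm) from Cartesian coordinates (Å) in the tensor frame.

    Cartesian form of equation (1.1); delta-chi in units of 1e-32 m^3.
    """
    x, y, z = xyz
    r2 = x * x + y * y + z * z
    geom = dchi_ax * (2 * z * z - x * x - y * y) + 1.5 * dchi_rh * (x * x - y * y)
    return 1e4 * geom / (12.0 * math.pi * r2 ** 2.5)


def cost(params, coords, observed):
    """Sum of squared deviations between predicted and observed PCSs."""
    ax, rh = params
    return sum((pcs_ppm(c, ax, rh) - o) ** 2 for c, o in zip(coords, observed))


def grid_search(coords, observed, ax_values, rh_values):
    """Return the (axial, rhombic) grid node with the lowest cost."""
    nodes = ((ax, rh) for ax in ax_values for rh in rh_values)
    return min(nodes, key=lambda node: cost(node, coords, observed))


if __name__ == "__main__":
    coords = [(15.0, 0.0, 0.0), (0.0, 15.0, 0.0), (0.0, 0.0, 15.0), (10.0, 10.0, 5.0)]
    # Synthetic "observed" PCSs generated from a known tensor for the demonstration.
    observed = [pcs_ppm(c, 30, 10) for c in coords]
    print(grid_search(coords, observed, range(0, 41, 5), range(0, 41, 5)))
```

With synthetic, noise-free data the grid node used to generate the PCSs is recovered exactly; with experimental data the best grid node would normally serve as the starting point for a finer minimization.<br />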
Figure 1.12 Isosurface shapes calculated by equation (1.1). The positive isolevel is shown in blue,<br />
the negative is shown in red. Both isolevels have the same absolute value. (a) When the rhombic<br />
component is equal to zero, the isosurface is axially symmetric (dotted line). (b) Isosurface when
1.3 Computational study of paramagnetic proteins. 17<br />
the axial and rhombic components are equal. Three planar symmetries remain for the three planes<br />
orthogonal to the three main axes. (c) For an axial value of zero, the isosurface presents two<br />
additional planar symmetries due to an ambiguous main axis. While the axial and rhombic values<br />
“decide” the isosurface shape, equation (1.1) constrains every isosurface to a shape between the<br />
two extremes shown in (a) and (c).<br />
Figure 1.13 Sanson-Flamsteed projection for visualization of Δχ-<br />
tensor uncertainty. The axes of the Δχ-tensor are projected on a 2D<br />
surface. The uncertainty of the axes orientation can be reflected by the<br />
size and shape of the colored area used for each axis, as done in<br />
Figure 1.14.<br />
The Δχ-tensor is similar to the alignment tensor used to compute RDCs. The alignment<br />
tensor parameters are easily determined by singular value decomposition, as equation (1.2) is linear<br />
with respect to the alignment tensor parameters. The axial and rhombic components of the alignment<br />
tensor can also be estimated without protein coordinates, by exploiting the<br />
isotropic distribution of the NH bond orientations in space (Clore et al., 1998). The equation that<br />
governs the PCS is non-linear because no assumption of an isotropic distribution of PCS values<br />
can be made. Consequently, the determination of the Δχ-tensor relies on the minimization of a cost<br />
function comparing predicted and experimental PCSs. A few existing software packages could<br />
be used to obtain the Δχ-tensor parameters, such as Fantasian (Banci et al., 1997) or Xplor-NIH<br />
(Schwieters et al., 2003, Schwieters et al., 2006). However, they can be cumbersome to use and<br />
offer little interactivity with the user.<br />
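The fitting step can be illustrated with a minimal sketch. It assumes that equation (1.1) has the standard<br />
point-dipole form, that the metal position is already known, and that coordinates are given in Å and tensor<br />
elements in units of 10⁻³² m³. Under these assumptions the PCS is linear in the five independent tensor<br />
elements, so a linear least-squares fit suffices. The function names are hypothetical and do not correspond to<br />
Fantasian, Xplor-NIH or Numbat, and the ordering convention used for the axial and rhombic components is<br />
likewise an assumption.<br />

```python
import numpy as np

def design_matrix(coords, metal):
    """Rows map the 5 tensor elements (xx, xy, xz, yy, yz) to PCS in ppm.

    Assumes coordinates in Angstrom and tensor elements in 1e-32 m^3,
    hence the 1e4 unit factor; zz is eliminated via zz = -xx - yy.
    """
    d = np.asarray(coords, dtype=float) - np.asarray(metal, dtype=float)
    x, y, z = d.T
    r5 = np.sum(d * d, axis=1) ** 2.5
    scale = 1.0e4 / (4.0 * np.pi * r5)
    cols = np.column_stack([x * x - z * z, 2 * x * y, 2 * x * z,
                            y * y - z * z, 2 * y * z])
    return scale[:, None] * cols

def fit_tensor(coords, pcs, metal):
    """Least-squares fit of a traceless symmetric tensor at a fixed metal position."""
    a = design_matrix(coords, metal)
    sol, *_ = np.linalg.lstsq(a, np.asarray(pcs, dtype=float), rcond=None)
    xx, xy, xz, yy, yz = sol
    return np.array([[xx, xy, xz],
                     [xy, yy, yz],
                     [xz, yz, -xx - yy]])

def axial_rhombic(tensor):
    """Axial and rhombic components, assuming |zz| >= |yy| >= |xx| ordering."""
    w = np.linalg.eigvalsh(tensor)
    xx, yy, zz = w[np.argsort(np.abs(w))]
    return zz - 0.5 * (xx + yy), xx - yy
```

When the metal position is also unknown, a non-linear search over its three coordinates has to be wrapped<br />
around this linear step, which is why the full eight-parameter problem requires a cost minimization.<br />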
Another important feature that existing approaches fail to provide is the possibility to<br />
estimate the quality of the fit and directly visualize it in a Sanson-Flamsteed representation
(Bugayevskiy et al., 1995) as shown in Figure 1.13. A Sanson-Flamsteed plot is a projection of a<br />
sphere onto a plane, and is commonly used in geography to project the globe onto a map. The axes of<br />
the Δχ-tensor intersect the surface of a unit sphere, and the intersection points can be identified<br />
on the projection. This representation is convenient to highlight how reliable the fit of a Δχ-tensor<br />
is in the context of the data and structure used for the calculation. For example, the Δχ-tensor found<br />
for the ε subunit of DNA polymerase III is particularly well defined (Figure 1.14.a), while the<br />
Sanson-Flamsteed plot corresponding to the fit for the θ subunit (in complex with ε) reveals more<br />
uncertainties (Figure 1.14.b). Those differences are mainly due to the large distance (>15 Å) that<br />
separates θ from the lanthanide bound to ε.<br />
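The projection itself is simple to compute. The following minimal sketch maps a tensor axis, given as a<br />
3D direction, onto the Sanson-Flamsteed (sinusoidal) plane using the textbook relations x = λ cos φ and<br />
y = φ, where λ is the longitude and φ the latitude. The function name is hypothetical, and the choice of<br />
the z direction as the pole is an assumption made for illustration.<br />

```python
import math

def sanson_flamsteed(axis):
    """Project a 3D direction onto the sinusoidal (Sanson-Flamsteed) plane.

    Returns (map_x, map_y) in radians, with map_x in [-pi, pi] and
    map_y in [-pi/2, pi/2]. The z direction is taken as the pole.
    """
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("axis must be a non-zero vector")
    # Clamp to guard against rounding slightly outside [-1, 1].
    lat = math.asin(max(-1.0, min(1.0, z / norm)))
    lon = math.atan2(y, x)
    return lon * math.cos(lat), lat
```

Applying this function to the axes obtained from many repeated fits produces the point clouds whose<br />
spread reflects the uncertainty of the axis orientations.<br />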
When docking a small molecule compound such as a drug to a protein, it is likely that the<br />
Sanson-Flamsteed plot highlights large uncertainties because only a small number of PCSs can be<br />
measured. To improve the situation, an important and desired feature would be to use the<br />
information of the protein’s Δχ-tensor in order to improve the fit of the drug’s Δχ-tensor. The<br />
resulting enhancements are illustrated in Figure 1.14.c (compared to Figure 1.14.b) for the ε/θ<br />
protein-protein complex and are expected to be similar for small ligand-protein complexes.<br />
To address these issues, an efficient software package for working with the<br />
Δχ-tensor is required. Chapter 3 presents the software package “Numbat” (Figure 1.15), which<br />
aims to meet those needs.
Figure 1.14 Sanson-Flamsteed representations of Δχ-tensor axes orientation. The error analysis<br />
used one thousand Monte-Carlo iterations that randomly selected 50% of the PCS data set. (a) The<br />
Δχ-tensors fitted for ε are very well defined. (b) As the lanthanide is 15 Å away from the θ subunit,<br />
the fitted Δχ-tensors are less accurate, as indicated by the large area each axis can span. (c)<br />
When keeping the relative orientation and magnitude of the two Δχ-tensors fixed (to the value<br />
determined for ε), the quality of the Δχ-tensor fitted increases, resulting in a more reliable complex<br />
of the two subunits. The well defined z-axis-area of the two Δχ-tensors (blue and brown) in (c)<br />
illustrates the reduced uncertainty around the z axes.<br />
Figure 1.15 Illustration of the task performed by the software<br />
Numbat. Given assigned PCSs, Numbat performs a structure<br />
based determination of the Δχ-tensor.<br />
1.3.3 De novo structure determination of proteins
The determination of protein structures is one of the main challenges of the post-genomic<br />
era. The knowledge of structures at atomic detail is a prerequisite to understand how<br />
macromolecular complexes assemble and perform their tasks within living organisms. The<br />
established methods of X-ray crystallography and NMR spectroscopy still require significant<br />
human and financial resources to determine the structure of proteins of interest. Efforts are being<br />
focused on high-throughput methods to speed up the process of characterizing a large number of<br />
proteins (Kobe et al., 2008).<br />
De novo structure prediction software packages such as ROSETTA (Simons et al., 1997)<br />
are quite successful for small proteins (< 100 residues). The large size of the conformational space<br />
to explore makes it difficult, however, to tackle larger proteins. To overcome the “sampling<br />
problem”, one approach is to include additional experimental restraints that facilitate the<br />
three-dimensional reconstruction of protein structures. Those restraints must be easier to measure than it<br />
would be to obtain crystals of the target protein, or to measure and assign the NOEs required for the<br />
full determination of the structure.<br />
The pseudocontact shift effect is a candidate for this approach. PCSs can be measured<br />
swiftly and accurately as the chemical shift difference between two spectra, once a paramagnetic<br />
probe has been introduced into the protein. The use of lanthanide binding tags makes these<br />
techniques potentially available to any protein. Several lanthanide tags are now available. For a<br />
recent review, see (Su et al., 2009b). While it is not yet routine to attach lanthanide binding tags to<br />
a protein, several options are possible. Attachment by one or two disulfide bonds (Smith et al.,<br />
1975), attachment at one of the termini of the protein (Donaldson et al., 2001), or even use of a<br />
non-covalent tag as demonstrated by (Su et al., 2009a) can be considered. It is expected that<br />
lanthanide attachment techniques will become routine in the future.<br />
Beyond the process of attachment, the second challenge is to have a tag that is not flexible.<br />
The physical model underlying equation (1.1) is accurate if the Δχ-tensor parameters are constant<br />
over time. This hypothesis could be questioned if small movements of the tag occur. Fluctuation of<br />
the tag produces two undesired effects:<br />
(i) It changes the electronic environment in the vicinity of the lanthanide and<br />
consequently, the orientation or magnitude of the Δχ-tensor. As equation (1.1) is<br />
linear with respect to the axial component, the rhombic component, and the three<br />
Euler angles, changes over time of those five parameters will not affect the way<br />
PCSs are predicted. More precisely, n conformations of the Δχ-tensor occurring
within the protein (and sharing the same center coordinate) can be equally explained<br />
by one single conformation. To demonstrate this, consider a spin i with measured<br />
pseudocontact shift PCSi. PCSi is the sum of the contributions of the n states of the<br />
Δχ-tensor.<br />
For the state j, an alternative formula of the pseudocontact shift in a given frame f is:<br />
With<br />
(1.8)<br />
(1.9)<br />
(1.10)<br />
Where x, y, z are the Cartesian coordinates of the spin i in the frame f, and r the distance<br />
between the lanthanide and the spin i. Equation (1.8) becomes:<br />
With<br />
(1.11)<br />
(1.12)<br />
The traceless and symmetric matrix D contains the effective Δχ-tensor parameters that are<br />
necessary and sufficient to describe the PCS experienced by the spin i.
(ii) It could displace the position of the paramagnetic center with respect to the protein<br />
frame. The amplitude of those movements depends on the size and rigidity of the tag<br />
used. Small displacements mostly have a small impact on the value of the PCS<br />
because PCSs are usually observable only at distances of more than ten Angstroms from the<br />
lanthanide. To assess this principle further, a static metal<br />
ion is compared with a mobile one following a realistic trajectory in Figure 1.16.<br />
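The averaging argument made for effect (i) can be sketched compactly. The lines below assume that<br />
equation (1.1) has the standard point-dipole form, written here with a traceless symmetric tensor in the<br />
frame f, and that each state j carries a population weight p_j with the weights summing to one. The symbols<br />
p_j and the superscript (j) are introduced for illustration only and are not taken from the original equations.<br />

```latex
\begin{align}
\mathrm{PCS}_i^{(j)} &= \frac{1}{4\pi r^{5}}\,
  \mathbf{r}^{\mathsf T}\,\Delta\chi^{(j)}\,\mathbf{r},
  \qquad \mathbf{r} = (x,\, y,\, z)^{\mathsf T},\\
\mathrm{PCS}_i &= \sum_{j=1}^{n} p_j\,\mathrm{PCS}_i^{(j)}
  = \frac{1}{4\pi r^{5}}\,\mathbf{r}^{\mathsf T} D\,\mathbf{r},
  \qquad D = \sum_{j=1}^{n} p_j\,\Delta\chi^{(j)} .
\end{align}
```

Because every state shares the same center, r is common to all terms and can be factored out. A weighted<br />
sum of traceless symmetric matrices is again traceless and symmetric, so D describes a single effective tensor.<br />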
Figure 1.16 Effect of the mobility of the tag on the PCS. (a) Twelve tensors are being used to<br />
represent a realistic trajectory of the tag. They have random orientation, random axial and<br />
rhombic values, and are located three Angstroms away from the anchor point (black dot). The<br />
range of angles covers 110 degrees, in steps of 10 degrees. (b) The isosurfaces resulting from the<br />
ensemble of tensors in (a). The red surface represents the isolevel of 5.0 ppm, the blue one<br />
corresponds to -5.0 ppm. The shapes are distorted compared to the typical shape of an isosurface<br />
shown in Figure 1.12. (c) PCSs cannot be measured closer than 10 Angstroms from the<br />
paramagnetic center. The cutoff area is shown in grey. (d) Surfaces of isolevel 1.0 ppm and -1.0<br />
ppm are shown superimposed to (b). They exhibit a classical profile as seen in Figure 1.12. (e)<br />
The cutoff of 10 Angstroms is superimposed on figure (d).
Once the PCSs have been measured, the next step is to use them appropriately in order to<br />
extract structural information. Using the PCSs to filter the correct structure(s) (by comparing<br />
calculated and experimental values) among a large ensemble of generated structures would not be<br />
enough. The PCS-based restraint needs to be directly incorporated into the process of structure<br />
generation to bias the output conformations towards the native one. Several options can be<br />
considered: incorporating a PCS restraint into molecular dynamics software, into a molecular<br />
refinement package that employs simulated annealing routines, or into molecular fragment<br />
replacement software. The main question is which of those approaches is best suited to<br />
capture the global nature of the PCS effect.<br />
The merit of the PCS for de novo structure determination in the context of a molecular<br />
fragment replacement is described in Chapter 4. A PCS score function has been added to the<br />
package CS-ROSETTA (Shen et al., 2008). The ability of the PCS to drive CS-ROSETTA<br />
calculations towards the native conformation and to identify native-like structures is discussed.<br />
1.4 Scope of the thesis<br />
This thesis covers different aspects of paramagnetic NMR from a computational point of<br />
view. This includes the use of PCSs for NMR resonance assignment, for Δχ-tensor determination in<br />
preparation for rigid-body complex calculations, and for de novo structure determination of proteins.<br />
The rest of the thesis is organized as follows:<br />
Chapter 2 describes an experimental and a computational approach to assign chemical<br />
shifts of methyl groups from the paramagnetic and diamagnetic NMR spectra. The computational<br />
route is supported by the development of the software Possum, which was first tested on artificial data<br />
before being applied to experimental data.<br />
Chapter 3 presents newly developed software that works specifically with<br />
pseudocontact shifts. The possibilities offered by the software are discussed and illustrated by the<br />
rapid reconstruction of the complex between the subunits ε and θ of DNA polymerase III.<br />
Chapter 4 reports the incorporation of the PCS into the molecular fragment<br />
replacement software CS-ROSETTA, and the development of a new protocol to perform, for the<br />
first time, de novo protein structure determination using only PCSs and chemical shifts as<br />
experimental restraints.
Chapter 5 concludes this thesis by presenting some perspectives for further development to<br />
better exploit PCS information in structural biology.<br />
1.5 References<br />
Allegrozzi M, Bertini I, Janik MBL, Lee YM, Lin GH and Luchinat C (2000) Lanthanide-induced<br />
pseudocontact shifts for solution structure refinements of macromolecules in shells up to 40<br />
angstrom from the metal ion. J Am Chem Soc 122:4154-4161<br />
Banci L, Bertini I, Savellini GG, Romagnoli A, Turano P, Cremonini MA, Luchinat C and Gray<br />
HB (1997) Pseudocontact shifts as constraints for energy minimization and molecular<br />
dynamics calculations on solution structures of paramagnetic metalloproteins. Proteins<br />
29:68-76<br />
Bertini I, Felli IC and Luchinat C (1998) High magnetic field consequences on the NMR hyperfine<br />
shifts in solution. J Magn Reson 134:360-364<br />
Bugayevskiy LM and Snyder JP (1995). Map projections: A reference manual. Taylor & Francis,<br />
London.<br />
Clore GM, Gronenborn AM and Bax A (1998) A robust method for determining the magnitude of<br />
the fully asymmetric alignment tensor of oriented macromolecules in the absence of<br />
structural information. J Magn Reson 133:216-221<br />
Donaldson LW, Skrynnikov NR, Choy WY, Muhandiram DR, Sarkar B, Forman-Kay JD and Kay<br />
LE (2001) Structural characterization of proteins with an attached ATCUN motif by<br />
paramagnetic relaxation enhancement NMR spectroscopy. J Am Chem Soc 123:9843-9847<br />
John M, Headlam MJ, Dixon NE and Otting G (2007) Assignment of paramagnetic ¹⁵N-HSQC<br />
spectra by heteronuclear exchange spectroscopy. J Biomol NMR 37:43-51<br />
John M, Pintacuda G, Park AY, Dixon NE and Otting G (2006) Structure determination of protein-<br />
ligand complexes by transferred paramagnetic shifts. J Am Chem Soc 128:12910-12916<br />
Karp RM (1972) Reducibility among combinatorial problems. In: Miller RE and Thatcher JW (eds)<br />
Complexity of Computer Computations. Plenum, New York.<br />
Karplus M (1959) Contact electron-spin coupling of nuclear magnetic moments. J Chem Phys<br />
30:11-15<br />
Kobe B, Guss M and Huber T (2008). Structural Proteomics: High Throughput Methods. Humana<br />
Press, Totowa, NJ, USA.
Kuhn HW (1955) The Hungarian Method for the assignment problem. Naval Res Logistics Quart<br />
2:83-97<br />
McCoy MA and Wyss DF (2002) Structures of protein-protein complexes are docked using only<br />
NMR restraints from residual dipolar coupling and chemical shift perturbations. J Am<br />
Chem Soc 124:2104-2105<br />
Nguyen BD, Xia ZC, Yeh DC, Vyas K, Deaguero H and La Mar GN (1999) Solution NMR<br />
determination of the anisotropy and orientation of the paramagnetic susceptibility tensor as<br />
a function of temperature for metmyoglobin cyanide: Implications for the population of<br />
excited electronic states. J Am Chem Soc 121:208-217<br />
Pintacuda G, Keniry MA, Huber T, Park AY, Dixon NE and Otting G (2004) Fast structure-based<br />
assignment of ¹⁵N HSQC spectra of selectively ¹⁵N-labeled paramagnetic proteins. J Am<br />
Chem Soc 126:2963-2970<br />
Pintacuda G, Park AY, Keniry MA, Dixon NE and Otting G (2006) Lanthanide labeling offers fast<br />
NMR approach to 3D structure determinations of protein-protein complexes. J Am Chem<br />
Soc 128:3696-3702<br />
Rohl CA and Baker D (2002) De novo determination of protein backbone structure from residual<br />
dipolar couplings using rosetta. J Am Chem Soc 124:2723-2729<br />
Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient<br />
χ-tensor determination and NH assignment of paramagnetic proteins. J Biomol NMR<br />
35:79-87<br />
Schwieters CD, Kuszewski JJ and Clore GM (2006) Using Xplor-NIH for NMR molecular<br />
structure determination. Prog NMR Spectrosc 48:47-62<br />
Schwieters CD, Kuszewski JJ, Tjandra N and Clore GM (2003) The Xplor-NIH NMR molecular<br />
structure determination package. J Magn Reson 160:65-73<br />
Shen Y and Bax A (2007) Protein backbone chemical shifts predicted from searching a database for<br />
torsion angle and sequence homology. J Biomol NMR 38:289-302<br />
Shen Y, Lange O, Delaglio F, Rossi P, Aramini JM, Liu GH, Eletsky A, Wu Y, Singarapu KK,<br />
Lemak A, Ignatchenko A, Arrowsmith CH, Szyperski T, Montelione GT, Baker D and Bax<br />
A (2008) Consistent blind protein structure generation from NMR chemical shift data. Proc<br />
Natl Acad Sci U S A 105:4685-4690<br />
Simons KT, Kooperberg C, Huang E and Baker D (1997) Assembly of protein tertiary structures<br />
from fragments with similar local sequences using simulated annealing and bayesian<br />
scoring functions. J Mol Biol 268:209-225<br />
Smith DJ, Maggio ET and Kenyon GL (1975) Simple alkanethiol groups for temporary blocking of<br />
sulfhydryl groups of enzymes. Biochemistry 14:766-71
Su XC, Liang HB, Loscha KV and Otting G (2009a) [Ln(DPA)3]³⁻ is a convenient paramagnetic<br />
shift reagent for protein NMR studies. J Am Chem Soc 131:10352-10353<br />
Su XC and Otting G (2009b) Paramagnetic labelling of proteins and oligonucleotides. J Biomol<br />
NMR in press
Chapter 2<br />
Possum: paramagnetically<br />
orchestrated spectral solver of<br />
unassigned methyls<br />
2. Possum: paramagnetically orchestrated spectral solver of unassigned methyls
2.1 Abstract<br />
Pseudocontact shifts (PCS) induced by a site-specifically bound paramagnetic lanthanide<br />
ion are shown to provide fast access to sequence-specific resonance assignments of methyl groups<br />
in proteins of known three-dimensional structure. Stereospecific assignments of Val and Leu<br />
methyls are obtained as well as the resonance assignments of all other methyls, including Met CH3<br />
groups. No prior assignments of the diamagnetic protein are required, nor are experiments that<br />
transfer magnetization between the methyl groups and the protein backbone. Methyl Cz-exchange<br />
experiments were designed to provide convenient access to PCS measurements in situations where<br />
a paramagnetic lanthanide is in exchange with a diamagnetic lanthanide. In the absence of<br />
exchange, simultaneous ¹³C-HSQC assignments and PCS measurements are delivered by the newly<br />
developed program Possum. The approaches are demonstrated with the complex between the N-terminal<br />
domain of the subunit ε and the subunit θ of the Escherichia coli DNA polymerase III.<br />
2.2 Introduction<br />
Methyl groups are excellent probes for the study of proteins by NMR spectroscopy due to<br />
their favorable relaxation properties and intense ¹H NMR signals. When buried, they report on the<br />
packing of side chains in the protein core and thus provide important restraints for protein fold<br />
determination (Zwahlen et al., 1998). On the protein surface, they can serve as hydrophobic probes<br />
of protein-protein (Janin et al., 1988, Gross et al., 2003) and protein-ligand (Hajduk et al., 2000)<br />
interactions. Methyl groups have also been established as probes of protein dynamics (Nicholson et<br />
al., 1992, Muhandiram et al., 1995, Wand et al., 1996, Liu et al., 2003, Korzhnev et al., 2004,<br />
Tugarinov et al., 2005a, Tugarinov et al., 2005b) which, in contrast to amide protons, are inert with<br />
regard to solvent exchange.<br />
The resonance assignment of methyl groups in ¹³C-labeled proteins is usually achieved by<br />
magnetization transfers from sequentially assigned backbone resonances (Montelione et al., 1992).<br />
While this approach works well for proteins up to 30 kDa, it is impeded by fast transverse<br />
relaxation for proteins of high molecular weight or for paramagnetic proteins. Recent advances use<br />
tailored isotope labeling schemes (Tugarinov et al., 2003a, Tugarinov et al., 2003b) which are<br />
expensive and not generally applicable to any type of methyl group. In particular, the methyl<br />
groups of methionine residues are hard to assign since any scalar couplings with the CH3 group<br />
are small (Bax et al., 1994).
As a further drawback, experiments that transfer magnetization between methyl groups and<br />
backbone resonances usually do not afford stereospecific discrimination between the prochiral<br />
methyl groups in Val and Leu residues. In this situation, stereospecific assignments require<br />
additional, stereospecifically labeled samples (Neri et al., 1989, Senn et al., 1989, Ostler et al.,<br />
1993, Kainosho et al., 2006) or more complicated NMR experiments that often entail cumbersome<br />
data analysis (Zuiderweg et al., 1985, Sattler et al., 1992, Karimi-Nejad et al., 1994, Tugarinov et<br />
al., 2004, Tang et al., 2005).<br />
In the case where the three-dimensional structure of the protein is known prior to the NMR<br />
studies, it would be attractive to use the structure to facilitate the NMR resonance assignments. In<br />
favorable situations, structure-based resonance assignments can be achieved from NOE data<br />
(Grishaev et al., 2002). In addition, structure-based assignments of backbone resonances have been<br />
achieved using residual dipolar couplings (RDCs) measured with different alignment media (Jung<br />
et al., 2004) or using the combined information from pseudocontact shifts (PCS), RDCs,<br />
paramagnetic relaxation enhancements (PREs) and cross-correlated relaxation (CCR) induced by<br />
paramagnetic metal ions (Pintacuda et al., 2007). The structural interpretation of PCS has been used<br />
earlier to support resonance assignments of ligand residues in heme proteins (Senn et al., 1985).<br />
Recent advances in site-specific attachment of single lanthanide ions to proteins (Ma et al., 2000,<br />
Dvoretsky et al., 2002, Wöhnert et al., 2003, Ikegami et al., 2004, Prudêncio et al., 2004, Leonov et<br />
al., 2005, Haberz et al., 2006, Rodriguez-Castañeda et al., 2006, Su et al., 2006) extend this<br />
approach to long-range paramagnetic effects, with the possibility of tuning the range of focus by<br />
choice of a particular lanthanide (Allegrozzi et al., 2000, Balayssac et al., 2006, Pintacuda et al.,<br />
2007).<br />
Here we show that the analysis of PCS induced by lanthanide ions presents a powerful tool<br />
for the assignment of methyl resonances, which by reference to the 3D structure of the protein,<br />
works even in situations when connectivities to the backbone resonances are difficult to establish or<br />
the backbone resonance assignment is incomplete. Stereospecific assignments of Val and Leu<br />
methyls are obtained as well as the assignments of any other methyl resonances, including those of<br />
Met CH3 groups. We present two Cz-EXSY experiments for the convenient measurement of PCS<br />
in situations where a paramagnetic lanthanide is in exchange with a diamagnetic lanthanide. In<br />
addition, an algorithm was developed to assign the ¹³C-HSQC cross-peaks of methyl groups in the<br />
situation where no exchange information is available. The approaches are demonstrated with the 30<br />
kDa complex between the N-terminal exonuclease domain ε186 and the subunit θ of Escherichia
coli DNA polymerase III. The active site of ε186 binds two divalent ions (Hamdan et al., 2002b)<br />
that can be replaced by a single Ln³⁺ ion (Pintacuda et al., 2004).<br />
2.3 Experimental section<br />
2.3.1 Sample preparation<br />
A cyclized version of ε186, cz-ε186, was designed for enhanced stability of the protein in<br />
unrelated crystallographic experiments (Park, 2006). Using an intein-based strategy (Williams et<br />
al., 2002), the N-terminal Ser2 and C-terminal Ala186 of ε186 were linked by the nonapeptide<br />
TRESGSIEF (numbered 187-195). Apart from the N- and C-terminal residues that are structurally<br />
disordered in ε186 (Hamdan et al., 2002b), the amide proton chemical shifts of the linear protein are<br />
conserved in cz-ε186 within ±0.05 ppm, indicating that cyclization does not significantly affect the<br />
protein structure. The proteins cz-ε186 and θ were prepared, and used to isolate samples of the<br />
cz-ε186/θ complex, essentially as described previously (Hamdan et al., 2002a). NMR experiments<br />
made use of three different samples of complexes of unlabeled θ with isotope-labeled cz-ε186: (i) a<br />
uniformly ¹³C/¹⁵N-labeled sample (0.5 mM), (ii) a biosynthetically directed fractional ¹³C-labeled<br />
sample prepared from 20% ¹³C-glucose (0.5 mM) (Neri et al., 1989, Senn et al., 1989), and (iii) a<br />
sample with ¹³C/¹⁵N-Leu (0.15 mM). Samples of ε186/θ were dialyzed against NMR buffer (20<br />
mM Tris, pH 7.2, 100 mM NaCl, 0.1 mM dithiothreitol, and 0.08% (w/v) NaN3 in 90% H2O/10%<br />
D2O).<br />
Lanthanides (Ln³⁺ = La³⁺ or 1:1 mixtures of La³⁺/Dy³⁺ or La³⁺/Yb³⁺) were added from<br />
LnCl3 stock solutions in the same buffer containing total Ln³⁺ concentrations of 30 mM. The 1:1<br />
mixtures were added in slight molar excess to catalyze the metal ion exchange, resulting in<br />
exchange rates of a few s⁻¹ (John et al., 2007a, John et al., 2007b). Restoration of the apo-complex<br />
was achieved by extensive dialysis against buffer containing 1 mM EDTA followed by dialysis<br />
against EDTA-free buffer.<br />
2.3.2 NMR spectroscopy<br />
All NMR experiments were performed at 25 °C on a Bruker AV 800 MHz NMR<br />
spectrometer equipped with a cryogenic TCI probe. Sequence-specific resonance assignments of
the methyl groups in the diamagnetic state were established by 3D HNCA and (H)CCH-TOCSY<br />
spectra of the uniformly ¹³C/¹⁵N-labeled sample complexed with 1 equivalent of La³⁺<br />
(cz-ε186/θ/La³⁺), and by reference to the assignments reported for the linear ε186 protein with Mg²⁺<br />
(DeRose et al., 2003). Stereospecific assignments of Val and Leu methyl groups were obtained<br />
from a constant-time (28 ms) ¹³C-HSQC spectrum recorded of the fractionally ¹³C-labeled sample.<br />
Where possible, the rotameric states of the side chains of Val and Leu residues in the crystal<br />
structure of ε186 (Hamdan et al., 2002b) were confirmed in solution by a 3D NOESY-¹⁵N-HSQC<br />
spectrum (mixing time 60 ms) recorded of the uniformly ¹³C/¹⁵N-labeled sample.<br />
Sequence-specific resonance assignments of the methyl groups in the paramagnetic state<br />
were established by 2D and 3D methyl Cz-EXSY spectra recorded with the pulse schemes of Figure<br />
2.1 using a mixing period (τm) of 480 ms and spectral widths of 30 ppm (¹³C) and 16 ppm (¹H). The<br />
2D spectra were acquired with 160 × 1024 complex data points and 32 scans in 10 h, while the 3D<br />
spectra were acquired with 80 × 64 × 1024 complex points and 4 scans in 40 h. For all spectra, the<br />
initial t1 delay was set to half the increment so that folded paramagnetic peaks could be identified<br />
by their inverted sign (Bax et al., 1991).<br />
The methyl group assignments obtained with these experiments provided the controls for<br />
the assignment methods described below.<br />
Figure 2.1 Methyl Cz-EXSY experiments. (a, b) Pulse schemes of the 2D and 3D versions,<br />
respectively. Narrow and wide bars represent radiofrequency pulses with flip angles of 90°<br />
and 180°, respectively, applied with phase x unless indicated otherwise. Selective ¹³C pulses<br />
were applied as a 1.5 ms Q5 pulse prior to the delay C and as a 1.5 ms time-reversed Q5
pulse prior to the delay H, generating an excitation bandwidth of 20 ppm. ¹H saturation is<br />
achieved with 120° pulses applied every 5 ms, and the 3-9-19 sequence is used for water<br />
suppression. During τm, 180° (¹H) pulses are applied every 6 ms with a MLEV-16 supercycle<br />
to suppress cross relaxation between ¹H and ¹³C spins. The phase cycle was 1(x, –x), 2(2x,<br />
2(–x)), 3(x), 4(4y, 4(–y)) and rec(x, 2(–x), x, –x, 2x, –x). States-TPPI was applied to 1 and<br />
3 for quadrature detection. Delays: T = 28 ms, = 3 ms, C = 0.75 ms, H = 1.7 ms.<br />
Gradients (Gi) were applied along the z-axis with strengths of 23.2, 14.5, 20.3, 17.5 and 11.6<br />
G/cm. (c-e) Simulated dipolar ¹³C relaxation rates and NOE in isolated CH3 groups versus<br />
molecular rotational correlation time ( R) using eqs 1-4 in ref. 5. (c) Transverse relaxation<br />
rate R2, (d) longitudinal relaxation rate R1, and (e) steady-state ¹³C{¹H} NOE. The dashed<br />
line reports the relaxation rates calculated for a static CH3 group, whereas the solid line<br />
takes into account a rapid rotation around the three-fold symmetry axis with a correlation<br />
time f = 25 ps and assuming tetrahedral geometry (Sf 2 = 0.111, rCH = 1.10 Å, rCC = 1.52 Å)<br />
(Wand et al., 1996). The dotted line represents the contribution from a neighboring ¹³C spin<br />
to R2, R1, and cross-relaxation ( ), respectively. The vertical axis of the ¹³C-¹³C<br />
cross-relaxation rate in (e) is in s⁻¹. Due to the small contribution of PRE to R1 (John et al., 2007a),<br />
the ¹³C{¹H} NOE is similar for paramagnetic and diamagnetic proteins.<br />
2.3.3 Manual resonance assignments from PCS<br />
The PCS measured from EXSY spectra were used to evaluate the possibility of assigning<br />
the methyl peaks by comparison with back-calculated PCS. PCS were back-calculated using a<br />
Mathematica (Wolfram <strong>Research</strong>) script and the crystal structure of 186 (PDB entry 1J53, ref. 40).<br />
The -tensor parameters of Dy 3+ in complex with 186/ have been reported previously (Schmitz<br />
et al., 2006). The tensor parameters for Yb 3+ were determined from 15 N-HSQC spectra using the<br />
program Echidna (Schmitz et al., 2006) as: ax = –6.52 × 10 –32 m 3 , rh = 1.12 × 10 –32 m 3 , =<br />
24.4º, = 84.5º, and = –299.5º (using the zxz convention of Euler angle rotations). 1 H PCS of<br />
methyl groups were calculated for each of the three methyl protons individually and averaged. This<br />
average is largely insensitive to the rotational position of the methyl group. Residual CSA effects<br />
due to paramagnetic alignment (John et al., 2005) were disregarded since CSA tensors of methyl<br />
groups are small (Liu et al., 2003).<br />
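The back-calculation described above can be illustrated with a minimal Python sketch (the original<br />
work used a Mathematica script, which is not reproduced here). It uses the standard PCS expression<br />
PCS = [Δχax(3cos²θ – 1) + 1.5 Δχrh sin²θ cos2φ] / (12πr³) in Cartesian form. The function names are<br />
hypothetical, and the coordinates are assumed to be already expressed in the principal-axis frame of<br />
the Δχ tensor with the metal ion at the origin, so that no Euler rotation is applied.<br />

```python
import math


def pcs_ppm(x, y, z, dchi_ax, dchi_rh):
    """PCS in ppm of a nucleus at (x, y, z), given in metres in the
    principal-axis frame of the tensor (metal ion at the origin).
    dchi_ax and dchi_rh are the axial and rhombic components in m^3."""
    r2 = x * x + y * y + z * z
    r5 = r2 ** 2.5
    # (3cos^2(theta) - 1) / r^3 equals (2z^2 - x^2 - y^2) / r^5
    axial = dchi_ax * (2.0 * z * z - x * x - y * y)
    # sin^2(theta) cos(2 phi) / r^3 equals (x^2 - y^2) / r^5
    rhombic = 1.5 * dchi_rh * (x * x - y * y)
    return 1e6 * (axial + rhombic) / (12.0 * math.pi * r5)


def methyl_proton_pcs(protons, dchi_ax, dchi_rh):
    """Average the PCS over the three methyl protons, as described in
    the text. `protons` is a list of (x, y, z) tuples in metres."""
    values = [pcs_ppm(x, y, z, dchi_ax, dchi_rh) for (x, y, z) in protons]
    return sum(values) / len(values)


# Tensor components reported in the text for Yb3+ (m^3).
DCHI_AX_YB = -6.52e-32
DCHI_RH_YB = 1.12e-32
```

For a real structure, the atomic coordinates would first have to be translated to the metal position<br />
and rotated into the tensor frame using the Euler angles given above.<br />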
2.3.4 The program Possum<br />
The program Possum (paramagnetically orchestrated spectral solver of unassigned methyls)<br />
was developed to assign the cross-peaks of methyl groups in correlation spectra recorded with<br />
diamagnetic and paramagnetic metal ions by reference to the 3D structure of the protein and<br />
independently determined Δχ tensors. The program requires that the amino-acid type is known (e.g.<br />
by residue-type selective 13 C-labeling). Furthermore, it can accept information about methyl cross-<br />
peaks belonging to the same residue (“methyl connectivity” data for Ile, Leu, and Val, as provided<br />
by HCCH-TOCSY experiments) and stereospecific information (“methyl specificity” data<br />
distinguishing between γ2 and δ1 cross-peaks of Ile, δ1 and δ2 cross-peaks of Leu, and γ1 and γ2<br />
cross-peaks of Val, as provided by samples produced with biosynthetically directed fractional 13 C-<br />
labeling (Neri et al., 1989, Senn et al., 1989, Tugarinov et al., 2004) or stereoselective isotope<br />
labeling (Ostler et al., 1993, Kainosho et al., 2006)). In the present version of the program, the<br />
methyl connectivity information is always assumed to be available for the diamagnetic state.<br />
The program takes as input the 1 H and 13 C chemical shifts of methyl groups measured in<br />
13 C-HSQC spectra and the 13 C chemical shifts of methyl groups that are too close to the<br />
paramagnetic center to be directly observable in 1 H-detected NMR spectra. By comparing the<br />
chemical shifts in the diamagnetic and paramagnetic states, Possum attempts to find the resonance<br />
assignment with the lowest residual cost C(l) defined by:<br />
[equation (2.1)]<br />
with:<br />
[equation (2.2)]<br />
subject to:<br />
[equations (2.3) and (2.4)]<br />
where PCS^S_calc(k,l) is the predicted PCS value for the spin S (S = 13 C or 1 H) of the methyl<br />
group in residue k arising from the paramagnetism of the lanthanide l, ^paraδ^S_exp(j,l) is the chemical<br />
shift of the resonance j in the presence of the lanthanide l for the spin S, and ^diaδ^S_exp(i) is the<br />
chemical shift of the diamagnetic resonance i for the spin S. The cost function assigns smaller costs for<br />
deviations between calculated and observed PCS when the experimentally observed PCS is large, while the<br />
constant e(l) prevents a singularity in the cost function and accounts for the error in measurements<br />
when a spin experiences small paramagnetic effects far away from the paramagnetic center. e(l)<br />
scales with the magnitude of the Δχ tensor of the lanthanide l. Empirically determined values<br />
(e(Yb 3+ ) = 1/6 ppm and e(Dy 3+ ) = 1 ppm) were used here (footnote 2). Equations (2.3) and (2.4) ensure that<br />
each calculated PCS and each experimental chemical shift are chosen exactly once within the<br />
global assignment.<br />
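For orientation, the generic axial three-index assignment problem, as it is usually stated in the<br />
optimization literature, can be written as below. This is a textbook form given under the assumption<br />
that the binary variable x_ijk selects the triple (i, j, k); it is not a transcription of the cost<br />
term c_ijk(l) used by Possum, and the equality constraints would have to be relaxed wherever peaks<br />
are allowed to remain unassigned.<br />

```latex
\begin{align}
  \min_{x}\quad & \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} c_{ijk}\,x_{ijk} \\
  \text{subject to}\quad
  & \sum_{j=1}^{n}\sum_{k=1}^{n} x_{ijk} = 1 \quad \forall i, \\
  & \sum_{i=1}^{n}\sum_{k=1}^{n} x_{ijk} = 1 \quad \forall j, \\
  & \sum_{i=1}^{n}\sum_{j=1}^{n} x_{ijk} = 1 \quad \forall k, \\
  & x_{ijk} \in \{0, 1\}.
\end{align}
```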
Equations (2.1), (2.3) and (2.4) present the formulation of the three-index assignment<br />
problem (Schell, 1955), which is the three-dimensional instance of the multidimensional assignment<br />
problem (MAP). With D being the number of dimensions of the MAP (D = 3 in the example above)<br />
and n being the size of each of the D sets of data, there are (n!)^(D-1) possible assignments. When D is<br />
strictly larger than 2, MAP has been proven to be NP-hard (Karp, 1972) and, as a result, no<br />
known algorithm can guarantee the best solution to the problem in polynomial time. An exhaustive<br />
search through the (n!)^2 possibilities is impracticable for all but the smallest problem sizes. An exact<br />
branch and bound algorithm that explores only a part of all possible assignments has been proposed<br />
(Balas et al., 1991) and works well for small problem sizes, especially when there is a good<br />
agreement between predicted and observed PCS. In the present context, a simulated annealing<br />
optimization scheme proved more efficient computationally. The dimensionality D of the<br />
assignment problem generated by Possum depends on the residue type, the availability of<br />
connectivity information, and the number of different lanthanides used. We have performed<br />
calculations with up to 6 dimensions. Examples of 3- and 4-dimensional problems are illustrated in<br />
Figure 2.2.<br />
2 The purpose of e(l) is to avoid degenerate cases where the experimental PCS is very close to zero.<br />
Its value is not critical for the success of the algorithm. However, e(l) has been optimized to yield the<br />
best possible results.<br />
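The growth of the search space and the idea of an annealing-style search can be sketched in Python as<br />
follows. This is a generic illustration and not the implementation used in Possum: the cost tensor<br />
cost[i][j][k], the swap move, the geometric cooling schedule and all function names are assumptions<br />
made for the example, and the problem size n must be at least 2.<br />

```python
import math
import random


def n_assignments(n, d):
    """Number of complete assignments of a d-dimensional problem, (n!)^(d-1)."""
    return math.factorial(n) ** (d - 1)


def total_cost(cost, perm_j, perm_k):
    """Sum of cost[i][j][k] over the triples (i, perm_j[i], perm_k[i])."""
    return sum(cost[i][perm_j[i]][perm_k[i]] for i in range(len(perm_j)))


def anneal(cost, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing over two permutations (three-index problem).
    Returns (best_cost, best_perm_j, best_perm_k)."""
    rng = random.Random(seed)
    n = len(cost)
    perm_j, perm_k = list(range(n)), list(range(n))
    current = total_cost(cost, perm_j, perm_k)
    best = (current, perm_j[:], perm_k[:])
    for step in range(steps):
        # Geometric cooling from t_start towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        # Move: swap two entries in one of the two permutations.
        perm = rng.choice((perm_j, perm_k))
        a, b = rng.sample(range(n), 2)
        perm[a], perm[b] = perm[b], perm[a]
        trial = total_cost(cost, perm_j, perm_k)
        # Metropolis criterion: always accept improvements, sometimes
        # accept a worse assignment depending on the temperature.
        if trial <= current or rng.random() < math.exp((current - trial) / temp):
            current = trial
            if current < best[0]:
                best = (current, perm_j[:], perm_k[:])
        else:
            perm[a], perm[b] = perm[b], perm[a]  # undo the rejected swap
    return best
```

Each move keeps both index lists valid permutations, so every visited state satisfies the<br />
exactly-once constraints by construction.<br />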
Figure 2.2 Formulation of the assignment problem depending on the information available. The<br />
columns ^diaδ^S_exp and ^paraδ^S_exp contain the chemical shifts (S = 13 C and S = 1 H as observed for 13 C-<br />
HSQC cross-peaks) measured in the presence of a diamagnetic or paramagnetic lanthanide,<br />
respectively. The column marked PCS^S_calc contains the 13 C and 1 H PCS calculated from the<br />
Δχ tensor and the 3D structure of the protein. (a) Assignment problem for residues with a single<br />
methyl group (Ala, Met, Thr). The indices i and j refer to the cross-peak number in the diamagnetic<br />
and paramagnetic state, respectively, and the index k is the residue number in the amino-acid<br />
sequence, as in equation (2.1). The assignment (i = 1, j = 3, k = 1) is illustrated by connecting<br />
lines. The associated cost can be calculated using equation (2.2). The other n-1 assignments<br />
necessary to calculate the total cost C(l) according to equation (2.1) are not shown. Overall, this<br />
assignment problem is three-dimensional. (b) Simultaneous use of the information from two<br />
samples containing the paramagnetic lanthanides l1 or l2 creates a four-dimensional assignment<br />
problem. (c) For amino acids with two methyl groups (Ile, Leu, Val), the columns ^diaδ^S_exp, ^paraδ^S_exp,<br />
and PCS^S_calc embed the chemical shifts (and PCS) of two methyl groups (m1 and m2). If the methyl-<br />
specificity information is not available in the paramagnetic state (illustrated by m? in the column<br />
^paraδ^S_exp), Possum will compute the two possible costs and only keep the lower one. (d) For Ile, Leu,<br />
and Val residues, the methyl-methyl connectivity information may be available in the diamagnetic<br />
state but not in the paramagnetic state. This situation creates a four-dimensional assignment<br />
problem for data from a single paramagnetic lanthanide.<br />
The program also takes into account the absence of paramagnetic peaks due to PRE by<br />
preventing the assignment of observable paramagnetic peaks to methyl groups located closer to the<br />
metal ion than a user-specified cutoff. In the present work, cutoffs of 6 and 9 Å were used for the<br />
Yb 3+ and Dy 3+ complexes, respectively. Paramagnetic peaks missing for any other reason (e.g.<br />
spectral overlap) are also tolerated. This is achieved by assigning a cost only to pairings of<br />
observable paramagnetic and diamagnetic peaks, whereas a zero cost is associated with any<br />
unassigned diamagnetic peak left over. Finally, the program allows for the possibility that<br />
paramagnetic shifts may have been observed only for either the 13 C or the 1 H resonance of a methyl<br />
group.<br />
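The two rules described above (no observable paramagnetic peak inside the PRE cutoff, and zero cost<br />
for a diamagnetic peak left without a partner) can be sketched as a small Python function. The<br />
function name, its arguments and the use of an infinite cost to forbid a pairing are assumptions made<br />
for illustration; the text does not specify how Possum implements the restriction internally.<br />

```python
import math


def pairing_cost(base_cost, metal_distance, cutoff, para_peak_observed):
    """Cost of pairing a methyl group with a paramagnetic peak.

    base_cost          -- cost from the PCS comparison (hypothetical input)
    metal_distance     -- distance of the methyl group from the metal ion (A)
    cutoff             -- user-specified PRE cutoff (A)
    para_peak_observed -- False if the diamagnetic peak has no paramagnetic partner
    """
    if not para_peak_observed:
        # An unassigned diamagnetic peak left over carries zero cost.
        return 0.0
    if metal_distance < cutoff:
        # An observable peak must not be assigned to a methyl group
        # located inside the cutoff sphere.
        return math.inf
    return base_cost
```

With the cutoffs quoted in the text, the call would use cutoff=6.0 for the Yb 3+ complex and<br />
cutoff=9.0 for the Dy 3+ complex.<br />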
The calculation of the assignment starting from the chemical shifts of Table S2.1 took less<br />
than 2 h on an AMD 64 4200+ processor, when using all available information, including methyl<br />
connectivity and methyl specificity information and the chemical shifts from the Yb 3+ and Dy 3+<br />
complexes.<br />
2.4 Results<br />
2.4.1 13 C-HSQC spectra of the cz-ε186/θ/Ln 3+ complexes<br />
Constant-time 13 C-HSQC spectra of the uniformly 13 C labeled diamagnetic cz-ε186/θ/La 3+<br />
complex and the paramagnetic cz-ε186/θ/Dy 3+ and cz-ε186/θ/Yb 3+ complexes illustrate the spectral<br />
complexity of the methyl region and the effect of the paramagnetism. The spectrum of the<br />
cz-ε186/θ/La 3+ complex (blue peaks in Figure 2.3) contains approximately the number of methyl peaks<br />
expected for 19 Ala, 14 Thr, 6 Met, 12 Val, 17 Leu, and 14 Ile residues (125 methyl groups). The<br />
signals of Met CH3 groups are particularly well resolved and easily identified as they appear with<br />
opposite sign.<br />
Figure 2.3 Methyl region of constant-time 13 C-HSQC spectra of the cz-ε186/θ complex<br />
(containing 13 C/ 15 N labeled cz-ε186) in the presence of La 3+ (blue) and a 1:1 mixture of (a)<br />
La 3+ /Dy 3+ and (b) La 3+ /Yb 3+ (red). Met CH3 and CH2 groups appear with inverted sign<br />
(light colors). The spectra were recorded using a constant time of 28 ms and t2max = 160 ms.<br />
The spectra of the mixed samples were acquired with 4 times as many scans to compensate<br />
for the halving of the effective concentrations.<br />
As Dy 3+ is one of the strongest paramagnetic lanthanide ions (Pintacuda et al., 2007), the<br />
methyl peaks of the cz-ε186/θ/Dy 3+ complex (red peaks in Figure 2.3a) are strongly shifted by PCS<br />
and affected by 1 H line broadening due to transverse paramagnetic relaxation enhancement (PRE).<br />
Thus, only 55 cross-peaks are observable corresponding to methyl groups with a distance from the<br />
Dy 3+ ion larger than 15 Å, many of them with intensities close to the noise level. Part of the 1 H line<br />
broadening is caused by unresolved RDCs, including intra-methyl RDCs (Kaikkonen et al., 2001),<br />
originating from the paramagnetically induced alignment of the protein with the magnetic field.<br />
In the cz-ε186/θ/Yb 3+ complex, the cutoff distance is reduced to about 9 Å owing to the approximately<br />
6-fold smaller paramagnetic moment of Yb 3+ , so that only 10 methyl peaks are expected to be<br />
broadened beyond detection. Of the remaining 115 methyl resonances (red peaks in Figure 2.3b),<br />
only 14 peaks could not be analyzed due to overlap or very small PCS at larger distances from the<br />
metal ion. Figure 2.3 shows that for both paramagnetic lanthanides, it is nearly impossible to trace<br />
the paramagnetic shift of a 13 C-HSQC peak using the criterion that the PCS in the 13 C and 1 H<br />
dimensions of the spectrum must be similar. (In methyl groups, the distance between the carbon<br />
and the average position of the three protons is less than 0.4 Å.) Therefore, without prior<br />
knowledge of resonance assignments, PCS measurements cannot be made manually from 13 C-<br />
HSQC spectra alone.<br />
2.4.2 Methyl CZ-EXSY experiments<br />
In order to measure PCS data conveniently and with high sensitivity, we designed an<br />
experiment applicable to samples prepared with a 1:1 mixture of paramagnetic and diamagnetic<br />
metal ions, where chemical exchange between the metal ions leads to exchange of the protein<br />
between paramagnetic and diamagnetic states. By generating exchange cross-peaks between methyl<br />
peaks of the diamagnetic and paramagnetic lanthanide complexes, the experiment allows the<br />
measurement of 1 H and 13 C PCS from a single spectrum. Figure 2.1 shows 2D and 3D versions of<br />
the methyl Cz-EXSY experiment. The pulse sequences are related to previously published Nz-<br />
exchange experiments (Farrow et al., 1994, John et al., 2007a).<br />
During a mixing period τm, magnetization is stored as relatively slowly relaxing CZ<br />
magnetization. Simulations indicate that, owing to rapid rotation around the 3-fold symmetry axis,<br />
longitudinal relaxation rates R1 of methyl 13 C spins are fairly insensitive with respect to molecular<br />
size, and barely exceed 2 s –1 even for very small proteins (Figure 2.1d). In the cz-ε186/θ/La 3+<br />
complex (τC = 17 ns), we measured R1( 13 C) rates of about 1.6 s –1 for the majority of methyl groups.<br />
Only a group of highly mobile Thr residues relaxed somewhat faster (2 s –1 ), whereas the R1( 13 C)<br />
relaxation in Met CH3 groups was much slower (about 0.7 s –1 ). In contrast to transverse relaxation<br />
rates R2, R1 rates in macromolecules are barely affected by the paramagnetism of lanthanides (John<br />
et al., 2007a).<br />
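As a rough worked example of what these rates imply, the fraction of stored CZ magnetization that<br />
survives the mixing period can be estimated under the simplifying assumptions of monoexponential<br />
decay and no losses through exchange or cross-relaxation. The mixing time of 480 ms is the value<br />
quoted in the figure captions of this chapter; the estimate itself is only an illustration.<br />

```latex
\begin{align}
  \frac{M_z(\tau_m)}{M_z(0)} &= e^{-R_1 \tau_m}, \\
  R_1 = 1.6\ \mathrm{s^{-1}},\ \tau_m = 0.48\ \mathrm{s}:\quad & e^{-0.768} \approx 0.46, \\
  R_1 = 0.7\ \mathrm{s^{-1}},\ \tau_m = 0.48\ \mathrm{s}:\quad & e^{-0.336} \approx 0.71 .
\end{align}
```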
The experiment yields auto-peaks for the diamagnetic and paramagnetic states (dd and pp<br />
peaks, respectively) and exchange peaks arising from magnetization exchange from the<br />
paramagnetic to the diamagnetic state and vice versa (pd and dp peaks, respectively).<br />
Since the experiment starts from 13 C polarization rather than using an INEPT transfer, pd<br />
peaks can be detected even for methyl groups that are strongly affected by 1 H PRE in the<br />
paramagnetic state and thus invisible in the 13 C-HSQC spectrum. Combined with the dd peaks, this<br />
allows 13 C PCS measurements that are limited only by the (16-fold smaller) 13 C PRE (John et al.,<br />
2007b). As indicated previously (ref. 13) and illustrated by the simulations of Figure 2.1e, 13 C polarization<br />
in methyl groups of proteins can be very efficiently enhanced using the { 1 H} 13 C NOE. This holds<br />
irrespective of paramagnetism. We observed an approximately two-fold increase in 13 C polarization in the<br />
cz-ε186/θ/La 3+ complex using 1 s of 1 H irradiation between subsequent scans.<br />
For improved resolution in the 13 C dimension and measurement of small 13 C PCS, the 2D<br />
experiment is implemented as a constant-time experiment in the t1 dimension. The 3D experiment<br />
additionally records the 13 C frequency of the protein state after the mixing time. Real-time<br />
evolution periods in both indirect dimensions yield superior sensitivity for residues with substantial<br />
13 C PRE that commonly also have larger PCS. Selective 13 C pulses select the spectral window of<br />
the methyl 13 C resonances of the diamagnetic complex in order to limit the spectral width required<br />
in the F1 dimension.<br />
2.4.3 Resonance assignment of Met, Ala and Thr methyl groups<br />
Figure 2.4a shows the spectral region of the Met CH3 cross-peaks of the 2D methyl Cz-<br />
exchange spectrum, recorded with a sample of cz-ε186/θ containing La 3+ and Dy 3+ in a 1:1 ratio.<br />
Out of six Met residues, four are observed in the 13 C-HSQC spectrum of the cz-ε186/θ/Dy 3+<br />
complex (the cross-peak of Met178 in the paramagnetic state appears with very weak intensity at<br />
δ2( 1 H) = 5.33 ppm). For these residues, both auto and both exchange peaks become visible in the<br />
exchange spectrum, forming a rectangle that allows straightforward identification of dd-pp peak<br />
pairs, yielding 13 C PCS of 3.28, 1.31, 0.49 and –0.65 ppm. A fifth residue only yields a pd<br />
exchange peak with a 13 C PCS value of –3.39 ppm.<br />
Figure 2.4 Assignment of Met CH3 from PCS. (a) 2D methyl Cz-EXSY spectrum of cz-ε186/θ<br />
(containing 13 C/ 15 N-labeled cz-ε186) loaded with a 1:1 mixture of La 3+ and Dy 3+ (red), overlaid<br />
with the 13 C-HSQC spectrum (blue). The diamagnetic auto-peaks (dd) are labeled with the<br />
assignment and connected to the paramagnetic auto-peaks (pp) and the exchange peaks (pd and<br />
dp) with dashed rectangles. The dp and pp peaks of Met178 are outside the selected spectral region<br />
at δ2 = 5.53 ppm. Met107 only shows a pd exchange peak (vertical dashed line), and neither pp-<br />
peak nor exchange peaks are observed for Met18. The spectrum was recorded with a mixing time<br />
of 480 ms. (b) Comparison of predicted (top) and measured (bottom) PCS of Met CH3 groups. 13 C<br />
PCS and 1 H PCS are plotted with filled and open bars, respectively, and sorted according to the<br />
predicted 13 C PCS. The distances rC-Ln are given in Å in the center. 3<br />
The measured PCS can be compared with values predicted from the known structure of<br />
ε186 (Park, 2006) and the previously determined Δχ tensor of Dy 3+ (Figure 2.4b) (Schmitz et al.,<br />
2006). Only five Met residues belong to the structured part of the protein with predicted 13 C PCS of<br />
3.74 (Met178), 1.30 (Met137), –0.56 (Met87), –1.68 (Met18) and –2.87 ppm (Met107). Met185 is<br />
located in the flexible cyclizing loop of cz-ε186 and can be immediately assigned to the very<br />
intense and narrow resonance with a 13 C PCS of 0.49 ppm, in agreement with the PCS of 0.53 ppm<br />
observed for the amide proton of this residue. Met18 is the residue closest to the metal ion (rC-Dy =<br />
12.0 Å) and can be assigned to the methyl group that does not show any exchange peak. As Met18<br />
lines the active site, this assignment is independently confirmed by its sensitivity to titration with<br />
nucleotides (unpublished results). Met107 is the second closest residue (rC-Dy = 14.2 Å) and<br />
displays a pd but no dp exchange peak; the assignment of all other Met residues follows in a<br />
straightforward manner from the PCS data.<br />
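The matching of the remaining Met residues can be reproduced with a few lines of Python using the<br />
13 C PCS values quoted above. This is a sketch of the manual reasoning and not the Possum cost<br />
function: the sum of absolute deviations is an assumed criterion chosen for simplicity. Met185 (flexible<br />
loop, 0.49 ppm) and Met18 (no exchange peak) are excluded because they were assigned by the<br />
arguments given in the text.<br />

```python
from itertools import permutations

# Predicted 13C PCS (ppm) for the Met residues still to be assigned.
predicted = {"Met178": 3.74, "Met137": 1.30, "Met87": -0.56, "Met107": -2.87}
# Measured 13C PCS (ppm); -3.39 is the value obtained from the pd peak alone.
measured = [3.28, 1.31, -0.65, -3.39]


def best_matching(predicted, measured):
    """Exhaustively pair measured with predicted PCS and return the
    pairing with the smallest sum of absolute deviations."""
    residues = list(predicted)
    best = None
    for order in permutations(measured):
        cost = sum(abs(predicted[r] - m) for r, m in zip(residues, order))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(residues, order)))
    return best


cost, assignment = best_matching(predicted, measured)
```

With only four peaks the 24 possible pairings can be enumerated directly; the pairing found assigns<br />
the pd-only peak at –3.39 ppm to Met107, in line with the argument based on its distance from the metal.<br />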
The data show that it is possible to assign a limited number of methyl groups using PCS<br />
only. The situation is more complex for the methyl groups of the other amino acids since with the<br />
exception of Ile CH3 groups, the amino acid type cannot be identified from 13 C-HSQC spectra<br />
alone. This information would have to be provided either by the use of residue-specific labeling or<br />
additional NMR experiments (in the cz-ε186/θ/La 3+ complex, the amino acid type can readily be<br />
identified from a 3D (H)CCH-TOCSY spectrum). In addition, important information is provided by<br />
(i) the relative size of 13 C and 1 H PCS and (ii) whether the paramagnetic 1 H resonance can be<br />
observed (rC-Dy > 15 Å) or only pd exchange peaks (rC-Dy > 10 Å, Figure S2.4 and Figure S2.5).<br />
3 Experimental PCS have an error below 0.1 ppm. Errors in the calculated PCS depend on the<br />
quality of the 3D structures used. Residues 87, 107, 137 and 178 belong to the structured part of the protein. The<br />
error in their calculated PCS can be considered to be below 10% of their absolute value.<br />
2.4.4 Assignments of Val, Leu, and Ile methyl groups<br />
Val, Leu, and Ile are amino acids with two methyl groups that can easily be linked by<br />
correlations observed in TOCSY spectra; combining the PCS data for both methyl groups greatly<br />
facilitates the resonance assignment of these residues. This is illustrated in Figure 2.5 with the<br />
cz-ε186/θ complex containing 1:1 mixtures of Yb 3+ /La 3+ (a, b) and Dy 3+ /La 3+ (c, d), respectively.<br />
Whereas a 2D (H)C(C)H-TOCSY experiment (Figure S2.1, Supporting Information) recorded with<br />
a short mixing time (12 ms) strongly favors intra-residual methyl-methyl correlations (Figure 2.5a)<br />
(Eaton et al., 1990), the 2D methyl CZ-EXSY spectrum yields predominantly exchange peaks and<br />
only weak γ1-γ2 correlations arising from 13 C- 13 C NOE (Fischer et al., 1996). We have also applied<br />
the (H)C(C)H-TOCSY experiment to a sample of the pure cz-ε186/θ/La 3+ complex containing<br />
selectively 13 C/ 15 N-Leu labeled cz-ε186, where all δ1-δ2 methyl pairs could be identified (Figure<br />
S2.4).<br />
Figure 2.5 PCS measurements in isopropyl groups of Val and Leu and use of PCS for<br />
stereospecific resonance assignments. (a) Selected spectral region from a 2D (H)C(C)H TOCSY<br />
spectrum (red) of cz-ε186/θ loaded with a 1:1 mixture of La 3+ and Yb 3+ , showing the methyl cross-<br />
peaks of Val96. The spectrum is overlaid with the 13 C-HSQC spectrum (blue). Intraresidual<br />
correlations between the cross-peaks of the γ1CH3 and γ2CH3 groups are identified by dotted lines.<br />
The TOCSY spectrum was recorded with 12 ms mixing time. (b) Same spectral region as in (a)<br />
taken from the 2D methyl Cz-EXSY spectrum of the same sample recorded with a mixing time of<br />
480 ms. (c) Selected strips from the 3D methyl CZ-EXSY spectrum of cz-ε186/θ loaded with a 1:1<br />
mixture of La 3+ and Dy 3+ (right panels) aligned with corresponding spectral regions from the 2D<br />
methyl CZ-EXSY spectrum (left panels). The strips display the methyl group correlations of Val10<br />
and Leu113. The arrows point from the chemical shifts of the diamagnetic auto-peaks (dd) to the<br />
chemical shifts of the exchange peaks (pd), indicating the 13 C PCS. Horizontal lines identify the<br />
positions of the dd- and pd-peaks in the δ1( 13 C) dimension. The line at 26.35 ppm identifies the 13 C-<br />
13 C NOEs with the CγH group of Leu113. (d) Assignment of methyl resonances from the<br />
comparison of predicted with experimental PCS. For each Val residue, the distances from the<br />
lanthanide in Å are indicated for both methyl carbons in the center of the plot. 13 C PCS and 1 H<br />
PCS are displayed as filled and open bars, respectively. The residues are sorted according to the<br />
predicted 13 C-PCS and the PCS are plotted in the sequence Cγ1/Hγ1/Cγ2/Hγ2.<br />
Figure 2.5c compares the measurement of 13 C PCS for two residues from strips of 2D and<br />
3D methyl CZ-EXSY spectra. The two experiments are complementary, showing better frequency<br />
resolution in the 2D spectrum and generally less cross-peak overlap in the 3D spectrum. The<br />
example of Leu113 shows that one- and two-bond 13 C- 13 C NOE correlations are visible, but<br />
generally of much smaller intensity than the exchange peaks. Through-bond correlations can again<br />
be identified from TOCSY spectra. From the combined use of the 2D and 3D Cz-EXSY spectra, all<br />
13 C-HSQC cross-peaks observable for any of the methyl groups of the cz-ε186/θ/Dy 3+ and<br />
cz-ε186/θ/Yb 3+ complexes could readily be correlated with the corresponding 13 C-HSQC cross-peaks<br />
of the cz-ε186/θ/La 3+ complex, yielding the PCS. Compared to the 13 C-HSQC spectrum, the methyl<br />
Cz-EXSY spectra yielded the 13 C chemical shifts in the paramagnetic state for a further 47 methyl<br />
groups of the cz-ε186/θ/Dy 3+ complex with rC-Dy distances as short as 10 Å, leaving only 11 methyl<br />
groups completely unobservable due to excessive PRE. For the cz-ε186/θ/Yb 3+ complex, the<br />
methyl Cz-EXSY spectra yielded the 13 C chemical shifts for 7 additional methyl groups with rC-Yb<br />
distances as short as 6 Å, leaving only 1 methyl group unobservable.<br />
Figure 2.5d compares the predicted and measured PCS for Val methyl groups in the<br />
cz-ε186/θ/Dy 3+ complex. Each residue is characterized by up to 4 PCS values, resulting in the<br />
straightforward assignment of 10 out of 12 residues. Only Val39 did not yield unambiguous PCS<br />
data due to resonance overlap, and Val65 is too close to the Dy 3+ ion. The assignment of Val65<br />
could be made in the ε186/θ/Yb 3+ complex, where this residue yields large PCS (Supporting<br />
Information). The methyl cross-peaks of Leu residues can be assigned in an analogous way (see<br />
below).<br />
Importantly, this approach automatically yields the stereospecific assignment of Val and<br />
Leu methyl peaks, as long as different PCS are observed for the two prochiral methyl groups. Since<br />
the methyl carbons in an isopropyl group are separated by 2.5 Å, this is almost always the case<br />
(Figure 2.5d). A rare exception is Val50, where the predicted 13 C and 1 H PCS of the γ1 and γ2<br />
methyl groups are indistinguishable in both the Dy 3+ and Yb 3+ complex.<br />
The methyl groups of Ile residues are particularly easy to assign by PCS, since the spectral<br />
ranges of the 13 C-NMR signals of γ2 and δ1 methyl groups are clearly separated, while intra-<br />
residual methyl-methyl connectivity can still be obtained from TOCSY spectra.<br />
2.4.5 Automatic assignments without EXSY data<br />
Cz-EXSY spectra provide an exceptionally simple way of measuring PCS. For situations<br />
where the metal exchange is too slow for exchange spectra and spectral crowding prevents the<br />
straightforward pairing between diamagnetic and paramagnetic 13 C-HSQC peaks (Figure 2.3), we<br />
have devised the program Possum which determines the correct peak pairings, their resonance<br />
assignments, and their PCS, using the 3D structure of the protein and the Δχ tensor (that can readily<br />
be obtained from, e.g., 15 N-HSQC spectra (Pintacuda et al., 2004, Schmitz et al., 2006)).<br />
The performance of the program was initially tested with simulated data, replacing the<br />
experimental paramagnetic shifts of Table S2.1 by shifts back-calculated from the crystal structure<br />
of ε186 (Hamdan et al., 2002b) and using the crystal structure, the experimental diamagnetic<br />
chemical shifts, and the Δχ tensors of Dy 3+ and Yb 3+ as input. In all calculations, it was assumed<br />
that the residue types of all methyl resonances were known and the methyl connectivity information<br />
of Val, Leu, and Ile residues was available for the diamagnetic state. Except for extreme cases of<br />
spectral overlap, the program yielded 100% correct assignments. In a second step, structural<br />
uncertainties were simulated by randomly displacing the methyl groups, following a Maxwell-<br />
Boltzmann distribution with maxima at 0.35 and 0.7 Å (resulting in maximal atom displacements of<br />
0.75 and 1.5 Å, respectively, always using the same direction of displacement). Even in the case<br />
with the maximum structural noise, using only paramagnetic data from the Yb 3+ complex and<br />
neither methyl specificity nor methyl connectivity information in the paramagnetic state, Possum<br />
yielded >75% correct assignments of the diamagnetic methyl resonances (Table S2.2 and Table<br />
S2.4). The score increased to >90% when paramagnetic data from the Dy 3+ complex, methyl<br />
specificity information in all complexes, and methyl connectivity information in the Yb 3+ complex<br />
(but not the Dy 3+ complex) were included (Table S2.2 and Table S2.3).<br />
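The structural noise described above can be imitated with the following Python sketch. It relies on<br />
the fact that the length of a 3D vector with independent Gaussian components of standard deviation a<br />
follows a Maxwell-Boltzmann distribution with its maximum at a·√2. The function name is hypothetical,<br />
and capping the length by resampling is an assumption: the text reports maximal displacements of<br />
0.75 and 1.5 Å but does not state how they arose.<br />

```python
import math
import random


def displacement(mode, max_length, rng):
    """Random 3D displacement (in A) whose length follows a
    Maxwell-Boltzmann distribution with its maximum at `mode`.
    Vectors longer than `max_length` are discarded and redrawn."""
    scale = mode / math.sqrt(2.0)  # Maxwell-Boltzmann mode = sqrt(2) * scale
    while True:
        vec = [rng.gauss(0.0, scale) for _ in range(3)]
        if math.sqrt(sum(c * c for c in vec)) <= max_length:
            return vec


# Example: the two noise levels mentioned in the text.
rng = random.Random(1)
small_noise = displacement(0.35, 0.75, rng)
large_noise = displacement(0.70, 1.50, rng)
```

To follow the description in the text, the same direction would be reused for a given methyl group<br />
at both noise levels; the sketch draws the two vectors independently for brevity.<br />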
The program was subsequently applied to the experimental data of the methyl groups of<br />
cz-ε186/θ loaded with La 3+ , Yb 3+ and Dy 3+ . Table 2.1 summarizes the results. Using both<br />
paramagnetic lanthanides, the assignment is complete and correct for all diamagnetic 13 C-HSQC<br />
cross-peaks that have observable paramagnetic partners. The only exceptions are swapped<br />
assignments for Met18 and Met107 and for Val65 and Val82. The first arises from a side-chain<br />
conformation that is different in solution than in the single crystal and the second from differences<br />
between the predicted and experimental PCS observed for the peptide segment near Val65 (John et al.,<br />
2007b). The assignments of the methyl groups of the Yb 3+ complex are similarly reliable, whereas<br />
the methyl signals of the Dy 3+ complex are harder to assign (in the absence of methyl connectivity<br />
information). Using only data from the Yb 3+ complex and omitting any methyl specificity<br />
information or connectivity information in the paramagnetic state still results in >70% correct<br />
assignments of the diamagnetic methyl resonances (Table S2.2 and Table S2.4) 4 .<br />
Table 2.1 Automatic assignment of methyl groups by the program Possum a<br />
Residue type | Occurrence (b) | La observable (c) | Yb assigned (d) | Dy assigned (d) | La assigned (e)<br />
Met | 6 (1) | 5 | 3/5 | 4/4 | 3/5<br />
Thr | 14 (4) | 8 | 7/7 | 7/7 | 7/7<br />
Ala | 19 (2) | 17 | 13/13 | 11/13 | 14/14<br />
Ile | 14 (2) | 24 | 24/24 | 21/23 | 24/24<br />
Val | 12 (0) | 24 | 20/20 | 17/20 | 18/22<br />
Leu | 17 (0) | 34 | 34/34 | 19/25 | 34/34<br />
a Obtained using the data reported in Table S2.1, the crystal structure of ε186 (Hamdan et al.,<br />
2002b) and Δχ tensors determined from 15 N-HSQC spectra as described in the experimental<br />
section. The paramagnetic data measured with Yb 3+ and Dy 3+ were combined to derive the<br />
assignments.<br />
b Total number of residues in cz-ε186. The number in brackets refers to residues not observed in<br />
the crystal structure; these were excluded from the calculation.<br />
c Number of methyl groups with coordinates reported in the crystal structure for which cross-peaks<br />
were observed in the cz-ε186/θ/Ln 3+ sample. Their unassigned chemical shifts were available for<br />
the program.<br />
d Fraction of correct assignments for the paramagnetic cz-ε186/θ/Yb 3+ and cz-ε186/θ/Dy 3+<br />
complexes, as indicated. The number in the denominator is the number of methyl groups for which<br />
cross-peaks were observed in the presence of Yb 3+ or Dy 3+ .<br />
e Fraction of correct assignments for the diamagnetic cz-ε186/θ/La 3+ complex. The number in the<br />
denominator is the number of methyl groups for which cross-peaks were observed in at least one of<br />
the paramagnetic complexes.<br />
4 It is the responsibility of the user to inspect and optimize the assignment provided by Possum, in<br />
particular to untangle areas of the spectrum with overlapping peaks.<br />
2.4.6 PCS and flexibility<br />
Structural differences between the crystal structure of ε186 determined under cryogenic<br />
conditions (Hamdan et al., 2002b) and the solution structure of the cz-ε186/θ complex become<br />
apparent as differences between measured and predicted PCS. In a few cases, the structural<br />
differences interfere with the resonance assignment. Figure 2.6a illustrates the situation for the<br />
cz-ε186/θ/Dy 3+ complex, where the measured PCS of Leu161 are smaller than predicted and would<br />
more closely match the values predicted for Leu131. This can be explained by a small displacement<br />
of the peptide segment comprising residues 151-161 that decreases the PCS of both methyls of<br />
Leu161. Smaller PCS than expected were also observed for the backbone amides of this segment (ref. 45).<br />
The correct assignment would be obtained by focusing on the difference in 13 C PCS between both<br />
methyl groups rather than their magnitude (Figure 2.6a) or by using the data of the cz-ε186/θ/Yb 3+<br />
complex which are less strongly distance dependent in the 11 Å distance range (Figure 2.6b).<br />
Figure 2.6 Residues showing deviations between predicted and experimental PCS. (a) Comparison of<br />
calculated and experimental PCS of Leu131, Leu95, Leu161, Leu11 and Ile154 in the<br />
cz-ε186/θ/Dy 3+ complex. The data are plotted in the sequence Cδ1/Hδ1/Cδ2/Hδ2 and Cγ2/Hγ2/Cδ1/Hδ1 for<br />
the Leu residues and Ile154, respectively. (b) Same as (a), but for the cz-ε186/θ/Yb 3+ complex. (c)<br />
Predicted 13 C PCS of the prochiral methyl groups of Val82 in cz-ε186/θ/Dy 3+ versus sidechain<br />
dihedral angle. The values predicted from the crystal structure of ε186 (ref. 40) are marked. (d) Same as<br />
(c), but for the CH3 groups of Leu95.<br />
In the cases of Val82 and Leu95 in the Dy³⁺ complex, the comparison of experimental and<br />
predicted ¹³C-PCS data yields the wrong stereospecific assignment. The χ1 and χ2 angles of these<br />
residues are –47° and 172°, respectively, in the crystal structure (Hamdan et al., 2002b). Adjusting<br />
these angles to –60° and 180°, respectively, inverts the relative size of the ¹³C PCS predicted for the<br />
two methyl groups, leads to much better agreement between predicted and experimental PCS, and<br />
results in the correct stereospecific assignments (Figure 2.6c and d). This observation is most<br />
simply explained by a small difference between the crystal and solution structures. Note that the<br />
correct assignment could have been obtained for Leu95 in the Yb³⁺ complex (Figure 2.6b). None of<br />
the other Val and Leu residues swapped their stereospecific assignment when we changed their χ1<br />
angles (χ2 angles in the case of Leu) by ±10°.<br />
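The sensitivity of a predicted PCS to small coordinate changes follows from the standard PCS equation, δ = (1/(12πr³))[Δχax(3cos²θ − 1) + (3/2)Δχrh sin²θ cos2φ]. A minimal sketch of this equation is given below, assuming that the atom position is already expressed in the principal frame of the Δχ tensor with the metal ion at the origin. The function name, the tensor components and the coordinates are hypothetical illustrations and are not values from this work.

```python
import math

def pcs_ppm(pos, dchi_ax, dchi_rh):
    """Pseudocontact shift (ppm) of a nuclear spin.

    pos              -- (x, y, z) in Å, in the principal frame of the Δχ tensor,
                        with the metal ion at the origin
    dchi_ax, dchi_rh -- axial and rhombic tensor components in units of 1e-32 m^3
    """
    x, y, z = pos
    r2 = x * x + y * y + z * z
    r3 = r2 ** 1.5
    # Cartesian forms of the angular terms:
    #   3cos^2(theta) - 1      = (2z^2 - x^2 - y^2) / r^2
    #   sin^2(theta) cos(2phi) = (x^2 - y^2) / r^2
    axial = dchi_ax * (2 * z * z - x * x - y * y) / r2
    rhombic = 1.5 * dchi_rh * (x * x - y * y) / r2
    # 1e-32 m^3 / (1e-10 m)^3 = 1e-2; multiplying by 1e6 for ppm gives 1e4
    return 1e4 * (axial + rhombic) / (12 * math.pi * r3)

# Hypothetical example: two methyl carbons about 2.5 Å apart, roughly 11 Å from the metal
dchi_ax, dchi_rh = 35.0, 10.0   # illustrative values only
methyl_1 = (3.0, 2.0, 10.5)
methyl_2 = (5.2, 2.6, 9.6)
for name, pos in (("methyl 1", methyl_1), ("methyl 2", methyl_2)):
    print(name, round(pcs_ppm(pos, dchi_ax, dchi_rh), 2))
```

Because of the r⁻³ dependence and the angular terms, re-evaluating the function after a small rotation of the side chain can change which of the two methyl groups has the larger predicted PCS.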
In the case of Leu11, different NMR criteria suggest that its side chain undergoes dynamic<br />
conformational averaging. (i) The ¹³C-PCS values predicted from the structure are -7.0 and -11.9<br />
ppm, whereas the experimental value found for both methyl groups is -8.4 ppm. (ii) In the crystal<br />
structure, the two methyl groups are 9.2 and 11.1 Å from the metal ion, but the ¹³C-NMR line<br />
widths observed for the methyl groups in the cz-ε186/θ/Dy³⁺ complex are indistinguishable. (iii)<br />
Both methyl resonances overlap with each other in the ¹³C-HSQC spectrum, indicating similar<br />
chemical environments, and their line shapes are narrower than those of most other methyl groups.<br />
Remarkably, however, Leu11 is located in the hydrophobic core of the protein and is very well<br />
defined in the crystal structure (Hamdan et al., 2002b), although the side chain forms no steric contacts and could access<br />
different rotameric states without introducing van der Waals violations with neighboring atoms.<br />
Conceivably, the low temperature used in the X-ray experiment may have frozen out a single<br />
conformation, whereas a much larger conformational space is accessible at room temperature.<br />
Ile154 presents an example where partial motional averaging may be indicated by a smaller<br />
difference observed between the PCS of the δ1 and γ2 carbon atoms than predicted. The side chain<br />
heavy atoms of this residue show enhanced B-factors in the crystal structure, in agreement with its<br />
location at the protein surface.<br />
2.5 Discussion<br />
The present work shows that methyl resonances of ¹³C-labeled proteins can be assigned<br />
solely from PCS with reference to the 3D structure of the protein, yielding both sequence- and<br />
stereo-specific resonance assignments without having to establish connectivities to backbone<br />
resonances. This represents a significant advance over our previous strategy for the assignment of<br />
¹⁵N-HSQC spectra, which relied on PCS, PRE, CCR, and RDCs measured on selectively labeled<br />
samples (Pintacuda et al., 2004).<br />
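The idea of assigning resonances by matching experimental to back-calculated PCS can be illustrated with a toy example. The sketch below is not the Possum algorithm. It simply enumerates all one-to-one mappings between a few observed peaks and candidate methyl groups and keeps the mapping with the smallest sum of squared PCS deviations. All labels and PCS values are hypothetical, and exhaustive enumeration is only feasible for a handful of peaks.

```python
from itertools import permutations

def assign_by_pcs(observed, predicted):
    """Return (assignment, cost) minimising the summed squared PCS deviation.

    observed  -- dict: peak label -> (13C PCS, 1H PCS) in ppm
    predicted -- dict: methyl label -> (13C PCS, 1H PCS) in ppm, back-calculated
    """
    peaks = list(observed)
    best_cost, best = float("inf"), None
    # Try every one-to-one mapping of peaks onto candidate methyl groups
    for methyls in permutations(predicted, len(peaks)):
        cost = 0.0
        for peak, methyl in zip(peaks, methyls):
            (oc, oh), (pc, ph) = observed[peak], predicted[methyl]
            cost += (oc - pc) ** 2 + (oh - ph) ** 2
        if cost < best_cost:
            best_cost, best = cost, dict(zip(peaks, methyls))
    return best, best_cost

# Hypothetical PCS values (ppm), for illustration only
observed = {"peak1": (1.30, 1.33), "peak2": (-0.65, -0.60), "peak3": (3.30, 3.25)}
predicted = {"methyl_A": (-0.70, -0.55), "methyl_B": (3.10, 3.40), "methyl_C": (1.20, 1.25)}
assignment, cost = assign_by_pcs(observed, predicted)
print(assignment)
```

For realistic numbers of methyl groups the search space grows factorially, which is why a dedicated assignment solver is needed in practice.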
Clearly, any resonance assignment based on comparison of experimental and<br />
back-calculated PCS critically depends on the accuracy of the 3D structure of the protein and is expected<br />
to fail for flexible protein segments. Yet, this problem is much less severe than in the case of RDCs<br />
(Sibille et al., 2002), since PCS are far less affected by local mobility as long as the spins are not<br />
very close to the paramagnetic center. The robustness of PCS with regard to structural variations is<br />
particularly beneficial for the assignment of Met CH3 groups, which are notoriously difficult to<br />
assign by conventional methods. The potential of PCS for their assignment has been noted<br />
previously (Bose-Basu et al., 2004).<br />
The assignment strategy presented here requires the determination of the Δχ tensor, which<br />
can readily be achieved from ¹⁵N-¹H correlation spectra by the Platypus algorithm (Pintacuda et al.,<br />
2004). Obtaining resonance assignments of methyl groups in this way is attractive because ¹⁵N-¹H<br />
correlation spectra of backbone amides and ¹³C-¹H correlation spectra of methyl groups can be<br />
recorded even for high-molecular weight systems (Fiaux et al., 2002, Sprangers and Kay, 2007).<br />
Alternatively, the Δχ-tensor parameters can be determined from assigned diamagnetic NMR<br />
resonances and a set of PCS identified by comparison with the paramagnetic NMR spectrum, either<br />
manually or automatically using the Echidna algorithm (Schmitz et al., 2006). Initial sequence-specific<br />
resonance assignments can, if necessary, be achieved by site-directed mutagenesis (Siivari<br />
et al., 1995, Bose-Basu et al., 2004), for example by mutation of Ile to Val (Wu et al., 2007).<br />
Assignments by PCS are not limited to metal-binding proteins, as different techniques have<br />
recently become available that achieve site-specific attachment of lanthanide tags to proteins<br />
devoid of natural metal binding sites (Ma and Opella, 2000, Dvoretsky et al., 2002, Wöhnert et al., 2003,<br />
Ikegami et al., 2004, Prudêncio et al., 2004, Leonov et al., 2005, Haberz et al., 2006,<br />
Rodriguez-Castañeda et al., 2006, Su et al., 2006). The use of different tags or attachment at different sites<br />
readily generates very different Δχ tensors (Rodriguez-Castañeda et al., 2006) that can highlight<br />
inconsistencies between experimental and back-calculated PCS.<br />
If the exchange between paramagnetic and diamagnetic metal ions is too slow to measure<br />
exchange spectra, the program Possum can be used to assign the methyl groups in the diamagnetic<br />
and paramagnetic states. As expected, the robustness of Possum with regard to small differences<br />
between the atomic coordinates of the protein and its actual structure in solution increases with the<br />
amount of additional data available. In this respect, data from two paramagnetic metal ions are<br />
particularly beneficial, as is information about intraresidual methyl-methyl connectivities or<br />
stereospecific identities of methyl groups in Val, Leu, and Ile residues.<br />
The robustness of assignments made by Possum can further be enhanced by the increased<br />
spectral resolution afforded by 3D NMR spectra, which would greatly facilitate the identification of<br />
the corresponding NMR resonances in the diamagnetic and paramagnetic states based on the<br />
criterion that all correlated spins are close in space and therefore experience similar PCS. For<br />
example, 3D (H)CCH-TOCSY or NOESY-¹³C-HSQC spectra would resolve several cross-peaks<br />
for each methyl group, which can simultaneously be compared with the 3D structure of the protein<br />
and the predicted PCS to obtain resonance assignments. For methyl groups in the vicinity of the<br />
paramagnetic ion, the observation of correlations can be aided by protonless experiments (Bermel<br />
et al., 2006).<br />
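The criterion that spins close in space experience similar PCS can be turned into a simple filter when pairing diamagnetic and paramagnetic cross-peaks. The following sketch is an illustration and not part of Possum. It accepts a candidate pairing of two ¹³C-HSQC cross-peaks only if the implied ¹³C and ¹H PCS agree within a tolerance. The chemical shifts are those listed for Met87 and Met137 in the La³⁺ and Dy³⁺ complexes in Table S2.1. The function name, the peak labels and the tolerance of 0.1 ppm are assumptions chosen for this example.

```python
def compatible_pairs(dia_peaks, para_peaks, tol=0.1):
    """List (dia, para) label pairs whose 13C and 1H PCS differ by less than tol (ppm)."""
    pairs = []
    for d_label, (d_c, d_h) in dia_peaks.items():
        for p_label, (p_c, p_h) in para_peaks.items():
            pcs_c = p_c - d_c   # 13C PCS implied by this pairing
            pcs_h = p_h - d_h   # 1H PCS implied by this pairing
            if abs(pcs_c - pcs_h) < tol:
                pairs.append((d_label, p_label))
    return pairs

# (13C, 1H) chemical shifts in ppm from Table S2.1
dia = {"M87": (15.97, 1.53), "M137": (16.69, 2.05)}         # La3+ complex
para = {"peak_a": (18.00, 3.38), "peak_b": (15.32, 0.93)}   # Dy3+ complex, labels hypothetical
print(compatible_pairs(dia, para))
```

A fixed tolerance is a simplification: for methyl groups close to the metal ion, the ¹³C and ¹H spins sit at noticeably different positions relative to the tensor and their PCS can differ more.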
Conceivably, assignments by PCS can also be achieved for perdeuterated proteins of<br />
increased molecular weight containing selectively protonated methyl groups (Rosen et al., 1996).<br />
The best spectral resolution in the methyl region of the ¹³C-HSQC spectrum would be obtained for<br />
CD2H groups (Kainosho et al., 2006). Notably, however, the Cz-EXSY experiments described here<br />
allowed us to measure all PCS data in the uniformly ¹³C/¹⁵N-labeled and fully protonated sample,<br />
i.e. the improved spectral resolution of selectively labeled samples was not necessary for our<br />
system.<br />
In conclusion, resonance assignments of the ¹³C-HSQC cross-peaks of methyl groups by<br />
PCS induced by a site-specifically attached lanthanide ion present a versatile and convenient<br />
technique which can open many opportunities for NMR studies of proteins of known<br />
three-dimensional structure. It is anticipated that resonance assignments by this technique will be<br />
particularly useful in ligand screening applications.<br />
2.6 Acknowledgement<br />
The authors thank Don A. Grundel for source code of the MAP solver and for useful<br />
discussions. M.J. thanks the Humboldt Foundation for a Feodor-Lynen Fellowship. Financial<br />
support from the Australian Research Council for project grants, a Federation Fellowship for G.O.<br />
and the 800 MHz NMR spectrometer at the ANU is gratefully acknowledged. This work was<br />
supported by an award under the Merit Allocation Scheme of the National Facility of the Australian<br />
Partnership for Advanced Computing.<br />
2.7 Supporting Information Available
Pulse scheme of a (H)C(C)H-TOCSY experiment for correlations between isopropyl methyl<br />
groups, ¹³C-HSQC spectra of uniformly, fractionally, and selectively isotope labeled cz-ε186/θ,<br />
diagrams comparing experimental and predicted PCS, a table with the chemical shifts of the methyl<br />
groups of cz-ε186 observed in the presence of La³⁺, Yb³⁺, or Dy³⁺, and tables reporting the number of<br />
methyl groups assigned by Possum. This material is available free of charge via the Internet at<br />
http://pubs.acs.org.<br />
2.8 References<br />
Allegrozzi M, Bertini I, Janik MBL, Lee YM, Lin GH and Luchinat C (2000) Lanthanide-induced<br />
pseudocontact shifts for solution structure refinements of macromolecules in shells up to 40<br />
Å from the metal ion. J Am Chem Soc 122:4154-4161<br />
Balas E and Saltzman MJ (1991) An algorithm for the 3-index assignment problem. Oper Res<br />
39:150-161<br />
Balayssac S, Jiménez B and Piccioli M (2006) Assignment strategy for fast relaxing signals:<br />
complete aminoacid identification in thulium substituted Calbindin D9K. J Biomol NMR<br />
34:63-73<br />
Bax A, Delaglio F, Grzesiek S and Vuister GW (1994) Resonance assignment of methionine<br />
methyl groups and χ3 angular information from long-range proton-carbon and<br />
carbon-carbon J correlation in a calmodulin-peptide complex. J Biomol NMR 4:787-797<br />
Bax A, Ikura M, Kay LE and Zhu G (1991) Removal of F1 baseline distortion and optimization of<br />
folding in multidimensional NMR spectra. J Magn Reson 91:174-178<br />
Bermel W, Bertini I, Felli IC, Piccioli M and Pierattelli R (2006) ¹³C-detected protonless NMR<br />
spectroscopy of proteins in solution. Prog NMR Spectrosc 48:25-45<br />
Bose-Basu B, DeRose EF, Kirby TW, Mueller GA, Beard WA, Wilson SH and London RE (2004)<br />
Dynamic characterization of a DNA repair enzyme: NMR studies of<br />
[methyl-¹³C]methionine-labeled DNA polymerase β. Biochemistry 43:8911-8922<br />
DeRose EF, Darden T, Harvey S, Gabel S, Perrino FW, Schaaper RM and London RE (2003)<br />
Elucidation of the ε-θ subunit interface of Escherichia coli DNA polymerase III by NMR<br />
spectroscopy. Biochemistry 42:3635-3644<br />
Dvoretsky A, Gaponenko V and Rosevear PR (2002) Derivation of structural restraints using a<br />
thiol-reactive chelator. FEBS Lett 528:189-192
Eaton HL, Fesik SW, Glaser SJ and Drobny GP (1990) Time dependence of ¹³C-¹³C magnetization<br />
transfer in isotropic mixing experiments involving amino acid spin systems. J Magn Reson<br />
90:452-463<br />
Farrow NA, Zhang O, Forman-Kay JD and Kay LE (1994) A heteronuclear correlation experiment<br />
for simultaneous determination of ¹⁵N longitudinal decay and chemical exchange rates of<br />
systems in slow equilibrium. J Biomol NMR 4:727-734<br />
Fiaux J, Bertelsen EB, Horwich AL and Wüthrich K (2002) NMR analysis of a 900K GroEL-GroES<br />
complex. Nature 418:207-211<br />
Fischer MWF, Zeng L and Zuiderweg ERP (1996) Use of ¹³C-¹³C NOE for the assignment of NMR<br />
lines of larger labeled proteins at larger magnetic fields. J Am Chem Soc 118:12457-12458<br />
Grishaev A and Llinás M (2002) CLOUDS, a protocol for deriving a molecular proton density via<br />
NMR. Proc Natl Acad Sci U S A 99:6707-6712<br />
Gross JD, Gelev VM and Wagner G (2003) A sensitive and robust method for obtaining<br />
intermolecular NOEs between side chains in large protein complexes. J Biomol NMR<br />
25:235-242<br />
Haberz P, Rodriguez-Castañeda F, Junker J, Becker S, Leonov A and Griesinger C (2006) Two<br />
new chiral EDTA-based metal chelates for weak alignment of proteins in solution. Org Lett<br />
8:1275-1278<br />
Hajduk PJ, Augeri DJ, Mack J, Mendoza R, Yang J, Betz SF and Fesik SW (2000) NMR-based<br />
screening of proteins containing ¹³C-labeled methyl groups. J Am Chem Soc 122:7898-7904<br />
Hamdan S, Bulloch EM, Thompson PR, Beck JL, Yang JY, Crowther JA, Lilley PE, Carr PD, Ollis<br />
DL, Brown SE and Dixon NE (2002a) Hydrolysis of the 5'-p-nitrophenyl ester of TMP by<br />
the proofreading exonuclease (ε) subunit of Escherichia coli DNA polymerase III.<br />
Biochemistry 41:5266-5275<br />
Hamdan S, Carr PD, Brown SE, Ollis DL and Dixon NE (2002b) Structural basis for proofreading<br />
during replication of the Escherichia coli chromosome. Structure 10:535-546<br />
Ikegami T, Verdier L, Sakhaii P, Grimme S, Pescatore B, Saxena K, Fiebig KM and Griesinger C<br />
(2004) Novel techniques for weak alignment of proteins in solution using chemical tags<br />
coordinating lanthanide ions. J Biomol NMR 29:339-349<br />
Janin J, Miller S and Chothia C (1988) Surface, subunit interfaces and interior of oligomeric<br />
proteins. J Mol Biol 204:155-164<br />
John M, Headlam MJ, Dixon NE and Otting G (2007a) Assignment of paramagnetic ¹⁵N-HSQC<br />
spectra by heteronuclear exchange spectroscopy. J Biomol NMR 37:43-51<br />
John M, Park AY, Dixon NE and Otting G (2007b) NMR detection of protein ¹⁵N spins near<br />
paramagnetic lanthanide ions. J Am Chem Soc 129:462-463<br />
John M, Park AY, Pintacuda G, Dixon NE and Otting G (2005) Weak alignment of paramagnetic<br />
proteins warrants correction for residual CSA effects in measurements of pseudocontact<br />
shifts. J Am Chem Soc 127:17190-17191<br />
Jung YS and Zweckstetter M (2004) Backbone assignment of proteins with known structure using<br />
residual dipolar couplings. J Biomol NMR 30:25-35<br />
Kaikkonen A and Otting G (2001) Residual dipolar ¹H-¹H couplings of methyl groups in weakly<br />
aligned proteins. J Am Chem Soc 123:1770-1771<br />
Kainosho M, Torizawa T, Iwashita Y, Terauchi T, Ono AM and Güntert P (2006) Optimal isotope<br />
labelling for NMR protein structure determinations. Nature 440:52-57<br />
Karimi-Nejad Y, Schmidt JM, Rüterjans H, Schwalbe H and Griesinger C (1994) Conformations of<br />
valine side chains in ribonuclease T1 determined by NMR studies of homonuclear and<br />
heteronuclear ³J coupling constants. Biochemistry 33:5481-5492<br />
Karp RM (1972) Reducibility Among Combinatorial Problems. Complexity of Computer<br />
Computations. New York: Plenum, R. E. Miller and J. W. Thatcher.<br />
Korzhnev DM, Kloiber K, Kanelis V, Tugarinov V and Kay LE (2004) Probing slow dynamics in<br />
high molecular weight proteins by methyl-TROSY NMR spectroscopy: Application to a<br />
723-residue enzyme. J Am Chem Soc 126:3964-3973<br />
Leonov A, Voigt B, Rodriguez-Castañeda F, Sakhaii P and Griesinger C (2005) Convenient<br />
synthesis of multifunctional EDTA-based chiral metal chelates substituted with an<br />
S-mesylcysteine. Chem Eur J 11:3342-3348<br />
Liu W, Zheng Y, Cistola DP and Yang D (2003) Measurement of methyl ¹³C-¹H cross-correlation<br />
in uniformly ¹³C-, ¹⁵N-, labeled proteins. J Biomol NMR 27:351-364<br />
Ma C and Opella SJ (2000) Lanthanide ions bind specifically to an added "EF-hand" and orient a<br />
membrane protein in micelles for solution NMR spectroscopy. J Magn Reson 146:381-384<br />
Montelione GT, Lyons BA, Emerson SD and Tashiro M (1992) An efficient triple resonance<br />
experiment using carbon-13 isotropic mixing for determining sequence-specific resonance<br />
assignments of isotopically-enriched proteins. J Am Chem Soc 114:10974-10975<br />
Muhandiram DR, Yamazaki T, Sykes BD and Kay LE (1995) Measurement of ²H T1 and T1ρ<br />
relaxation times in uniformly ¹³C-labeled and fractionally ²H-labeled proteins in solution. J<br />
Am Chem Soc 117:11536-11544<br />
Neri D, Szyperski T, Otting G, Senn H and Wüthrich K (1989) Stereospecific nuclear magnetic<br />
resonance assignments of the methyl groups of valine and leucine in the DNA-binding<br />
domain of the 434 repressor by biosynthetically directed fractional ¹³C labeling.<br />
Biochemistry 28:7510-7516<br />
Nicholson LK, Kay LE, Baldisseri DM, Arango J, Young PE, Bax A and Torchia DA (1992)<br />
Dynamics of methyl groups in proteins as studied by proton-detected ¹³C NMR<br />
spectroscopy. Application to the leucine residues of staphylococcal nuclease. Biochemistry<br />
31:5253-5263<br />
Ostler G, Soteriou A, Moody CM, Khan JA, Birdsall B, Carr MD, Young DW and Feeney J (1993)<br />
Stereospecific assignments of the leucine methyl resonances in the ¹H NMR spectrum of<br />
Lactobacillus casei dihydrofolate reductase. FEBS Lett 318:177-180<br />
Park AY (2006) Ph.D. Thesis. Australian National University, Australia.<br />
Pintacuda G, John M, Su XC and Otting G (2007) NMR structure determination of protein-ligand<br />
complexes by lanthanide labeling. Acc Chem Res 40:206-212<br />
Pintacuda G, Keniry MA, Huber T, Park AY, Dixon NE and Otting G (2004) Fast structure-based<br />
assignment of ¹⁵N HSQC spectra of selectively ¹⁵N-labeled paramagnetic proteins. J Am<br />
Chem Soc 126:2963-2970<br />
Prudêncio M, Rohovec J, Peters JA, Tocheva E, Boulanger MJ, Murphy MEP, Hupkes HJ, Kosters<br />
W, Impagliazzo A and Ubbink M (2004) A caged lanthanide complex as a paramagnetic<br />
shift agent for protein NMR. Chem Eur J 10:3252-3260<br />
Rodriguez-Castañeda F, Haberz P, Leonov A and Griesinger C (2006) Paramagnetic tagging of<br />
diamagnetic proteins for solution NMR. Magn Reson Chem 44:S10-S16<br />
Rosen MK, Gardner KH, Willis RC, Parris WE, Pawson T and Kay LE (1996) Selective methyl<br />
group protonation of perdeuterated proteins. J Mol Biol 263:627-636<br />
Sattler M, Schwalbe H and Griesinger C (1992) Stereospecific assignment of leucine methyl groups<br />
with ¹³C in natural abundance or with random ¹³C labeling. J Am Chem Soc 114:1126-1127<br />
Schell E (1955) Distribution of a product over several properties. E 2nd Sym. Linear Program 615–642<br />
Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient<br />
χ-tensor determination and NH assignment of paramagnetic proteins. J Biomol NMR 35:79-87<br />
Senn H, Werner B, Messerle BA, Weber C, Traber R and Wüthrich K (1989) Stereospecific<br />
assignment of the methyl ¹H NMR lines of valine and leucine in polypeptides by<br />
nonrandom ¹³C labelling. FEBS Lett 249:113-118<br />
Senn H and Wüthrich K (1985) Amino-acid-sequence, hem-iron coordination geometry and<br />
functional-properties of mitochondrial and bacterial c-type cytochromes. Quart Rev<br />
Biophys 18:111-134<br />
Sibille N, Bersch B, Covès J, Blackledge M and Brutscher B (2002) Side chain orientation from<br />
methyl ¹H-¹H residual dipolar couplings measured in highly deuterated proteins. J Am<br />
Chem Soc 124:14616-14625<br />
Siivari K, Zhang M, Palmer AG and Vogel HJ (1995) NMR studies of the methionine methyl<br />
groups in calmodulin. FEBS Lett 366:104-108<br />
Sprangers R and Kay LE (2007) Quantitative dynamics and binding studies of the 20S proteasome<br />
by NMR. Nature 445:618-622<br />
Su XC, Huber T, Dixon NE and Otting G (2006) Site-specific labelling of proteins with a rigid<br />
lanthanide-binding tag. Chembiochem 7:1599-1604<br />
Tang C, Iwahara J and Clore GM (2005) Accurate determination of leucine and valine side-chain<br />
conformations using U-[¹⁵N/¹³C/²H]/[¹H-(methine/methyl)-Leu/Val] isotope labeling, NOE<br />
pattern recognition, and methine Cγ-Hγ/Cβ-Hβ residual dipolar couplings: application to<br />
the 34-kDa enzyme IIA Chitobiose. J Biomol NMR 33:105-121<br />
Tugarinov V and Kay LE (2003a) Ile, Leu, and Val methyl assignments of the 723-residue malate<br />
synthase G using a new labeling strategy and novel NMR methods. J Am Chem Soc<br />
125:13868-13878<br />
Tugarinov V and Kay LE (2003b) Side chain assignments of Ile δ1 methyl groups in high<br />
molecular weight proteins: An application to a 46 ns tumbling molecule. J Am Chem Soc<br />
125:5701-5706<br />
Tugarinov V and Kay LE (2004) Stereospecific NMR assignments of prochiral methyls, rotameric<br />
states and dynamics of valine residues in malate synthase G. J Am Chem Soc 126:9827-9836<br />
Tugarinov V and Kay LE (2005a) Methyl groups as probes of structure and dynamics in NMR<br />
studies of high-molecular-weight proteins. Chembiochem 6:1567-+<br />
Tugarinov V, Ollerenshaw JE and Kay LE (2005b) Probing side-chain dynamics in high molecular<br />
weight proteins by deuterium NMR spin relaxation: An application to an 82-kDa enzyme. J<br />
Am Chem Soc 127:8214-8225<br />
Wand AJ, Urbauer JL, McEvoy RP and Bieber RJ (1996) Internal dynamics of human ubiquitin<br />
revealed by ¹³C-relaxation studies of randomly fractionally labeled protein. Biochemistry<br />
35:6116-6125<br />
Williams NK, Prosselkov P, Liepinsh E, Line I, Sharipo A, Littler DR, Curmi PMG, Otting G and<br />
Dixon NE (2002) In vivo protein cyclization promoted by a circularly permuted<br />
Synechocystis sp. PCC6803 DnaB mini-intein. J Biol Chem 277:7790-7798<br />
Wöhnert J, Franz KJ, Nitz M, Imperiali B and Schwalbe H (2003) Protein alignment by a<br />
coexpressed lanthanide-binding tag for the measurement of residual dipolar couplings. J Am<br />
Chem Soc 125:13338-13339<br />
Wu PSC, Ozawa K, Lim SP, Vasudevan SG, Dixon NE and Otting G (2007) Cell-free<br />
transcription/translation from PCR-amplified DNA for high-throughput NMR studies.<br />
Angew Chem, Int Ed 46:3356-3358<br />
Zuiderweg ERP, Boelens R and Kaptein R (1985) Stereospecific assignments of ¹H-NMR methyl<br />
lines and conformation of valyl residues in the lac repressor headpiece. Biopolymers<br />
24:601-611<br />
Zwahlen C, Gardner KH, Sarma SP, Horita DA, Byrd RA and Kay LE (1998) An NMR experiment<br />
for measuring methyl-methyl NOEs in ¹³C-labeled proteins with high resolution. J Am<br />
Chem Soc 120:7617-7625<br />
2.9 Supporting information<br />
Figure S2.1 Pulse scheme of the 2D (H)C(C)H-TOCSY experiment<br />
used in this study. Parameters are as for the pulse schemes of Figure<br />
2.1. Efficient magnetization transfer between the methyl groups of<br />
isopropyl groups was obtained by applying DIPSI3 mixing for 12 ms<br />
with a radiofrequency amplitude of 8.6 kHz. The Bruker pulse<br />
programs of this pulse sequence and of the pulse sequences of Figure<br />
2.1 can be downloaded from http://rsc.anu.edu.au/~go/.
Figure S2.2 Assigned constant-time (28 ms) ¹³C-HSQC spectrum of the cz-ε186/θ/La³⁺ complex<br />
(¹³C/¹⁵N labeled cz-ε186) at pH 7.2 and 25 °C. Only the region containing the methyl cross-peaks is<br />
shown. Cross-peaks from methyl groups of Val, Leu, Ile, Ala and Thr appear as positive peaks<br />
(blue), whereas cross-peaks from Met CH3 and all CH2 groups appear as negative peaks (red).
Figure S2.3 Assigned constant-time ¹³C-HSQC spectrum of the cz-ε186/θ/La³⁺ complex, where<br />
cz-ε186 was biosynthetically fractionally ¹³C-labeled using 20% uniformly ¹³C-labeled glucose.<br />
Parameters and plot region as in Figure S2.2. Cross-peaks from Val γ1, Leu δ1, and Ala methyl<br />
groups are positive (blue). Cross-peaks from Val γ2, Leu δ2, Thr γ2 and Met methyl groups are<br />
negative (red). Cross-peaks from Ile δ1 and γ2 methyl groups are mostly invisible due to<br />
scrambling of ¹³C during Ile biosynthesis.
Figure S2.4 Assigned constant-time ¹³C-HSQC spectrum of the cz-ε186/θ/La³⁺ complex containing<br />
¹³C/¹⁵N-Leu labeled cz-ε186 (blue) superimposed onto a 2D (H)C(C)H-TOCSY spectrum of the<br />
same sample (red). The assignments of the ¹³C-HSQC cross-peaks are indicated. The three mobile<br />
residues Leu11, Leu43 and Leu145 also show one-bond correlations between CH3 and CH<br />
groups.
Figure S2.5 Comparisons of calculated and experimental PCS in the cz-ε186/θ/Dy³⁺ complex for<br />
methyl groups of (a) Met, (b) Ala, (c) Thr, (d) Val, (e) Leu, and (f) Ile. The distances rC-Ln are<br />
indicated in Å at the top of each plot. For residues with two methyl groups, the distance value<br />
shown at the top refers to the Cγ1 (Val), Cδ1 (Leu), or Cδ1 (Ile) atom.
Figure S2.6 Comparisons of calculated and experimental ¹³C and ¹H PCS as in Figure S2.5 but for<br />
the cz-ε186/θ/Yb³⁺ complex.
Table S2.1 ¹³C and ¹H chemical shifts (ppm) of methyl groups of cz-ε186 in the cz-ε186/θ/Ln³⁺<br />
complexes used in this study a<br />
Group rC-Ln (Å) cz-ε186/θ/La³⁺ (¹³C, ¹H) cz-ε186/θ/Dy³⁺ (¹³C, ¹H) cz-ε186/θ/Yb³⁺ (¹³C, ¹H)<br />
Methionine<br />
M18 12.0 17.78 2.06 18.04 2.34<br />
M87 22.5 15.97 1.53 15.32 0.93 16.05 1.63<br />
M107 14.2 16.23 2.04 12.84 16.77 2.64<br />
M137 20.7 16.69 2.05 18.00 3.38 16.41 1.78<br />
M178 15.4 15.39 2.22 18.67 5.53 14.69 1.53<br />
M185 16.83 2.07 17.32 2.55 16.75 1.98<br />
Alanine<br />
A4 19.14 1.35<br />
A23 20.2 17.15 0.62 17.35<br />
A35 11.5 24.14 1.57 17.70 25.47 2.79<br />
A62 10.9 18.72 1.55 30.88 16.97 -0.24<br />
A69 16.2 23.77 1.61 26.74 23.15<br />
A80 22.3 18.68 1.46 19.04 1.84 18.66 1.39<br />
A83 20.1 19.35 1.29 18.97 1.00<br />
A93 19.2 20.74 1.21 19.78 0.22 20.90 1.39<br />
A100 13.3 19.85 1.41 16.19 21.22 1.77<br />
A101 15.1 18.20 1.45 15.48 -1.16 18.58 1.82<br />
A132 17.6 17.89 1.60 17.16<br />
A134 15.8 18.44 1.79 20.16 3.58 17.96 1.30<br />
A147 14.8 18.49 1.28 20.54 18.01 0.86<br />
A150 17.2 17.57 1.42 20.22 3.91 17.12 0.99<br />
A164 5.2 18.93 1.21<br />
A168 7.5 17.24 1.42 18.52<br />
A172 12.5 18.54 1.38 24.10 17.77 0.73<br />
A177 18.5 16.75 0.92 20.11 4.39 16.17 0.34<br />
A186 19.07 1.39 19.45 1.77 19.02 1.32<br />
Threonine<br />
T3 γ2<br />
T6 γ2 21.50 1.20 21.53 1.25<br />
T13 γ2 8.0<br />
T15 γ2 9.0 19.45 -0.13 34.77 17.36<br />
T16 γ2 13.8 23.64 1.28 31.65 22.35 0.04<br />
T44 γ2 21.0 22.65 1.34 22.64 1.35 22.67 1.39<br />
T78 γ2 21.1 22.40 1.42 23.44 2.48 22.19 1.21<br />
T121 γ2 17.4 21.07 0.66 19.34 -1.07 21.31 0.83<br />
T123 γ2 25.0 21.48 1.09 20.80 0.50 21.66 1.26<br />
T128 γ2 16.9 21.93 1.10<br />
T160 γ2 12.9<br />
T179 γ2 19.6 20.56 0.66 22.15 2.28 20.25 0.35<br />
T183 γ2 21.44 1.21 22.16 1.91 21.33 1.08<br />
T187 γ2 21.57 1.19 21.88 1.51 21.50 1.13<br />
Valine<br />
V10 γ1 10.4 22.52 0.91 28.21 21.70 -0.07<br />
V10 γ2 12.8 19.56 0.66 22.13 19.19 0.31<br />
V36 γ1 11.2 21.16 0.75<br />
V36 γ2 12.3 21.09 0.67 21.97<br />
V38 γ1 18.5 22.09 0.83 23.20 2.03 21.92 0.67<br />
V38 γ2 16.2 20.58 0.82 22.08 2.42 20.43 0.64<br />
V39 γ1 24.1 21.08 0.89 21.14 1.00<br />
V39 γ2 22.0 22.03 0.95 22.06 1.01<br />
V50 γ1 14.9 22.32 0.58 19.95 22.70 0.94<br />
V50 γ2 13.2 20.00 0.68 17.80 20.26 0.97<br />
V58 γ1 12.7 21.64 1.29 30.67 20.19 -0.24<br />
V58 γ2 13.5 22.31 1.18 30.02 21.00 -0.16<br />
V65 γ1 6.4 20.72 0.90 22.51<br />
V65 γ2 8.8 21.14 0.93 21.74 1.65<br />
V82 γ1 17.7 20.81 0.82 20.09 0.08<br />
V82 γ2 16.6 20.37 0.72 19.79 0.10<br />
V96 γ1 13.3 19.60 0.23 21.73 19.08 -0.32<br />
V96 γ2 15.0 19.04 0.29 19.83 1.04 18.87 0.13<br />
V127 γ1 15.8 21.62 0.78 19.64 -1.32 21.84 1.05<br />
V127 γ2 17.2 20.59 0.72 19.02 -0.93 20.89 0.95<br />
V133 γ1 17.9 21.22 0.97 22.43 2.24 20.92 0.67<br />
V133 γ2 18.4 21.99 1.16 22.69 1.76 21.77 0.95<br />
V174 γ1 13.3 22.88 0.95 31.40 21.37 -0.49<br />
V174 γ2 12.5 24.96 0.97 35.55 23.21 -0.83<br />
Leucine<br />
L11 δ1 11.1 27.14 1.00 18.89 28.30 2.40<br />
L11 δ2 9.2 26.90 0.99 18.65 28.30 2.39<br />
L43 δ1 16.1 25.49 0.96 27.78 3.35 25.24 0.69<br />
L43 δ2 15.6 25.52 0.94 27.63 3.15 25.26 0.72<br />
L52 δ1 13.6 26.18 0.65 25.88 0.34<br />
L52 δ2 15.0 24.99 0.56 25.93 24.81 0.39<br />
L57 δ1 21.4 25.94 0.72 28.07 2.74 25.61 0.39<br />
L57 δ2 20.4 21.82 0.85 24.23 3.21 21.40 0.44<br />
L73 δ1 13.8 25.49 0.96 24.36 -0.30<br />
L73 δ2 13.4 21.30 0.97 25.98 20.34 -0.02<br />
L74 δ1 22.4 26.03 0.99 27.27 2.17 25.78 0.75<br />
L74 δ2 21.8 22.44 0.85 23.95 2.39 22.12 0.55<br />
L95 δ1 15.2 25.21 0.75 22.60 25.64 1.20<br />
L95 δ2 14.6 23.20 0.78 20.84 -1.73 23.72 1.21<br />
L113 δ1 20.2 24.42 0.34 25.63 1.59 24.22 0.15<br />
L113 δ2 22.3 21.37 0.64 22.17 1.45 21.27 0.53<br />
L114 δ1 21.6 26.00 1.22 26.00 1.24<br />
L114 δ2 22.0 22.35 1.01 22.33 0.95<br />
L131 δ1 13.3 22.74 0.79 21.14 22.69 0.70<br />
L131 δ2 12.5 25.78 0.71 21.89 26.09 1.07<br />
L145 δ1 8.9 23.96 0.70 16.25<br />
L145 δ2 6.5 24.01 0.62 14.34<br />
L148 δ1 13.2 26.21 0.92 31.96 25.03 -0.32<br />
L148 δ2 15.4 23.33 1.14 27.21 22.59 0.42<br />
L161 δ1 11.3 24.89 0.72 23.66 25.39 1.18<br />
L161 δ2 10.6 21.78 0.83 19.26 22.39 1.42<br />
L165 δ1 10.1 24.05 0.89 17.02 25.60 2.34<br />
L165 δ2 11.0 26.53 1.03 19.74 27.83 2.26<br />
L166 δ1 8.8 22.89 0.74<br />
L166 δ2 7.7 24.99 0.86 26.66<br />
L171 δ1 10.1 21.06 0.82 36.23 18.23 -2.00<br />
L171 δ2 8.9 26.65 0.99 23.97<br />
L176 δ1 17.6 24.89 0.72 27.48 3.25 24.51 0.36<br />
L176 δ2 18.5 21.98 0.70 24.08 2.64 21.66 0.40<br />
Isoleucine<br />
I5 γ2 17.94 0.95 17.86 0.87 17.96 0.97<br />
I5 δ1 12.76 0.82 12.66 0.71<br />
I9 γ2 13.9 18.78 0.77 16.02 19.39 1.35<br />
I9 δ1 16.6 10.64 0.31 9.09 -1.20 11.00 0.64<br />
I21 γ2 23.8 17.24 0.85 17.29 0.91 17.24 0.87<br />
I21 δ1 24.3 12.79 0.85 12.86 0.93<br />
I30 γ2 10.4 17.96 0.82 23.37 17.09 -0.06<br />
I30 δ1 12.0 13.35 0.64 15.00 13.15 0.55<br />
I31 γ2 11.9 18.45 0.75 28.85 16.71 -0.92<br />
I31 δ1 9.8 14.65 0.34 33.74 11.31 -2.95<br />
I33 γ2 10.6 17.24 0.71 8.89 18.77 2.26<br />
I33 δ1 12.2 12.24 0.08 9.89 12.65 0.45<br />
I68 γ2 11.4 19.25 0.79 28.19 17.44 -0.91<br />
I68 δ1 8.4 13.87 0.56 9.07<br />
I90 γ2 15.6 18.42 0.41 15.93 -2.28 18.93 0.92<br />
I90 δ1 16.7 12.90 0.53 10.88 -1.52<br />
I97 γ2 9.1 18.43 1.01 8.34 19.69 2.37<br />
I97 δ1 11.4 14.99 0.88 9.50 15.78 1.68<br />
I104 γ2 16.0 17.84 0.90 15.86 -2.05 18.13 1.19<br />
I104 δ1 14.4 10.34 0.71 7.50 10.70 1.14<br />
I118 γ2 22.6 15.96 0.66 15.47 0.17 16.02 0.75<br />
I118 δ1 23.0 13.16 0.86 12.89 0.58 13.23 0.91<br />
I154 γ2 13.7 16.96 0.76 23.43 15.97 -0.23<br />
I154 δ1 16.3 11.78 0.73 17.48 6.40 10.94 -0.16<br />
I170 γ2 9.5 18.41 0.71 40.65 15.06 -2.55<br />
I170 δ1 8.6 12.88 0.68 32.52 10.26 -1.98<br />
I193 γ2 17.46 0.81 17.55 0.91 17.42 0.78<br />
I193 δ1 13.04 0.83 13.15 0.94 13.00 0.80<br />
a Conditions: 25 °C, pH 7.2. The chemical shifts in the ε186/θ/La³⁺ complex were measured<br />
from ¹³C-HSQC spectra of the sample containing ¹³C/¹⁵N-labeled ε186 in the presence of 1<br />
equivalent La³⁺. Whenever possible, chemical shifts of the ε186/θ/Dy³⁺ and ε186/θ/Yb³⁺<br />
complexes were measured from ¹³C-HSQC spectra of samples prepared with 1:1 mixtures of La³⁺<br />
and Dy³⁺, or La³⁺ and Yb³⁺, respectively. ¹³C chemical shifts of methyl groups for which no ¹H<br />
chemical shift is reported were measured from the pd exchange peaks in 2D or 3D methyl Cz-EXSY<br />
spectra, whichever gave better resolution. When neither ¹³C nor ¹H chemical shifts are indicated,<br />
the expected cross-peak could not be identified either because of spectral overlap (e.g. in the case<br />
of vanishing PCS) or strong PRE.
Table S2.2 Number of correctly assigned methyl groups of Met, Thr, and Ala residues of ε186<br />
using the program Possum a
a Calculations were performed using the experimental data of Table S2.1 and simulated data,<br />
where the paramagnetic chemical shifts of Table S2.1 were replaced by chemical shifts back-<br />
calculated from the crystal structure of ε186 and the tensors used in the present study. Two<br />
additional sets of simulated data were generated by addition of structural noise to the PDB<br />
coordinates of ε186. The structural noise followed a Gaussian distribution of 0.25 and 0.5 Å<br />
standard deviation, resulting in a Maxwell-Boltzmann distribution of atomic displacements with<br />
maxima at 0.35 and 0.7 Å, respectively. The columns marked “Dy max”, “Yb max”, and “La max”<br />
report the number of methyl groups for which data in the paramagnetic state were available to the<br />
program. (Additional peaks observed in the diamagnetic state remained unassigned.) The results<br />
are reported for calculations where the diamagnetic chemical shifts were supplemented only with<br />
data from Dy³⁺ (light yellow), Yb³⁺ (light blue) or both (grey). The rows marked with the % symbol
display the percentage of correctly assigned methyl groups for all three residues. The program<br />
Possum is available from http://compbio.chemistry.uq.edu.au/bmmg/christophe.
Table S2.3 Number of correctly assigned methyl groups of Val, Leu, and Ile residues of ε186<br />
using the program Possum with methyl connectivity information in the Yb³⁺ complex a
a Calculations were performed using the experimental data of Table S2.1 and simulated data as<br />
described in the footnote of Table S2.2. As each Val, Leu and Ile residue contains two methyl<br />
groups, methyl specificity and methyl connectivity information can be used as additional<br />
information to support the resonance assignment. (Methyl specificity information refers to<br />
stereospecific assignments of the methyl groups of Val and Leu and the a priori distinction of γ2<br />
and δ1 methyl groups of Ile. Methyl connectivity information refers to the knowledge of which peaks<br />
arise from the same residue.) The results of four different combinations are shown, with and<br />
without methyl specificity information in the paramagnetic complexes, and with and without methyl<br />
specificity information in the diamagnetic complex. It was assumed that no methyl connectivity<br />
information can be established for the Dy³⁺ complex because of strong PRE. The data are<br />
presented in the same format as in Table S2.2. Assignments were counted as correct whenever a<br />
methyl cross-peak was assigned to the correct residue, disregarding the stereospecific correctness<br />
of the assignment. Note that the maximum number of assignable methyl groups reported in the<br />
column marked “La max” can vary when both Dy³⁺ and Yb³⁺ data are used, because Possum has<br />
the freedom not to assign every HSQC cross-peak observed for the Dy³⁺ complex to a peak<br />
observed for the Yb³⁺ complex. This results in a small variation of the number of residues for which<br />
the program has paramagnetic information available and can attempt an assignment of the<br />
diamagnetic data.
Table S2.4 Number of correctly assigned methyl groups of valine, leucine, and isoleucine residues<br />
of ε186 using the program Possum without methyl connectivity information in the Yb³⁺ complex a<br />
a The data are presented as in Table S2.3.
Chapter 3<br />
Numbat: new user-friendly method built for automatic Δχ-tensor determination<br />
3.1 Abstract<br />
Pseudocontact shift (PCS) effects induced by a paramagnetic lanthanide bound to a protein<br />
have become increasingly popular in NMR spectroscopy as they yield a complementary set of<br />
orientational and long-range structural restraints. PCS are a manifestation of the χ-tensor<br />
anisotropy, the Δχ-tensor, which in turn can be determined from the PCS. Once the Δχ-tensor has<br />
been determined, PCS become powerful long-range restraints for the study of protein structure and<br />
protein-ligand complexes. Here we present the newly developed package Numbat (New User-<br />
friendly Method Built for Automatic Δχ-Tensor determination). With a Graphical User Interface<br />
(GUI) that allows a high degree of interactivity, Numbat is specifically designed for the<br />
computation of the complete set of Δχ-tensor parameters (including shape, location and orientation<br />
with respect to the protein) from a set of experimentally measured PCS and the protein structure<br />
coordinates. Use of the program is illustrated by building a model of the complex between the E.<br />
coli DNA polymerase III subunits ε186 and θ using PCS.<br />
3.2 Keywords<br />
paramagnetic NMR · pseudocontact shift · magnetic susceptibility tensor · software ·<br />
program · unique tensor representation<br />
3.3 Abbreviations<br />
α Subunit α of the E. coli polymerase III<br />
ε186 N-terminal 185 residues of the E. coli polymerase III subunit ε<br />
θ Subunit θ of the E. coli polymerase III<br />
CSA Chemical shielding anisotropy<br />
GUI Graphical user interface<br />
HOT The bacteriophage P1-encoded homolog of θ<br />
PCS Pseudocontact shift<br />
RACS Residual anisotropic chemical shift<br />
RDC Residual dipolar coupling<br />
UTR Unique Δχ-tensor representation
3.4 Introduction<br />
Paramagnetic lanthanide ions bound to the natural metal-binding site of a metalloprotein or<br />
introduced via a lanthanide tag provide a number of paramagnetic effects that can be distance<br />
dependent (i.e. paramagnetic relaxation enhancement), orientation dependent (i.e. residual dipolar<br />
couplings, RDC), or a combination of both, like cross-correlated relaxation effects and<br />
pseudocontact shifts (PCS; Bertini et al., 2002, Pintacuda et al., 2004). PCS are particularly<br />
valuable structural restraints, as they are easy to measure and provide long-range information that<br />
would be difficult to obtain by other techniques. PCS originate from unpaired electron spins which<br />
lead to an anisotropic magnetic susceptibility tensor (χ-tensor). PCS restraints induced by<br />
lanthanide ions have been used to investigate structural and dynamical properties of proteins<br />
(Allegrozzi et al., 2000, Bertini et al., 2001, Bertini et al., 2004, Gaponenko et al., 2004, Jensen et<br />
al., 2006, Eichmüller et al., 2007, Wang et al., 2007) and protein-ligand complexes (John et al.,<br />
2006, Pintacuda et al., 2007).<br />
In order to apply PCS restraints, eight variables have to be determined. These comprise the<br />
lanthanide position (three Cartesian coordinates), three angles (e.g. Euler angles) that relate the<br />
molecular frame to the χ-tensor frame, and the axial and rhombic anisotropy parameters of the χ-<br />
tensor. (Since PCS depend only on the χ-tensor anisotropy Δχ rather than the absolute magnitude of<br />
the χ-tensor, it is sufficient to determine the anisotropy parameters represented by the Δχ-tensor.)<br />
Several integrated software tools are available for the determination and study of the alignment<br />
tensor using RDCs (Dosset et al., 2000, Zweckstetter et al., 2000, Valafar et al., 2004, Wei et al.,<br />
2006). For the situation where the 3D structure of the protein is known a priori, corresponding<br />
tools for the determination of the Δχ-tensor from PCS have been developed but are more limited in<br />
scope. The program Fantasia (Banci et al., 1996) and its extension Fantasian (Banci et al., 1997)<br />
can fit the magnitude and Euler angles of the Δχ-tensor using a set of experimental PCS but<br />
requires prior knowledge of the metal coordinates. The program Platypus (Pintacuda et al., 2004)<br />
can simultaneously fit the Δχ-tensor and assign the signals of ¹⁵N-HSQC spectra of samples<br />
containing diamagnetic and paramagnetic lanthanides, but assumes that the ¹⁵N-HSQC peaks are<br />
sufficiently well resolved such that the paramagnetic peaks can be unambiguously associated with<br />
their diamagnetic partners. The program Echidna (Schmitz et al., 2006) uses assigned diamagnetic<br />
¹⁵N-HSQC cross-peaks of a uniformly ¹⁵N-labelled protein to determine the magnitude and Euler<br />
angles of the Δχ-tensor and, simultaneously, the assignment of the paramagnetic ¹⁵N-HSQC<br />
cross-peaks. It also requires prior knowledge of the approximate metal ion position. In principle, the<br />
structure refinement packages Xplor-NIH (Schwieters et al., 2003, Schwieters et al., 2006) with the
module PARArestraint for Xplor-NIH (Banci et al., 2004), GROMACS (Van der Spoel et al.,<br />
2005) with an implementation of orientation restraints (Hess et al., 2003) or DYANA (Güntert et<br />
al., 1997) with the module PSEUDYANA (Banci et al., 1998) could be used for Δχ-tensor<br />
determination from PCS but the protocols would be cumbersome. Considering that simultaneous<br />
determination of the Δχ-tensor and metal ion position relative to a known protein structure is a<br />
commonly required task, we set out to design a tool to achieve this in an easier and user-friendly<br />
way.<br />
While the metal coordinates of metalloproteins can be accurately determined by<br />
crystallography, the metal position must be fitted when no crystal structure is available, e.g., when<br />
the lanthanide is introduced via a lanthanide tag. None of the reported tools addresses this issue.<br />
Here we present the newly developed program Numbat (New User-friendly Method Built for<br />
Automatic Δχ-Tensor determination), which can simultaneously fit the Δχ-tensor and lanthanide<br />
coordinates using experimental PCS values and the coordinates of the protein. Furthermore, the<br />
program encompasses a number of useful tools for multiple data sets recorded with different<br />
paramagnetic lanthanides, for rigid-body docking using PCS, and for analysis and visualization of<br />
the results. Following a description of the algorithm on which the program builds and a<br />
presentation of the graphical user interface, we illustrate the use of Numbat for building the model<br />
of a complex in a rigid-body docking approach using PCS.<br />
3.5 Algorithm<br />
The Δχ-tensor can be determined and refined by the comparison between experimentally<br />
determined PCS values and PCS values back-calculated from the atomic coordinates of the<br />
molecular structure (Sherry et al., 1977, Lee et al., 1983, Emerson et al., 1990, Veitch et al., 1990,<br />
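Banci et al., 1992, Capozzi et al., 1993). The pseudocontact shift of a nuclear spin i, PCS_i^calc, is<br />
given by (Bertini et al., 2002):<br />
\[ \mathrm{PCS}_i^{\mathrm{calc}} = \frac{1}{12\pi r_i^{5}} \left[ \Delta\chi_{\mathrm{ax}} \left( 2\tilde{z}_i^{2} - \tilde{x}_i^{2} - \tilde{y}_i^{2} \right) + \frac{3}{2}\, \Delta\chi_{\mathrm{rh}} \left( \tilde{x}_i^{2} - \tilde{y}_i^{2} \right) \right] \qquad (3.1) \]<br />
where \(\tilde{x}_i\), \(\tilde{y}_i\), \(\tilde{z}_i\) are the Cartesian coordinates of the nuclear spin i in the Δχ-tensor frame, \(r_i\) is<br />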
the distance between the spin i and the paramagnetic centre, and Δχax and Δχrh are the axial and
rhombic components of the Δχ-tensor. The orientation of the Δχ-tensor frame with respect to the<br />
protein frame can be specified, e.g., by three Euler angles α, β and γ.⁵<br />
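The back-calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Numbat's implementation: the function names `zyz_rotation` and `pcs_calc` are hypothetical, the ZYZ Euler convention of Section 3.6.9 is assumed, and the unit choice (Δχ in 10⁻³² m³, distances in Å, PCS in ppm, hence the factor 10⁴) is an assumption made to match the units of Table 3.1.

```python
import math

def zyz_rotation(alpha, beta, gamma):
    """Rotation matrix Rz(alpha) * Ry(beta) * Rz(gamma), angles in radians.
    Its columns are the tensor-frame axes expressed in the protein frame."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb * cg - sa * sg, -ca * cb * sg - sa * cg, ca * sb],
        [sa * cb * cg + ca * sg, -sa * cb * sg + ca * cg, sa * sb],
        [-sb * cg, sb * sg, cb],
    ]

def pcs_calc(atom, metal, dchi_ax, dchi_rh, alpha, beta, gamma):
    """PCS in ppm for one nuclear spin (hypothetical helper).
    atom, metal: (x, y, z) in the protein frame, in Angstrom.
    dchi_ax, dchi_rh: in units of 1e-32 m^3 (assumed unit choice)."""
    rot = zyz_rotation(alpha, beta, gamma)
    d = [a - m for a, m in zip(atom, metal)]
    # Coordinates in the tensor frame: project d onto the columns of rot.
    x, y, z = (sum(rot[k][c] * d[k] for k in range(3)) for c in range(3))
    r5 = (x * x + y * y + z * z) ** 2.5
    geom = dchi_ax * (2 * z * z - x * x - y * y) + 1.5 * dchi_rh * (x * x - y * y)
    # 1e-32 m^3 / (1e-10 m)^3 = 1e-2; times 1e6 for ppm gives 1e4.
    return 1e4 * geom / (12 * math.pi * r5)
```

For a purely axial tensor the shift of a spin on the z axis of the tensor is twice as large as, and opposite in sign to, the shift of a spin at the same distance in the xy plane, which is a quick sanity check for any implementation.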
To quantify the difference between experimental and back-calculated PCS values we define<br />
a quadratic cost c:<br />
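\[ c = \sum_i \max\left( \left| \mathrm{PCS}_i^{\mathrm{calc}} - \mathrm{PCS}_i^{\mathrm{exp}} \right| - \mathrm{tol}_i,\ 0 \right)^{2} \qquad (3.2) \]<br />
where PCS_i^exp is the experimental PCS for the spin i, and tol_i is its associated tolerance. The<br />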
tolerance values can be used to reflect different uncertainties in the measurement of different PCS.<br />
When the lanthanide position is known, only five Δχ-tensor parameters have to be optimized. In<br />
this case, the least-squares fitting problem is linear, as can be seen from an alternative formulation of<br />
the PCS (Bertini et al., 2002):<br />
\[ \mathrm{PCS}_i^{\mathrm{calc}} = \frac{1}{4\pi r_i^{5}} \left[ x_i^{2}\,\Delta\chi_{xx} + y_i^{2}\,\Delta\chi_{yy} + z_i^{2}\,\Delta\chi_{zz} + 2 x_i y_i\,\Delta\chi_{xy} + 2 x_i z_i\,\Delta\chi_{xz} + 2 y_i z_i\,\Delta\chi_{yz} \right] \qquad (3.3) \]<br />
where xi, yi, zi are the Cartesian coordinates of the spin i in an arbitrary frame f (with the metal ion at the origin) and Δχxx,<br />
Δχyy, Δχzz, Δχxy, Δχxz, Δχyz are the Δχ-tensor components in this frame. Because the Δχ-tensor is<br />
traceless (Δχxx + Δχyy + Δχzz = 0), only five of these components are independent. The Singular Value<br />
Decomposition (SVD) algorithm, which is commonly used to determine an alignment tensor from a<br />
set of experimental RDC (Valafar et al., 2004, Wei et al., 2006), would be a good candidate to<br />
minimize the cost c. Least-squares fitting and the Simplex algorithm (Nelder et al., 1965) have<br />
been applied in previous work (Emerson et al., 1990, Capozzi et al., 1993). However, the most<br />
general problem one has to solve is non-linear since the metal ion position may be unknown. We<br />
consequently chose the Levenberg-Marquardt algorithm (Marquardt, 1963), as implemented in the<br />
GNU Scientific Library (Galassi et al., 2006), for the non-linear least-squares fitting procedure in<br />
Numbat.<br />
5 The parameters that are fitted by the software Numbat are the three Cartesian coordinates of the metal ion (the origin of the Δχ-tensor frame), Δχax, Δχrh, α, β, and γ.
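To illustrate why the fit is linear when the metal position is known, the sketch below recovers the five independent Δχ-tensor components from a set of PCS by solving the normal equations with Gaussian elimination. This is an assumption-laden illustration rather than the method used by Numbat: the text names SVD as the natural candidate, and plain normal equations are swapped in here only to keep the example free of external libraries. The names `design_row`, `solve` and `fit_tensor` are hypothetical, and the units (Δχ in 10⁻³² m³, Å, ppm) are an assumed choice.

```python
import math

K_PPM = 1e4 / (4 * math.pi)  # assumed units: dchi in 1e-32 m^3, Angstrom, ppm

def design_row(atom, metal):
    """Coefficients of (xx, yy, xy, xz, yz) for one spin, using zz = -xx - yy."""
    x, y, z = (a - m for a, m in zip(atom, metal))
    s = K_PPM / (x * x + y * y + z * z) ** 2.5
    return [s * (x * x - z * z), s * (y * y - z * z),
            2 * s * x * y, 2 * s * x * z, 2 * s * y * z]

def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_tensor(atoms, pcs_exp, metal):
    """Linear least-squares fit of the traceless tensor for a fixed metal position."""
    rows = [design_row(a, metal) for a in atoms]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] * p for r, p in zip(rows, pcs_exp)) for i in range(5)]
    xx, yy, xy, xz, yz = solve(ata, atb)
    return {"xx": xx, "yy": yy, "zz": -xx - yy, "xy": xy, "xz": xz, "yz": yz}
```

Diagonalising the fitted tensor then yields the principal components and the orientation of the tensor frame. For poorly conditioned data sets an SVD-based solver is the more robust choice.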
3.6 Program Features<br />
3.6.1 GUI<br />
The graphical user interface (GUI) of Numbat was built with the GTK+ library (Krause,<br />
2007) that is part of standard installations of recent Linux systems. Figure 3.1 shows two<br />
screenshots of the main interface of Numbat illustrating the intuitive and flexible user interface.<br />
Figure 3.1 Screenshots of Numbat main windows. (a) Graphical User Interface for the<br />
selection Structure and Data. Four PCS data sets can be loaded simultaneously under<br />
the tabs PCS1 to PCS4. The list of all atoms is displayed in the main frame and can be<br />
filtered with the Display tab to show only the atom or residue types of interest. The<br />
experimental PCS and the tolerance can be directly modified, and only atoms that are<br />
selected (see the column labelled “use?”) are taken into account in the calculations.<br />
The distance between the respective atom and the metal ion, the calculated PCS and the<br />
deviation between experimental and predicted PCS are calculated and displayed after
each fitting procedure. (b) Graphical User Interface for Tensor Calculation. A Δχ-<br />
tensor can be fitted for each of the data sets PCS1 to PCS4. An additional tab<br />
(Multiple PCS) is for fitting different data sets that share the same metal-ion centre.<br />
The frame PDB selector allows the choice of the model(s) to be used from a family of<br />
conformers loaded. The Tensor search restraints frame allows the individual selection<br />
of each of the eight variables to be free, fixed or constrained between two values. The<br />
computed Δχ-tensor values are displayed with error estimates from the GSL<br />
implementation of the Levenberg-Marquardt algorithm and the corresponding unique<br />
tensor representation (“UTR”) is reported.<br />
3.6.2 Input files<br />
Numbat reads atomic coordinates from Protein Data Bank (PDB; Berman et al., 2000) files.<br />
In the case of NMR structures, the entire ensemble of conformers is loaded and any subset can be<br />
selected for subsequent calculations. When optimizing the Δχ-tensor, PCS are back-calculated for<br />
each selected structure and averaged for the computation of the cost function c (equation (3.2)).<br />
PCS data can be read either in the Xplor-NIH format or in a format specific to Numbat. For test<br />
purposes, Numbat also allows the generation of PCS data (optionally with addition of Gaussian<br />
noise) for a user-specified Δχ-tensor.<br />
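The ensemble averaging can be sketched as follows. The function name `ensemble_cost` is hypothetical, and the way the tolerance enters the cost is an assumption: it is treated here as a flat-bottomed margin subtracted from the absolute deviation before squaring, which is one reading consistent with tolerance values of zero being admissible.

```python
def ensemble_cost(pcs_exp, pcs_calc_per_model, tol):
    """Quadratic cost with PCS averaged over the selected conformers.
    pcs_calc_per_model: one list of back-calculated PCS per conformer.
    tol: per-spin tolerance (assumed flat-bottomed margin, see lead-in)."""
    n_models = len(pcs_calc_per_model)
    total = 0.0
    for i, (exp, t) in enumerate(zip(pcs_exp, tol)):
        mean_calc = sum(model[i] for model in pcs_calc_per_model) / n_models
        excess = max(abs(mean_calc - exp) - t, 0.0)
        total += excess ** 2
    return total
```

With all tolerances set to zero this reduces to an ordinary sum of squared deviations.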
3.6.3 Methyl group definition<br />
The ¹H chemical shift of a rotating methyl group can be described as the average of the<br />
chemical shifts of the three ¹H spins. The selection “methyl association” in the GUI allows<br />
definition of pseudoatom names for any methyl group for which the experimental PCS value is to<br />
be treated as the average of the PCS of the three ¹H nuclei. The pseudoatom names can be used to<br />
identify the experimental PCS values of methyl groups in the input file. Alternatively, the PCS<br />
values of methyl groups can be interactively entered via the user-interface.<br />
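A minimal sketch of the pseudoatom averaging, assuming a callable that returns the back-calculated PCS for a given set of coordinates. The name `methyl_pcs` and the callable `pcs_of` are hypothetical and only illustrate the averaging step.

```python
def methyl_pcs(proton_coords, pcs_of):
    """PCS of a methyl pseudoatom: mean of the PCS of its three protons.
    pcs_of: any function mapping (x, y, z) to a back-calculated PCS."""
    if len(proton_coords) != 3:
        raise ValueError("a methyl group has exactly three protons")
    return sum(pcs_of(p) for p in proton_coords) / 3.0
```

Averaging the PCS of the three proton positions is not the same as evaluating the PCS at their mean position, because the PCS is not linear in the coordinates; the difference matters most for methyl groups close to the metal.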
3.6.4 Optimization of the tensor parameters<br />
In order to give the user a maximum of flexibility, any subset of the eight Δχ-tensor<br />
variables can be optimized with the remaining ones fixed to user-specified values. Such a situation<br />
occurs, for example, when a protein-ligand complex is studied where the protein is tagged with a<br />
lanthanide. First, the Δχ-tensor can be determined using the PCS measured for the protein. Fitting<br />
of the position and orientation of the Δχ-tensor with respect to the ligand can subsequently be
performed with a minimal number of adjustable parameters by keeping the axial and rhombic<br />
components of the Δχ-tensor fixed at the values determined for the protein. The Δχ-tensors<br />
determined for the protein and the ligand can finally be superimposed to derive a model of the<br />
protein-ligand complex (Pintacuda et al., 2007).<br />
Numbat also offers the option of restricting the Δχ-tensor variables within user-defined<br />
boundaries. This is useful if the magnitude, position and/or orientation of the Δχ-tensor is<br />
approximately known from previous studies (Su et al., 2008). Depending on the quality and<br />
quantity of PCS measurements available, the Δχ-tensor variables (especially the lanthanide<br />
coordinates) may only reach a local minimum during the optimization procedure. Therefore the<br />
starting values of all Δχ-tensor variables used to initialize the minimiser can be changed<br />
interactively within Numbat.<br />
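One way to organise free, fixed and bounded variables is sketched below. It is an illustration only: the parameter names, the helper `make_packer`, and the use of simple clamping to enforce bounds are all assumptions, since the text does not state how Numbat enforces constraints internally.

```python
PARAMETERS = ("mx", "my", "mz", "dchi_ax", "dchi_rh", "alpha", "beta", "gamma")

def make_packer(start, fixed=(), bounds=None):
    """Split the eight variables into free ones (seen by the minimiser)
    and fixed ones (kept at their starting values)."""
    bounds = bounds or {}
    free = [name for name in PARAMETERS if name not in fixed]

    def pack():
        # Starting vector handed to the minimiser.
        return [start[name] for name in free]

    def unpack(vector):
        # Rebuild the full parameter set, clamping bounded variables.
        full = dict(start)
        for name, value in zip(free, vector):
            if name in bounds:
                low, high = bounds[name]
                value = min(max(value, low), high)
            full[name] = value
        return full

    return free, pack, unpack
```

In the protein-ligand scenario described above, `fixed=("dchi_ax", "dchi_rh")` would leave six variables for the fit against the ligand PCS.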
3.6.5 Residual Anisotropic Chemical Shifts (RACS)<br />
Paramagnetic lanthanides bound to the protein weakly align the molecule in the magnetic<br />
field resulting in an incomplete averaging of the anisotropic chemical shifts. This can affect the<br />
PCS by a shift of up to 0.2 ppm for backbone ¹⁵N and ¹³C’ spins at a magnetic field of 18.8 T (John<br />
et al., 2005). The RACS correction term Δδ^RACS for ¹Hᴺ, backbone ¹⁵N and ¹³C’ spins can be<br />
calculated given the Δχ-tensor and the chemical shielding anisotropy tensor (CSA-tensor) using<br />
(John et al., 2005):<br />
\[ \Delta\delta^{\mathrm{RACS}} = \frac{B_0^{2}}{15 \mu_0 k T} \sum_{i,j \in \{x,y,z\}} \sigma_{ii}^{\mathrm{CSA}} \cos^{2}\theta_{ij}\, \Delta\chi_{jj} \qquad (3.4) \]<br />
where B0 is the magnetic field, μ0 the induction constant, k the Boltzmann constant, T the<br />
temperature, σ_ii^CSA the principal components of the CSA-tensor, cos θij the nine direction cosines<br />
between pairs of the principal axes of the Δχ-tensor and the CSA-tensor, and Δχjj the principal<br />
components of the Δχ-tensor. Numbat optionally uses the RACS correction term when generating<br />
PCS data and fitting Δχ-tensors. The orientations of the principal component axes of the nuclear<br />
CSA-tensors and the σ_ii^CSA values for ¹Hᴺ, backbone ¹⁵N and ¹³C’ are taken from (Cornilescu et al.,<br />
2000).<br />
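The double sum over direction cosines can be sketched as below. The function name `racs_ppm` and its argument layout are hypothetical; the prefactor B0²/(15 μ0 k T) follows the form given by John et al. (2005) as understood here, with the CSA principal components in ppm and the Δχ principal components in m³, so the result is in ppm.

```python
import math

MU0 = 4e-7 * math.pi   # induction constant (T m / A)
KB = 1.380649e-23      # Boltzmann constant (J / K)

def racs_ppm(b0, temp, csa_ppm, csa_axes, dchi, dchi_axes):
    """RACS correction in ppm (hypothetical helper).
    csa_ppm: three principal CSA components (ppm); csa_axes: their unit vectors.
    dchi: three principal dchi components (m^3); dchi_axes: their unit vectors.
    Both sets of axes must be given in the same molecular frame."""
    pref = b0 ** 2 / (15 * MU0 * KB * temp)
    total = 0.0
    for sigma, u in zip(csa_ppm, csa_axes):
        for chi, v in zip(dchi, dchi_axes):
            cos_t = sum(a * b for a, b in zip(u, v))  # direction cosine
            total += sigma * cos_t ** 2 * chi
    return pref * total
```

Because the Δχ-tensor is traceless, an isotropic shielding tensor gives a vanishing correction, which is a convenient check of the bookkeeping.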
3.6.6 Multiple PCS data sets
A new PCS data set can be obtained by replacing one paramagnetic lanthanide with another<br />
paramagnetic lanthanide. Multiple PCS data sets obtained in this way share a conserved lanthanide<br />
position, but different orientations and magnitudes of the Δχ-tensors must be fitted to each<br />
individual PCS data set. Numbat can perform a simultaneous fit of the Δχ-tensors and the shared<br />
lanthanide position. This feature is of particular interest when only a limited number of PCS can be<br />
measured for each lanthanide ion, as fewer variables in the Δχ-tensor fit will facilitate the<br />
determination of accurate Δχ-tensor parameters. For example, a limited set of unambiguously<br />
measured PCS can be used to determine initial Δχ-tensor parameters from which the PCS of<br />
unassigned paramagnetic cross-peaks can be back-calculated, leading to assignments of additional<br />
paramagnetic cross-peaks and improved Δχ-tensor parameters. Similarly, applications to small<br />
ligand molecules with a small number of NMR signals are aided by limiting the number of<br />
adjustable variables to a minimum.<br />
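The saving in adjustable variables can be made explicit with a small sketch. The helper names and the layout of the parameter vector (shared metal coordinates first, then five tensor variables per data set) are assumptions made for illustration.

```python
def n_variables(n_datasets, shared_metal):
    """Number of fitted variables for n data sets.
    Each data set needs 5 tensor variables (dchi_ax, dchi_rh, alpha, beta, gamma);
    the 3 metal coordinates are fitted once if shared, else once per data set."""
    return 3 + 5 * n_datasets if shared_metal else 8 * n_datasets

def unpack_shared(vector, n_datasets):
    """Split a flat parameter vector into the shared metal position
    and one 5-tuple of tensor variables per data set (assumed layout)."""
    metal = tuple(vector[:3])
    tensors = [tuple(vector[3 + 5 * k: 8 + 5 * k]) for k in range(n_datasets)]
    return metal, tensors
```

For four lanthanides, for example, the simultaneous fit uses 23 variables instead of 32.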
3.6.7 PCS modification<br />
Once an initial Δχ-tensor has been fitted, Numbat computes and displays PCS values for all<br />
atoms. Doubtful assignments can easily be detected at this stage by inspection of the deviation<br />
between experimental and calculated values. Numbat allows interactive modification of PCS_i^exp and<br />
tol_i as well as the input of additional PCS data.<br />
3.6.8 PCS selection<br />
The experimental PCS values to be used for the Δχ-tensor fit can be selected according to<br />
three criteria: A list of (i) residue types or (ii) atom types can be provided by the user. This is<br />
convenient in the case of selectively isotope-labelled proteins and allows a quick assessment of the<br />
amount of information necessary in order to retrieve a robust Δχ-tensor. (iii) Each individual PCS<br />
can be selected or deselected interactively via the GUI. This is particularly convenient if,<br />
after initial optimisation of the Δχ-tensor, some of the back-calculated PCS consistently show large<br />
deviations with respect to the experimental values, which may be due to erroneous assignments or<br />
discrepancies between the atomic coordinates of the PDB file and the actual structure of the<br />
protein, as is often the case for flexible polypeptide segments. Deselecting the corresponding atoms<br />
is likely to improve the Δχ-tensor fit in the next iteration.<br />
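The three selection criteria can be mimicked with simple filters over a list of PCS records. The record layout (dictionaries with `residue`, `atom`, `pcs_exp` and `pcs_calc` keys) and the function names are hypothetical; the default threshold of 0.15 ppm is the value used for outlier inspection in Section 3.7.1.

```python
def select(records, residue_types=None, atom_types=None):
    """Keep records matching the given residue types and/or atom types."""
    chosen = []
    for rec in records:
        if residue_types is not None and rec["residue"] not in residue_types:
            continue
        if atom_types is not None and rec["atom"] not in atom_types:
            continue
        chosen.append(rec)
    return chosen

def flag_outliers(records, threshold=0.15):
    """Records whose experimental and back-calculated PCS differ by more
    than the threshold (ppm); candidates for deselection in the next fit."""
    return [rec for rec in records
            if abs(rec["pcs_exp"] - rec["pcs_calc"]) > threshold]
```

Flagged records should be inspected rather than removed blindly, since a large deviation may point to a misassignment as well as to local structural differences.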
3.6.9 Conventions
Different conventions have been used in the literature to report Δχ-tensor parameters,<br />
including different definitions of Euler angles, choice of principal and secondary axes of the Δχ-<br />
tensor, and units of Δχ-tensor magnitudes. Numbat can report the Δχ-tensor parameters in many<br />
different conventions but uses as a default the following conventions: (i) The axes of the Δχ-tensor<br />
frame are labelled such that |Δχzz| ≥ |Δχyy| ≥ |Δχxx| in analogy to alignment tensor conventions<br />
(Clore et al., 1998). This ensures that axial and rhombic components are always of the same sign.<br />
(ii) The Euler angles α, β and γ are expressed in the “ZYZ” convention, i.e., the first rotation of<br />
angle α is around the z axis of the protein frame, the second rotation of angle β is around the new y’<br />
axis and the last rotation of angle γ is around the new z’’ axis (Figure 3.2). While for an<br />
asymmetric object the Euler angles are uniquely defined if the angles α, β and γ are taken in the<br />
intervals [0, 2π[, [0, π[, [0, 2π[, respectively, ambiguities arise for symmetric objects. Therefore, we<br />
chose the interval [0, π[ for all three angles, eliminating the potential ambiguities arising from the<br />
four symmetry-related Δχ-tensors that generate the same PCS values. In the case of β = 0, an<br />
infinite number of combinations of α and γ would produce the same overall rotation. In this case,<br />
we set γ = 0. These two rules ensure that any Δχ-tensor is unambiguously reported as a single set of<br />
parameters which is referred to in the GUI as UTR (Unique Δχ-Tensor Representation).<br />
Figure 3.2 Euler angle definitions used by Numbat. The relative orientation of the Δχ-<br />
tensor frame with respect to the protein frame is defined by Euler rotations of angle α, β<br />
and γ in the ZYZ convention. (a) A right-handed rotation of angle α around the z axis is<br />
applied to the protein frame xyz to give the frame x’y’z’. (b) A second rotation of angle<br />
β around the new axis y’ is applied to the frame x’y’z’ to give x’’y’’z’’. (c) The last<br />
rotation of angle γ around the z’’ axis gives the Δχ-tensor frame.<br />
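The axis-labelling rule (i) can be checked numerically. The sketch below assumes the common definitions Δχax = Δχzz − (Δχxx + Δχyy)/2 and Δχrh = Δχxx − Δχyy, which are not spelled out in the text above; the function name `label_axes` is hypothetical.

```python
def label_axes(principal_values):
    """Order three principal components so that |zz| >= |yy| >= |xx|
    and return (dchi_ax, dchi_rh) under the assumed definitions
    dchi_ax = zz - (xx + yy) / 2 and dchi_rh = xx - yy."""
    xx, yy, zz = sorted(principal_values, key=abs)
    dchi_ax = zz - 0.5 * (xx + yy)
    dchi_rh = xx - yy
    return dchi_ax, dchi_rh
```

For a traceless tensor this ordering forces Δχxx and Δχyy to carry the sign opposite to Δχzz whenever they are both non-zero, with Δχyy the larger in magnitude, so that Δχax and Δχrh share the same sign.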
3.6.10 Error analysis<br />
The Levenberg-Marquardt algorithm is used to minimize the cost c (equation (3.2)), but the<br />
quality of the fit cannot be assessed without further error analysis. Therefore, in addition to the<br />
uncertainty values provided by the GSL implementation of the minimiser, Numbat embeds a Monte
Carlo protocol with random Gaussian noise added either to the atomic coordinates of the molecule<br />
or to the experimental PCS values. The robustness of the Δχ-tensor fit with respect to the PCS data<br />
set can also be tested by random subset selection of the PCS values used. Resulting Δχ-tensor<br />
orientations are displayed in a Sanson-Flamsteed projection (Bugayevskiy et al., 1995) using the<br />
plotting utility gnuplot.<br />
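Two building blocks of such a protocol are sketched below: adding Gaussian noise to a list of values, and mapping a tensor axis to the Sanson-Flamsteed (sinusoidal) projection, in which longitude is scaled by the cosine of the latitude. The function names are hypothetical and the sketch does not reproduce the gnuplot interface of Numbat.

```python
import math
import random

def add_gaussian_noise(values, sigma, rng):
    """Return a copy of values with zero-mean Gaussian noise of width sigma.
    rng: a random.Random instance, so that runs can be made reproducible."""
    return [v + rng.gauss(0.0, sigma) for v in values]

def sanson_flamsteed(axis):
    """Map a direction (x, y, z) to sinusoidal-projection coordinates (radians)."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    lat = math.asin(z / norm)
    lon = math.atan2(y, x)
    return lon * math.cos(lat), lat
```

In a Monte Carlo run, each noisy replicate of the PCS data (or of the coordinates) would be refitted and the resulting principal axes projected, so that the scatter of the points reflects the orientational uncertainty. Since a tensor axis has no intrinsic direction, a vector and its negative represent the same axis.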
3.6.11 Visualization<br />
Graphical visualization of the Δχ-tensor frame and isosurfaces of PCS values in the<br />
structure of the molecule presents a convenient way to assess the similarity of the principal axes of<br />
multiple Δχ-tensors and the similarity of their respective isosurfaces. To this end Numbat interfaces<br />
with the molecular viewers MOLMOL (Koradi et al., 1996) and PyMOL (DeLano, 2002) by<br />
generating suitable macro files and displaying the Δχ-tensor frame and corresponding PCS<br />
isosurfaces in superimposition with the protein studied, as illustrated in Figure 3.3. The files of the<br />
macros, PCS potential and PDB file containing the coordinates of the protein together with<br />
coordinates of the metal ion and Δχ-tensor axes can also be saved for later use.<br />
Figure 3.3 Visualisation of the Δχ-tensor in MOLMOL and PyMOL, and display of<br />
its orientational uncertainty in a Sanson-Flamsteed projection plot. Numbat can<br />
directly call MOLMOL and PyMOL to display the axes of the fitted Δχ-tensor and<br />
PCS isosurfaces at user-defined contour levels. The orientational uncertainty of the<br />
Δχ-tensor frame can be evaluated by a Monte-Carlo protocol with random additions<br />
of noise to the structure coordinates and/or PCS data, with optional random
selection of subsets of data. Numbat calls gnuplot to display the results in a Sanson-<br />
Flamsteed projection plot.<br />
3.6.12 Output<br />
The list of PCS can be saved in Xplor-NIH format and in a Numbat-specific format. The<br />
weak molecular alignment in the magnetic field resulting from a non-vanishing Δχ-tensor can be<br />
described by an alignment tensor with principal axes parallel to those of the Δχ-tensor and axial and<br />
rhombic components that are directly proportional to Δχax and Δχrh, respectively (Tolman et al.,<br />
1995). Numbat calculates the RDC between two spins A and B for the situation of a completely<br />
rigid molecule, using (Bertini et al., 2002)<br />
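\[ \mathrm{RDC}_{AB} = -\frac{S\, B_0^{2}\, \gamma_A \gamma_B \hbar}{120 \pi^{2} k T\, r_{AB}^{5}} \left[ \Delta\chi_{\mathrm{ax}} \left( 2\tilde{z}_{AB}^{2} - \tilde{x}_{AB}^{2} - \tilde{y}_{AB}^{2} \right) + \frac{3}{2}\, \Delta\chi_{\mathrm{rh}} \left( \tilde{x}_{AB}^{2} - \tilde{y}_{AB}^{2} \right) \right] \qquad (3.5) \]<br />
where γA and γB are the magnetogyric ratios of spins A and B, respectively, ħ the Planck<br />
constant divided by 2π, S the order parameter, rAB the internuclear distance, and \(\tilde{x}_{AB}\), \(\tilde{y}_{AB}\), \(\tilde{z}_{AB}\) the<br />
coordinates of the vector AB expressed in the Δχ-tensor frame. The RDC values are reported in<br />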
Xplor-NIH (Schwieters et al., 2003, Schwieters et al., 2006) and Pales (Zweckstetter et al., 2000)<br />
format.<br />
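The RDC calculation for a rigid molecule can be sketched as follows, assuming the standard expression for paramagnetically induced alignment with prefactor −S B0² γA γB ħ / (120 π² k T r⁵) in SI units. The function name `rdc_hz` is hypothetical; the internuclear vector must already be expressed in the Δχ-tensor frame.

```python
import math

HBAR = 1.054571817e-34  # Planck constant divided by 2*pi (J s)
KB = 1.380649e-23       # Boltzmann constant (J / K)

def rdc_hz(vec_ab, gamma_a, gamma_b, dchi_ax, dchi_rh, b0, temp, order=1.0):
    """RDC in Hz for spins A and B (hypothetical helper, SI units throughout).
    vec_ab: internuclear vector in the tensor frame (m).
    gamma_a, gamma_b: magnetogyric ratios (rad s^-1 T^-1).
    dchi_ax, dchi_rh: tensor anisotropies (m^3)."""
    x, y, z = vec_ab
    r5 = (x * x + y * y + z * z) ** 2.5
    pref = -order * b0 ** 2 * gamma_a * gamma_b * HBAR / (
        120 * math.pi ** 2 * KB * temp * r5)
    return pref * (dchi_ax * (2 * z * z - x * x - y * y)
                   + 1.5 * dchi_rh * (x * x - y * y))
```

With a purely axial tensor, a bond vector along the tensor z axis gives twice the magnitude and the opposite sign of a bond vector in the xy plane, mirroring the angular dependence of the PCS.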
Finally, Numbat can generate PDB files where the Δχ-tensor is reported in a format ready<br />
for use with MOLMOL or PyMOL for rigid-body docking alignment, or for further refinement by<br />
Xplor-NIH.<br />
3.7 Study case<br />
The proteins ε and θ are subunits of the complex of proteins constituting E. coli DNA<br />
polymerase III. The complex between the N-terminal domain of ε (ε186) and θ has been<br />
extensively studied using PCS data (Pintacuda et al., 2006, Pintacuda et al., 2007). In light of the<br />
recent crystal structure of the complex between ε186 and the θ homolog HOT (Kirby et al., 2006),<br />
we illustrate in the following the features of Numbat by revisiting the NMR structure of the<br />
complex between ε186 and θ which was derived from PCS induced by Dy³⁺ and Er³⁺ ions bound to<br />
the natural metal-binding site of ε186 (Pintacuda et al., 2006).
The coordinates of the A chain in the PDB deposition 2IDO (Kirby et al., 2006) were used as<br />
the structural model for ε186. The structural model of θ was conformer 10 of the NMR structure of<br />
θ in complex with ε186 (PDB accession code 2AXD; Keniry et al., 2006). This conformer was<br />
chosen because it has the lowest backbone RMSD to the HOT protein (2.1 Å) for residues 9-66 (the<br />
structurally defined region for which meaningful PCS could be measured). The experimentally<br />
determined PCS values of ε186 have been reported previously (Schmitz et al., 2006) and the PCS<br />
values of θ are provided in the Supporting Information. All Δχ-tensor optimizations were performed<br />
using Numbat including the RACS correction term and a tolerance value toli of zero for all spins.<br />
3.7.1 Subunit ε186<br />
Table 3.1 presents the results of the Δχ-tensor fit to the PCS measured for ε186. Initially,<br />
individual eight-variable Δχ-tensor optimizations were performed using the PCS data of each<br />
lanthanide (Table 3.1, columns 1 and 2). Next, the Numbat GUI was updated to display the<br />
deviations between the experimental and back-calculated PCS for the Δχ-tensors found. Several<br />
atoms showed deviations > 0.15 ppm between the experimental and back-calculated PCS (15 out of<br />
199 and 8 out of 255 atoms in the case of Dy³⁺ and Er³⁺, respectively; without the RACS<br />
correction, deviations > 0.15 ppm were observed for 36 and 7 atoms, respectively). Assuming that<br />
these outliers were due to problematic measurements or inaccuracies of the 3D structure, these PCS<br />
were removed interactively using the GUI. Re-calculation of the Δχ-tensor was found not to change<br />
the fitted Δχ-tensor parameters significantly for any of the lanthanide ions (results not shown). This<br />
can be explained by the high quality and large number of experimental PCS data available for each<br />
lanthanide (backbone ¹³C’, ¹⁵N and ¹Hᴺ spins), resulting in robust fits of the Δχ-tensors.<br />
Table 3.1 Δχ-tensors determined by Numbat in the frames of the ε186 and θ molecules<br />
Molecule | ε186 a | ε186 a | ε186 a | ε186 a | θ b | θ b | θ b | θ b | θ b | θ b<br />
Optimization | Individual c | Individual c | Combined d | Combined d | Individual c | Individual c | Combined d | Combined d | Fixed e | Fixed e<br />
Lanthanide | Dy³⁺ | Er³⁺ | Dy³⁺ | Er³⁺ | Dy³⁺ | Er³⁺ | Dy³⁺ | Er³⁺ | Dy³⁺ | Er³⁺<br />
Δχax f | 42.3 | -10.6 | 42.3 | -10.7 | 40.1 | -13.0 | 40.2 | -10.0 | 42.3 | -10.7<br />
Δχrh f | 5.3 | -5.1 | 5.3 | -5.1 | 14.8 | -6.5 | 14.9 | -4.8 | 5.3 | -5.1<br />
α g | 169.5 | 144.2 | 169.5 | 143.9 | 27.7 | 23.7 | 27.7 | 19.9 | 42.2 | 34.9<br />
β g | 30.2 | 29.1 | 30.2 | 29.2 | 114.6 | 108.8 | 113.9 | 118.2 | 119.2 | 121.5<br />
γ g | 134.6 | 126.9 | 134.7 | 126.8 | 28.4 | 170.6 | 27.3 | 177.7 | 44.7 | 177.4<br />
mx h | 29.4 | 29.3 | 29.4 | 29.4 | 6.2 | 9.5 | 6.4 | 6.4 | 4.3 | 4.3<br />
my h | 31.9 | 32.0 | 31.9 | 31.9 | -7.5 | -7.2 | -7.5 | -7.5 | -5.5 | -5.5<br />
mz h | 26.7 | 26.7 | 26.7 | 26.7 | -18.9 | -19.0 | -18.8 | -18.8 | -19.8 | -19.8<br />
a Δχ-tensor parameters determined relative to chain A in the PDB coordinate set 2IDO<br />
b Δχ-tensor parameters determined relative to model 10 in the PDB data set 2AXD<br />
c Δχ-tensors determined from PCS induced by Dy 3+ or PCS induced by Er 3+ (individual<br />
optimization)<br />
d Δχ-tensors determined by using the PCS data of Dy 3+ and Er 3+ simultaneously and optimizing for<br />
a single metal ion position (combined optimization)<br />
e Δχ-tensors determined by using the PCS data of Dy 3+ and Er 3+ simultaneously, optimizing for a<br />
single metal ion position and fixing the Δχax and Δχrh at the values determined from the PCS data of<br />
ε186 (fixed optimization)<br />
f In units of 10 -32 m 3<br />
g Euler rotations in the ZYZ convention (degrees)<br />
h Metal ion coordinate (Å) relative to chain A in the PDB coordinate set 2IDO<br />
Since the coordinates of the Dy 3+ and Er 3+ found in the individual fits were very similar<br />
(Table 3.1, columns 1 and 2), we subsequently assumed that the Δχ-tensors induced by each<br />
lanthanide are centered at the same position relative to ε186. The results obtained by<br />
simultaneously fitting the distinct Δχ-tensors while restraining their metal coordinate to a common<br />
centre (Table 3.1, columns 3 and 4) show little difference to the Δχ-tensor parameters found when<br />
performing the individual optimizations.<br />
For comprehensive error analysis, we introduced a random error into the structure<br />
coordinates of ε186, where the atomic coordinates were varied according to a Gaussian distribution<br />
with a standard deviation σ of 0.5 Å, resulting in a mean atom displacement of 0.8 Å. The resulting<br />
uncertainty in Δχ-tensor parameters was approximately equivalent to the uncertainty introduced by<br />
a random variation added to the measured PCS data sampled from a Gaussian distribution with a<br />
standard deviation σ of 0.15 ppm. The Δχ-tensor parameters of ε186 were well defined, as the<br />
values of all eight Δχ-tensor variables determined by 1000 randomized pseudo-replicates of the<br />
structure were in good agreement with the Δχ-tensors fitted to the original structure (Table 3.2,<br />
column 1). To eliminate the possibility that the quality of the Δχ-tensor fit was significantly<br />
affected by the number of PCS measured, the error analysis for the Δχ-tensors fitted to ε186 was<br />
recalculated with random selection of only 20% of the measured PCS. The results (Table 3.2,<br />
column 2) show that the Δχ-tensor parameters of ε186 were still well defined. Figure 1.14.a
illustrates how well the Δχ-tensor axes are defined, even when randomly disregarding 50% of the<br />
data.<br />
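The quoted mean atom displacement can be checked analytically. The following is a minimal sketch, assuming that the 0.5 Å standard deviation was applied independently to each Cartesian coordinate (an assumption; the text does not state this explicitly). Under that assumption the displacement length s follows a Maxwell distribution with scale parameter equal to the per-coordinate standard deviation, written here as \sigma, and its mean is<br />

```latex
\[
\langle s \rangle
  = \int_0^\infty s \,\sqrt{\frac{2}{\pi}}\,\frac{s^{2}}{\sigma^{3}}\,
    e^{-s^{2}/(2\sigma^{2})}\,\mathrm{d}s
  = 2\sigma\sqrt{\frac{2}{\pi}}
  \approx 1.596 \times 0.5\ \text{\AA}
  \approx 0.80\ \text{\AA}
\]
```

This agrees with the mean displacement of 0.8 Å reported above; the root-mean-square displacement under the same assumption would be \sqrt{3}\,\sigma, about 0.87 Å.<br />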
Table 3.2 Error analysis for the Dy³⁺ Δχ-tensors fitted to PCS of ε186 and θ a<br />
Column | 1 | 2 | 3 | 4<br />
Protein | ε186 | ε186 | θ | θ<br />
Analysis | Structure variation | Subset of PCS | Structure variation | Subset of PCS<br />
Δχax b | 42.0 (0.8) | 42.4 (1.1) | 41.9 (4.3) | 40.3 (3.1)<br />
Δχrh b | 5.3 (0.5) | 5.4 (0.8) | 15.0 (4.5) | 15.3 (2.8)<br />
α c | 169.5 (0.7) | 169.7 (0.9) | 29.3 (6.1) | 27.6 (3.2)<br />
β c | 30.2 (0.3) | 30.2 (0.5) | 114.5 (4.3) | 114.4 (3.3)<br />
γ c | 134.0 (2.6) | 134.7 (4.0) | 29.2 (10.6) | 28.9 (7.9)<br />
mx d | 29.4 (0.1) | 29.4 (0.2) | 6.1 (1.3) | 6.2 (0.9)<br />
my d | 31.9 (0.1) | 31.9 (0.2) | -7.4 (1.0) | -7.6 (0.7)<br />
mz d | 26.7 (0.1) | 26.7 (0.1) | -19.1 (0.8) | -18.9 (0.4)<br />
a The average values of the Δχ-tensors and their standard deviations (in brackets) are reported.<br />
Average values and standard deviations were calculated from 1000 sets of randomised atom<br />
coordinates (where the extent of randomisation followed a Gaussian distribution with a standard<br />
deviation σ of 0.5 Å) or from randomly picked subsets of the PCS data (20% in the case of ε186 and<br />
80% in the case of θ, where far fewer PCS were available)<br />
b In units of 10⁻³² m³<br />
c Euler rotations in the ZYZ convention (degrees)<br />
d Metal ion coordinate (Å) in the protein frame (A chain of the PDB coordinates 2IDO and model<br />
10 in the PDB data set 2AXD, respectively)<br />
3.7.2 Subunit θ<br />
The results of the Δχ-tensor determination in the molecular frame of θ are presented in<br />
Table 3.1. There was only a small number of spins for which the back-calculated PCS deviated<br />
from the experimental PCS by more than 0.15 ppm (4 out of 50 in the case of Dy³⁺, 0 out of 41 for
Er³⁺). As for ε186, removal of these PCS from the optimization did not significantly change the<br />
parameters of the fitted Δχ-tensors. While the Δχax and Δχrh values of Er³⁺ determined from the PCS<br />
observed for θ and ε186 were very similar, the Δχrh value of the Dy³⁺ tensor found for θ was almost<br />
three times larger than that found for ε186 (see footnote 6). We subsequently performed an error analysis for θ as<br />
for the ε186 subunit, introducing either random variations into the atomic positions of θ according<br />
to a Gaussian distribution with a standard deviation σ of 0.5 Å or using a random selection of only<br />
80% of the measured PCS. In either case, the Δχ-tensor parameters of θ proved to be less well<br />
defined than those of ε186 (Table 3.2 and Figure 1.14.b). As θ samples a relatively small and<br />
remote volume of the Δχ-tensors due to its spatial separation from the metal ion, one would expect<br />
a less accurate determination of the Δχ-tensors from the θ data. The effect could be exacerbated by<br />
inaccuracies of the <strong>NMR</strong> structure.<br />
In order to compensate for the smaller number of experimentally determined PCS available<br />
for θ (only ¹HN spins) and the poorer quality of the Δχ-tensors fitted, we performed another fit with<br />
Δχax and Δχrh fixed to the values determined for ε186 (Table 3.1, columns 9 and 10). Analysis of<br />
the experimental versus back-calculated PCS, both for the eight- and six-variable fits of the<br />
Δχ-tensor to θ, showed that the PCS deviations were similar in magnitude and trends. Therefore,<br />
constraining Δχax and Δχrh did not significantly deteriorate the quality of the fit, despite considerable<br />
changes of the Δχ-tensor parameters (Table 3.1). The variability of the Δχ-tensor parameters over<br />
all the 12 deposited θ conformers in 2AXD using the fixed optimisation scheme is provided in the<br />
Supporting Information.<br />
3.7.3 Modelling the complex between ε186 and θ<br />
Numbat facilitates the modelling of protein-protein complexes by listing coordinates of the<br />
Δχ-tensor axes together with the protein coordinates in files in PDB format. Superimposition of the<br />
Δχ-tensors fitted to ε186 and θ for each lanthanide ion yields the three-dimensional structure of the<br />
ε186/θ complex by straightforward rigid-body docking. Standard PyMOL or MOLMOL commands<br />
can be used to align the Δχ-tensors. Numbat reports the coordinate system of the Δχ-tensor in such<br />
a way that all four degenerate solutions arising from the symmetry of the Δχ-tensor about the x, y<br />
and z axes (Figure 3.4) can easily be visualised. Identification of the correct solution requires<br />
additional information, such as proper steric interactions, chemical shift perturbation data or<br />
knowledge of the biological function of the complex. The most objective way, however, is by<br />
simultaneous evaluation of the Δχ-tensors of different lanthanides (Pintacuda et al., 2006).<br />
6 The discrepancy in the rhombic component would not necessarily affect the rigid-body docking<br />
of the complex, as only the orientation of the Δχ-tensors and the coordinates of the paramagnetic<br />
center are used.<br />
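The four-fold degeneracy mentioned above can be illustrated numerically. The sketch below is not Numbat code: the function names, the example spin position and the unit handling are assumptions made for illustration. It uses the Cartesian form of the standard PCS equation in the tensor frame and shows that 180° rotations about the x, y or z axis, which flip the signs of two coordinates, leave the PCS unchanged.<br />

```python
import math

def pcs_ppm(x, y, z, dchi_ax, dchi_rh):
    """PCS (ppm) of a spin at (x, y, z) in Å, expressed in the tensor frame.

    dchi_ax and dchi_rh are given in units of 1e-32 m^3 (as in Table 3.1).
    """
    r2 = x * x + y * y + z * z
    # Cartesian form of [dchi_ax (3cos^2 - 1) + 1.5 dchi_rh sin^2 cos(2 phi)] * r^2
    geom = dchi_ax * (2 * z * z - x * x - y * y) + 1.5 * dchi_rh * (x * x - y * y)
    # Unit factor: 1e-32 m^3 / (1e-10 m)^3 = 1e-2, times 1e6 for ppm -> 1e4
    return 1e4 * geom / (12 * math.pi * r2 ** 2.5)

# Identity plus the three 180-degree rotations about the tensor axes.
FLIPS = {
    "identity": (1, 1, 1),
    "C2(x)": (1, -1, -1),
    "C2(y)": (-1, 1, -1),
    "C2(z)": (-1, -1, 1),
}

def degenerate_pcs(spin, dchi_ax, dchi_rh):
    """Return the PCS of one spin for each of the four equivalent frames."""
    x, y, z = spin
    return {
        name: pcs_ppm(sx * x, sy * y, sz * z, dchi_ax, dchi_rh)
        for name, (sx, sy, sz) in FLIPS.items()
    }

# Hypothetical spin position (Å) with the Dy3+ tensor magnitudes of epsilon186.
for name, value in degenerate_pcs((3.0, -7.5, 12.0), 42.3, 5.3).items():
    print(f"{name:9s} {value:8.4f} ppm")

# Four frame choices for each of two tensors.
N_ALIGNMENTS = len(FLIPS) ** 2
```

All four frames give the same value because the coordinates enter the equation only as squares. With four equivalent frame choices for each of two lanthanide tensors, the product 4 × 4 = 16 is consistent with the number of alignments quoted in the following paragraph.<br />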
In the case of the complex between ε186 and θ, the Δχ-tensor frames of Dy 3+ and Er 3+ share<br />
a common origin for both proteins. Seven coordinates are necessary to define two Δχ-tensor frames<br />
sharing the same origin. Because of the second Δχ-tensor, the degeneracy of Figure 3.4 is broken.<br />
There are exactly 16 possibilities to align two pairs of Δχ-tensors. The lowest RMSD value resulting<br />
from all 16 possible 7-coordinate alignments between the two combined Δχ-tensors identified a<br />
single relative orientation of the two proteins as the best solution. The position of θ relative to ε186<br />
derived from PCS data in this way was also the correct solution. It agreed with a model of the<br />
complex obtained by superimposition of θ onto HOT in the ε186/HOT complex, with a backbone<br />
RMSD of 4.4 Å. Similarly for the Δχ-tensor of θ calculated with fixed Δχax and Δχrh values, a<br />
backbone RMSD of 4.3 Å was calculated relative to HOT. When PCS data from only Dy 3+ or Er 3+<br />
were used, the backbone RMSD values were, respectively, 4.2 Å and 4.4 Å for the best fit to the<br />
ε186/HOT complex. The model of the ε186/θ complex derived from the fixed, Dy 3+ and Er 3+ data<br />
sets is displayed in Figure 3.5.<br />
Figure 3.4 The four degenerate solutions arising from the symmetry of the<br />
Δχ-tensor around the x, y and z axes. All four possibilities result in the same<br />
calculated PCS, hence in the same isosurfaces.<br />
Figure 3.5 The complex between ε186 and θ determined by superimposition<br />
of Δχ-tensors. The ε186/HOT complex (PDB accession code 2IDO) is shown<br />
for reference, with ε186 coloured in silver and HOT (residues 9-66) in<br />
orange. The isosurfaces correspond to the PCS induced by the Dy³⁺ ion<br />
(from individual optimisation) contoured at ±1.5 ppm and ±0.5 ppm.<br />
Blue and red isosurfaces represent regions with positive and negative PCS,<br />
respectively. Residues 9-66 of θ are shown as a thin ribbon in the position<br />
defined by the fixed Dy³⁺ and Er³⁺ data (brown).<br />
3.8 Conclusion<br />
The program Numbat is the first software package for fitting Δχ-tensors from PCS data with<br />
a user-friendly graphical user interface (GUI). Numbat calculations are fast, as the program is written<br />
in C with open-source Linux routines. While the main task of Numbat is the fit of the eight Δχ-tensor
variables, the intuitive GUI combined with convenient data handling, including Monte-Carlo error<br />
analysis and links to the molecular viewers MOLMOL and PyMOL, offers high flexibility of use.<br />
The study case of the complex formed between the subunits ε186 and θ of E. coli DNA polymerase<br />
III illustrates the simplicity of use of Numbat.<br />
The program is freely available under the GNU General Public License (GPL) upon request<br />
(see also http://compbio.chemistry.uq.edu.au/bmmg/christophe/numbat.html).<br />
3.9 Acknowledgment<br />
Financial support from the Australian Research Council for project grants to G.O. and T.H.<br />
is gratefully acknowledged.<br />
3.10 References<br />
Allegrozzi M, Bertini I, Janik MBL, Lee YM, Lin GH and Luchinat C (2000) Lanthanide-induced<br />
pseudocontact shifts for solution structure refinements of macromolecules in shells up to 40<br />
Å from the metal ion. J Am Chem Soc 122:4154-4161<br />
Banci L, Bertini I, Bren KL, Cremonini MA, Gray HB, Luchinat C and Turano P (1996) The use of<br />
pseudocontact shifts to refine solution structures of paramagnetic metalloproteins:<br />
Met80Ala cyano-cytochrome c as an example. J Biol Inorg Chem 1:117-126<br />
Banci L, Bertini I, Cavallaro G, Giachetti A, Luchinat C and Parigi G (2004) Paramagnetism-based<br />
restraints for Xplor-NIH. J Biomol <strong>NMR</strong> 28:249-261<br />
Banci L, Bertini I, Cremonini MA, Savellini GG, Luchinat C, Wüthrich K and Güntert P (1998)<br />
PSEUDYANA for <strong>NMR</strong> structure calculation of paramagnetic metalloproteins using<br />
torsion angle molecular dynamics. J Biomol <strong>NMR</strong> 12:553-557<br />
Banci L, Bertini I, Savellini GG, Romagnoli A, Turano P, Cremonini MA, Luchinat C and Gray<br />
HB (1997) Pseudocontact shifts as constraints for energy minimization and molecular<br />
dynamics calculations on solution structures of paramagnetic metalloproteins. Proteins<br />
29:68-76<br />
Banci L, Dugad LB, La Mar GN, Keating KA, Luchinat C and Pierattelli R (1992) 1 H nuclear<br />
magnetic resonance investigation of cobalt(II) substituted carbonic anhydrase. Biophys J<br />
63:530-543<br />
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN and Bourne<br />
PE (2000) The protein data bank. Nucleic Acids Res 28:235-242<br />
Bertini I, Del Bianco C, Gelis I, Katsaros N, Luchinat C, Parigi G, Peana M, Provenzani A and<br />
Zoroddu MA (2004) Experimentally exploring the conformational space sampled by<br />
domain reorientation in calmodulin. Proc Natl Acad Sci U S A 101:6841-6846<br />
Bertini I, Donaire A, Jiménez B, Luchinat C, Parigi G, Piccioli M and Poggi L (2001)<br />
Paramagnetism-based versus classical constraints: An analysis of the solution structure of<br />
Ca Ln calbindin D9k. J Biomol <strong>NMR</strong> 21:85-98<br />
Bertini I, Luchinat C and Parigi G (2002) Magnetic susceptibility in paramagnetic NMR. Prog<br />
NMR Spectrosc 40:249-273<br />
Bugayevskiy LM and Snyder JP (1995). Map projections: A reference manual. Taylor & Francis,<br />
London.<br />
Capozzi F, Cremonini MA, Luchinat C and Sola M (1993) Assignment of pseudo-contact-shifted<br />
1 H <strong>NMR</strong> resonances in the EF site of Yb 3+ -substituted rabbit parvalbumin through a<br />
combination of 2D techniques and magnetic susceptibility tensor determination. Magn<br />
Reson Chem 31:S118-S127<br />
Clore GM, Gronenborn AM and Bax A (1998) A robust method for determining the magnitude of<br />
the fully asymmetric alignment tensor of oriented macromolecules in the absence of<br />
structural information. J Magn Reson 133:216-221<br />
Cornilescu G and Bax A (2000) Measurement of proton, nitrogen, and carbonyl chemical shielding<br />
anisotropies in a protein dissolved in a dilute liquid crystalline phase. J Am Chem Soc<br />
122:10143-10154<br />
DeLano WL (2002) The PyMOL molecular graphics system. Palo Alto, CA, USA.<br />
Dosset P, Hus JC, Blackledge M and Marion D (2000) Efficient analysis of macromolecular<br />
rotational diffusion from heteronuclear relaxation data. J Biomol <strong>NMR</strong> 16:23-28<br />
Eichmüller C and Skrynnikov NR (2007) Observation of μs time-scale protein dynamics in the<br />
presence of Ln 3+ ions: Application to the N-terminal domain of cardiac troponin C. J<br />
Biomol <strong>NMR</strong> 37:79-95<br />
Emerson SD and La Mar GN (1990) <strong>NMR</strong> determination of the orientation of the magnetic-<br />
susceptibility tensor in cyanometmyoglobin: A new probe of steric tilt of bound ligand.<br />
Biochemistry 29:1556-1566<br />
Galassi M, Davies J, Theiler B, Gough G, Jungman M, Booth M and Rossi F (2006). GNU<br />
scientific library reference manual. Network Theory Ltd, Bristol.<br />
Gaponenko V, Sarma SP, Altieri AS, Horita DA, Li J and Byrd RA (2004) Improving the accuracy<br />
of <strong>NMR</strong> structures of large proteins using pseudocontact shifts as long-range restraints. J<br />
Biomol <strong>NMR</strong> 28:205-212<br />
Güntert P, Mumenthaler C and Wüthrich K (1997) Torsion angle dynamics for <strong>NMR</strong> structure<br />
calculation with the new program DYANA. J Mol Biol 273:283-298<br />
Hess B and Scheek RM (2003) Orientation restraints in molecular dynamics simulations using time<br />
and ensemble averaging. J Magn Reson 164:19-27<br />
Jensen MR, Hansen DF, Ayna U, Dagil R, Hass MAS, Christensen HEM and Led JJ (2006) On the<br />
use of pseudocontact shifts in the structure determination of metalloproteins. Magn Reson<br />
Chem 44:294-301<br />
John M, Park AY, Pintacuda G, Dixon NE and Otting G (2005) Weak alignment of paramagnetic<br />
proteins warrants correction for residual CSA effects in measurements of pseudocontact<br />
shifts. J Am Chem Soc 127:17190-17191<br />
John M, Pintacuda G, Park AY, Dixon NE and Otting G (2006) Structure determination of protein-<br />
ligand complexes by transferred paramagnetic shifts. J Am Chem Soc 128:12910-12916<br />
Keniry MA, Park AY, Owen EA, Hamdan SM, Pintacuda G, Otting G and Dixon NE (2006)<br />
Structure of the θ subunit of Escherichia coli DNA polymerase III in complex with the ε<br />
subunit. J Bacteriol 188:4464-4473<br />
Kirby TW, Harvey S, DeRose EF, Chalov S, Chikova AK, Perrino FW, Schaaper RM, London RE<br />
and Pedersen LC (2006) Structure of the Escherichia coli DNA polymerase III ε-HOT<br />
proofreading complex. J Biol Chem 281:38466-38471<br />
Koradi R, Billeter M and Wüthrich K (1996) MOLMOL: A program for display and analysis of<br />
macromolecular structures. J Mol Graphics 14:51-55<br />
Krause A (2007). Foundations of GTK+ development. Apress, Berkeley, CA, USA.<br />
Lee L and Sykes BD (1983) Use of lanthanide-induced nuclear magnetic-resonance shifts for<br />
determination of protein structure in solution: EF calcium binding site of carp parvalbumin.<br />
Biochemistry 22:4366-4373<br />
Marquardt DW (1963) An algorithm for least-squares estimation of nonlinear parameters. J Soc Ind<br />
Appl Math 11:431-441<br />
Nelder JA and Mead R (1965) A simplex method for function minimization. Comput J 7:308-313<br />
Pintacuda G, John M, Su XC and Otting G (2007) <strong>NMR</strong> structure determination of protein-ligand<br />
complexes by lanthanide labeling. Acc Chem Res 40:206-212<br />
Pintacuda G, Keniry MA, Huber T, Park AY, Dixon NE and Otting G (2004) Fast structure-based<br />
assignment of 15 N HSQC spectra of selectively 15 N-labeled paramagnetic proteins. J Am<br />
Chem Soc 126:2963-2970<br />
Pintacuda G, Park AY, Keniry MA, Dixon NE and Otting G (2006) Lanthanide labeling offers fast<br />
<strong>NMR</strong> approach to 3D structure determinations of protein-protein complexes. J Am Chem<br />
Soc 128:3696-3702<br />
Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient<br />
χ-tensor determination and NH assignment of paramagnetic proteins. J Biomol NMR 35:79-87<br />
Schwieters CD, Kuszewski JJ and Clore GM (2006) Using Xplor-NIH for <strong>NMR</strong> molecular<br />
structure determination. Prog <strong>NMR</strong> Spectrosc 48:47-62<br />
Schwieters CD, Kuszewski JJ, Tjandra N and Clore GM (2003) The Xplor-NIH <strong>NMR</strong> molecular<br />
structure determination package. J Magn Reson 160:65-73<br />
Sherry AD and Pascual E (1977) Proton and carbon lanthanide-induced shifts in aqueous alanine.<br />
Evidence for structural changes along lanthanide series. J Am Chem Soc 99:5871-5876<br />
Su XC, McAndrew K, Huber T and Otting G (2008) Lanthanide-binding peptides for <strong>NMR</strong><br />
measurements of residual dipolar couplings and paramagnetic effects from multiple angles.<br />
J Am Chem Soc 130:1681-1687<br />
Tolman JR, Flanagan JM, Kennedy MA and Prestegard JH (1995) Nuclear magnetic dipole<br />
interactions in field-oriented proteins: Information for structure determination in solution.<br />
Proc Natl Acad Sci U S A 92:9279-9283<br />
Valafar H and Prestegard JH (2004) REDCAT: A residual dipolar coupling analysis tool. J Magn<br />
Reson 167:228-241<br />
Van der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE and Berendsen HJC (2005)<br />
GROMACS: Fast, flexible, and free. J Comput Chem 26:1701-1718<br />
Veitch NC, Whitford D and Williams RJP (1990) An analysis of pseudocontact shifts and their<br />
relationship to structural features of the redox states of cytochrome b5. FEBS Lett 269:297-304<br />
Wang X, Srisailam S, Yee AA, Lemak A, Arrowsmith C, Prestegard JH and Tian F (2007)<br />
Domain-domain motions in proteins from time-modulated pseudocontact shifts. J Biomol<br />
<strong>NMR</strong> 39:53-61<br />
Wei Y and Werner MH (2006) iDC: A comprehensive toolkit for the analysis of residual dipolar<br />
couplings for macromolecular structure determination. J Biomol NMR 35:17-25<br />
Zweckstetter M and Bax A (2000) Prediction of sterically induced alignment in a dilute liquid<br />
crystalline phase: Aid to protein structure determination by <strong>NMR</strong>. J Am Chem Soc<br />
122:3791-3792<br />
3.11 Supporting information<br />
Table S3.1 Experimentally determined 1 H N PCS for θ in complex with ε186 at pH 7.0 and 25°C a<br />
Residue PCS Dy 3+ (ppm) PCS Er 3+ (ppm)<br />
ASP 9 -1.28 0.31<br />
GLN 10 -1.19 0.3<br />
THR 11 -1.11 0.25<br />
GLU 12 -1.24 0.27<br />
MET 13 -1.84 0.38<br />
ASP 14 -2 0.32<br />
LYS 15 -1.5<br />
VAL 16 -1.97 0.13<br />
VAL 18 -1.91 0.14<br />
ASP 19 -1.32 -0.03<br />
LEU 20 -1.29 -0.03<br />
ALA 21 -1.37 -0.14<br />
ALA 22 -0.57 -0.18<br />
ALA 23 0.15 -0.38<br />
GLY 24 0.37 -0.46<br />
VAL 25 0.29 -0.26<br />
ALA 26 0.54 -0.3<br />
PHE 27 1.21<br />
LYS 28 0.85 -0.31<br />
GLU 29 0.68 -0.19<br />
ARG 30 0.81 -0.23<br />
ASN 32 0.72<br />
MET 33 0.6<br />
VAL 35 0.02<br />
ILE 36 -0.29 0.08<br />
ALA 37 -0.12 0<br />
GLU 38 -0.22 0.04<br />
ALA 39 -0.4<br />
VAL 40 -0.55<br />
GLU 41 -0.51 -0.06<br />
ARG 42 -0.56 0.09<br />
GLU 43 -0.84 0.12<br />
GLU 46 -0.66 0.11<br />
LEU 48 -0.71 -0.01<br />
ARG 49 -0.58 0.08<br />
SER 50 -0.45 0.05<br />
TRP 51 -0.5<br />
PHE 52 -0.58<br />
ARG 53 -0.38<br />
GLU 54 -0.26 -0.01<br />
ARG 55 -0.23 -0.07<br />
LEU 56 -0.15 -0.09<br />
ILE 57 0.02 -0.09<br />
ALA 58 0.14 -0.11<br />
HIS 59 0.3 -0.18<br />
ARG 60 0.39 -0.17<br />
LEU 61 0.42 -0.15<br />
SER 63 0.8 -0.27<br />
VAL 64 0.71 -0.19<br />
ASN 65 0.8 -0.21<br />
LEU 66 -0.26<br />
a Experimental conditions as described in Pintacuda et al. (2006) J. Am. Chem. Soc. 128, 3696-3702<br />
Table S3.2 Comparison of θ Δχ-tensor parameters when using only conformer 10 a or all<br />
conformers b of the NMR structure of θ.<br />
Fixed a Fixed(Family) b<br />
Dy 3+ Er 3+ Dy 3+ Er 3+<br />
Δχax c 42.3 -10.7 42.3 -10.7<br />
Δχrh c 5.3 -5.1 5.3 -5.1<br />
α d 42.2 34.9 40.5 34.5<br />
β d 119.2 121.5 118.9 121.2<br />
γ d 44.7 177.4 38.1 174.8<br />
mx e 4.3 4.3 4.7 4.7<br />
my e -5.5 -5.5 -5.8 -5.8<br />
mz e -19.8 -19.8 -19.7 -19.7<br />
a The Δχ-tensor determined using the fixed optimisation scheme relative to θ conformer 10<br />
b The Δχ-tensor determined using the fixed optimisation scheme relative to simultaneously all 12<br />
deposited θ conformers<br />
c In units of 10⁻³² m³<br />
d Euler rotations in the ZYZ convention (degrees)<br />
e Metal ion coordinate (Å) in the protein frame (PDB data set 2AXD)<br />
Chapter 4<br />
Protein Structure Determination from Pseudocontact Shifts using ROSETTA<br />
Christophe Schmitz a , Robert Vernon b , Gottfried Otting c , David Baker b and Thomas Huber a<br />
a School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4072,<br />
Australia<br />
b Department of Biochemistry, University of Washington, Seattle, WA 98195<br />
c <strong>Research</strong> School of Chemistry, Australian National University, Canberra, ACT 0200, Australia<br />
Manuscript submitted to the Proceedings of the National Academy of Sciences of the United States<br />
of America.<br />
4.1 Abstract<br />
Pseudocontact shifts (PCS) arise from paramagnetic metal ions bound to proteins and are<br />
manifested as large changes in chemical shifts detected in nuclear magnetic resonance (<strong>NMR</strong>)<br />
spectra. PCS data constitute long-range restraints on the positions of nuclear spins relative to the<br />
coordinate system of the magnetic susceptibility anisotropy tensor (Δχ-tensor) of the metal ion.<br />
Protein structure determination using PCS data only, however, is hampered by the difficulty of<br />
determining the Δχ-tensor and metal position without knowledge of the protein structure. We have<br />
circumvented this problem in the program PCS-ROSETTA by using the structure prediction<br />
program ROSETTA to generate the models required for fitting of the Δχ-tensor parameters. PCS<br />
restraints implemented in the fragment assembly step of PCS-ROSETTA proved highly efficient in<br />
biasing the sampling of the conformational space towards the correct target structure. The results<br />
show that using a combination of chemical shift and PCS data, ROSETTA can determine structures<br />
accurately for proteins of up to 150 residues. Lanthanides can be incorporated into proteins quite<br />
generally through metal binding tags, and the combination of these data with the PCS-ROSETTA<br />
method provides a powerful new approach to protein structure determination.<br />
4.2 Introduction<br />
The three-dimensional (3D) structure of proteins is a prerequisite for understanding protein<br />
function, protein-ligand interactions and rational drug design. Protein structures can be readily<br />
determined by <strong>NMR</strong> spectroscopy. The most difficult part of an <strong>NMR</strong> structure determination<br />
typically is the assignment of sidechain chemical shifts and NOESY peaks. This bottleneck can<br />
potentially be avoided if methods for computing high accuracy structures from backbone-only<br />
<strong>NMR</strong> experiments can be developed.<br />
PCSs are a potentially rich source of structural information that are manifested as large<br />
changes in chemical shifts in the <strong>NMR</strong> spectrum caused by a non-vanishing magnetic susceptibility<br />
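anisotropy tensor (Δχ-tensor) of a paramagnetic metal ion. The PCS (in ppm) of a nuclear spin i<br />
depends on the polar coordinates ri, θi, and φi of the nuclear spin with respect to the Δχ-tensor<br />
frame of the metal ion and the axial and rhombic components of the Δχ-tensor:<br />
\[ \delta_i^{\mathrm{PCS}} = \frac{1}{12\pi r_i^{3}} \left[ \Delta\chi_{ax} \left( 3\cos^{2}\theta_i - 1 \right) + \frac{3}{2}\, \Delta\chi_{rh} \sin^{2}\theta_i \cos 2\varphi_i \right] \qquad (4.1) \]<br />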
The Δχ-tensor defines a coordinate system in the molecule that is centered on the metal ion<br />
and is fully described by eight parameters (Δχax, Δχrh, three Euler angles relating the orientation of<br />
the Δχ-tensor to the protein frame, and the coordinates of the metal ion). Therefore, the Δχ-tensor<br />
can be determined using PCS data from at least eight nuclear spins, provided the coordinates of the<br />
spins are known.<br />
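To make the role of the eight parameters concrete, the following sketch back-calculates a PCS from a given tensor and spin position. It is an illustration only, not the PCS-ROSETTA or Numbat implementation: the function names are hypothetical, and the Euler convention used here (R = Rz(α)·Ry(β)·Rz(γ), with the columns of R taken as the tensor axes in the protein frame) is an assumption that may differ from the convention of either program. Coordinates are in Å, angles in degrees, and Δχax and Δχrh in units of 10⁻³² m³.<br />

```python
import math

def euler_zyz_matrix(alpha_deg, beta_deg, gamma_deg):
    """Rotation matrix R = Rz(alpha) * Ry(beta) * Rz(gamma), angles in degrees."""
    a, b, g = (math.radians(v) for v in (alpha_deg, beta_deg, gamma_deg))
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cg, sg = math.cos(g), math.sin(g)
    return [
        [ca * cb * cg - sa * sg, -ca * cb * sg - sa * cg, ca * sb],
        [sa * cb * cg + ca * sg, -sa * cb * sg + ca * cg, sa * sb],
        [-sb * cg, sb * sg, cb],
    ]

def back_calculate_pcs(spin, metal, dchi_ax, dchi_rh, alpha, beta, gamma):
    """Back-calculated PCS (ppm) of one spin from the eight tensor parameters."""
    rot = euler_zyz_matrix(alpha, beta, gamma)
    # Vector from the metal ion to the spin, in the protein frame.
    d = [s - m for s, m in zip(spin, metal)]
    # Assumed convention: columns of rot are the tensor axes, so the
    # tensor-frame coordinates are rot^T applied to d.
    x, y, z = (sum(rot[j][i] * d[j] for j in range(3)) for i in range(3))
    r2 = x * x + y * y + z * z
    # Cartesian form of the angular terms of the PCS equation, times r^2.
    geom = dchi_ax * (2 * z * z - x * x - y * y) + 1.5 * dchi_rh * (x * x - y * y)
    # 1e-32 m^3 / Å^3 = 1e-2; the factor 1e6 converts to ppm.
    return 1e4 * geom / (12 * math.pi * r2 ** 2.5)

# Hypothetical example: a spin 10 Å from the metal along the tensor z axis.
print(back_calculate_pcs((0.0, 0.0, 10.0), (0.0, 0.0, 0.0), 42.3, 5.3, 0.0, 0.0, 0.0))
```

Fitting the tensor then amounts to adjusting the eight arguments other than the spin position so that the back-calculated values match the measured PCS, which is why the spin coordinates have to be known beforehand.<br />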
As PCSs can be measured for nuclear spins up to 40 Å away from the metal, they present<br />
long-range structure restraints exquisitely suited to characterize the global structural arrangement of a<br />
protein. PCSs have thus been used very successfully to refine protein structures (Bertini et al.,<br />
2001, Gaponenko et al., 2004, Arnesano et al., 2005), dock protein molecules of known 3D<br />
structures (Ubbink et al., 1998, Pintacuda et al., 2006) and determine the structure of small<br />
molecules bound to a protein of known 3D structure (John et al., 2006, Pintacuda et al., 2007,<br />
Zhuang et al., 2008). The need for atom coordinates to determine the Δχ-tensor parameters,<br />
however, makes it more difficult to use PCSs in de novo determinations of protein 3D structures.<br />
All presently available protein structure determination software that uses PCS data to supplement<br />
conventional NMR restraints requires estimates of Δχax and Δχrh as input parameters (Banci et al.,<br />
1998, Banci et al., 2004). These are often difficult to estimate accurately, as they depend on the<br />
chemical environment of the metal ion and the mobility of the paramagnetic center with respect to<br />
the protein.<br />
The ROSETTA structure prediction methodology (Simons et al., 1997) is well suited for<br />
taking advantage of the rich source of information inherent in PCSs. ROSETTA de novo structure<br />
prediction has two stages: first, a low resolution phase in which conformational space is searched<br />
broadly using a coarse grained energy function, and second, a high resolution phase in which<br />
models generated in the first phase are refined in a physically realistic all atom force field. The<br />
bottleneck in structure prediction using ROSETTA is conformational sampling; close to native<br />
structures almost always have lower energies than non native structures. For small proteins ( < 100<br />
residues), ROSETTA has produced models with atomic level accuracy in blind prediction<br />
challenges (Raman et al., 2009). For larger proteins, however, structures close enough to the native<br />
structure to fall into the deep native energy minimum are generated seldom or not at all. This<br />
sampling problem can be overcome if even very limited experimental data is available to guide the<br />
initial low resolution search. For example, CS-ROSETTA uses <strong>NMR</strong> chemical shifts to guide
fragment selection and constrain backbone torsion angles, greatly improving the final yield of<br />
correctly folded protein models (Shen et al., 2008). As ROSETTA in favorable cases is capable of<br />
generating protein structures very close to experimentally determined structures from sequence<br />
information alone (Bradley et al., 2005), it is of great interest to combine ROSETTA with readily<br />
accessible experimental data to determine protein structures.<br />
In this paper we describe the incorporation of PCS data into ROSETTA. We show that this<br />
new PCS-ROSETTA method can generate accurate structures for proteins of up to 150 amino acids<br />
in length even from quite limited data sets.<br />
4.3 Results<br />
4.3.1 Test set<br />
We tested the new PCS-ROSETTA method (see Methods) on a benchmark of nine proteins<br />
for which chemical shifts and PCSs have been published. ArgN repressor was determined twice<br />
with PCS data measured from paramagnetic metal ions at two different sites. The proteins were<br />
between 56 and 186 amino acid residues in size, had different folds and had between 82 and 1169<br />
PCSs measured from one to eleven different metal ions located at a single metal binding site [Table 4.1<br />
and supporting information (SI) Table S4.1]. Fragments for each protein were selected with<br />
CS-ROSETTA using available chemical shift data and were used for all calculations. Structures of<br />
proteins with significant sequence similarity to the target proteins were explicitly excluded from the<br />
CS-ROSETTA database.<br />
Table 4.1 Protein structures used to evaluate the performance of PCS-ROSETTA<br />
Targets | PDB ID | Nres a | NM b | Npcs c | PCS-ROSETTA run d rmsd f | PCS-ROSETTA run d conv g | PCS-ROSETTA run d Q h | CS-ROSETTA run e rmsd f | CS-ROSETTA run e conv g | Refcs i | Refpcs j<br />
protein G (A) | 3GB1 | 56 | 3 | 158 | 0.61 | 0.92 | 0.06 | 0.80 | 0.88 | (Wilton et al., 2008) | (Saio et al., 2009)<br />
calbindin (B) | 1KQV | 75 | 11 | 1169 | 1.46 | 2.04 | 0.16 | 4.96 | 4.37 | (Balayssac et al., 2006) | (Bertini et al., 2001)<br />
θ subunit (C) | 2AE9 | 76 | 2 | 91 | 1.65 | 4.35 | 0.07 | 8.90 | 8.75 | (Mueller et al., 2005) | (Schmitz et al., 2008)<br />
ArgN k (D) | 1AOY | 78 | 3 | 222 | 0.98 | 2.38 | 0.08 | 6.93 | 5.32 | (Su et al., 2008) | (Su et al., 2008)<br />
ArgN l (E) | 1AOY | 78 | 2 | 82 | 1.03 | 2.25 | 0.09 | 8.01 | 6.64 | (Su et al., 2008) | (Su et al., 2009a)<br />
N-calmodulin (F) | 1SW8 | 79 | 2 | 125 | 2.34 | 1.85 | 0.09 | 4.69 | 3.68 | (Bertini et al., 2004) | (Bertini et al., 2004)<br />
thioredoxin (G) | 1XOA | 108 | 1 | 90 | 2.58 | 2.64 | 0.23 | 4.98 | 6.06 | (Lemaster et al., 1988, Chandrasekhar et al., 1991) | (Jensen et al., 2006)<br />
parvalbumin (H) | 1RJV | 110 | 1 | 106 | 11.26 | 10.42 | 0.20 | 11.80 | 11.20 | (Baig et al., 2004) | (Baig et al., 2004)<br />
calmodulin (I) | 2K61 | 146 | 4 | 408 | 2.80 | 2.12 | 0.14 | 6.35 | 5.55 | (Bertini et al., 2009) | (Bertini et al., 2009)<br />
ε186 m (J) | 1J54 | 186 | 3 | 738 | 20.57 | 17.54 | 0.36 | 15.46 | 17.23 | (DeRose et al., 2002) | (Schmitz et al., 2006)<br />
a Number of residues.<br />
b Number of metal ions for which PCS data were measured.<br />
c Total number of PCSs measured.<br />
d The structures used to calculate the rmsds were identified using the combined PCS score and<br />
ROSETTA full atom energy on the whole protein sequence.<br />
e The structures used to calculate the rmsds were identified by the ROSETTA full atom energy on<br />
the whole protein sequence.<br />
f Cα rmsd (with respect to the native structure) of the structure of lowest score, in Å. All Cα rmsd<br />
values were calculated using the core residues defined in SI Table S4.2.<br />
g Average Cα rmsd calculated between the lowest score structure and the next four lowest scoring<br />
structures, in Å. The rmsd values were calculated on the whole protein sequence.<br />
h Quality factor $Q = \mathrm{rms}(\mathrm{PCS}_i^{\mathrm{calc}} - \mathrm{PCS}_i^{\mathrm{exp}}) / \mathrm{rms}(\mathrm{PCS}_i^{\mathrm{exp}})$ calculated on the structure of lowest<br />
PCS-ROSETTA score.<br />
i Reference for the experimental chemical shifts.<br />
j Reference for the experimental PCSs.<br />
k PCSs measured with a covalent tag attached to the N-terminal domain of the E. coli arginine<br />
repressor (ArgN).<br />
l PCSs measured with a non-covalent tag bound to ArgN.<br />
m N-terminal 186 residues of the ε subunit of E. coli DNA polymerase III.<br />
4.3.2 Capacity of the PCS Score to Identify Native-like Structures<br />
The PCS score describes a model’s agreement with observed PCS data by calculating the<br />
expected PCS data given the structure. To calculate the score, a three-dimensional grid search is used to<br />
determine the metal coordinates and Δχ-tensor components necessary for producing an optimal<br />
match between calculated and observed data (see Materials and Methods). The capacity of the PCS<br />
score to identify native-like models was tested on sets of 3000 CS-ROSETTA structures for each of<br />
the nine test proteins. These test structures were produced using a reduced fragment set and<br />
included native fragments to ensure that some of the models were similar to the target structure.<br />
The Cα rmsd of the decoy with the lowest PCS score was always small (below 2.3 Å) with respect<br />
to the target protein (Figure 4.1). In addition, for all target proteins for which PCSs were available<br />
from two or more paramagnetic metal ions, low Cα rmsd values correlated with low PCS scores.<br />
This indicates that the PCS score can be used not only to identify near-native structures, but also to<br />
bias conformational sampling towards the native structure during the fragment assembly.<br />
Comparisons between the ROSETTA low resolution energy function and PCS score are shown in<br />
SI Figure S4.1.<br />
Figure 4.1 Fold identification by pseudocontact shifts. 3000 decoys were generated using<br />
CS-ROSETTA. In order to ensure the presence of decoys with low rmsd values to the target structure,<br />
the starting set of peptide fragments was reduced and included fragments from the known target<br />
structures. PCS scores are plotted versus the Cα rmsd to the target structure. The targets are<br />
labeled A-J as in Table 4.1. The PCS score correlates strongly with the Cα rmsd.<br />
PCSs from eleven different lanthanides were available for calbindin. In order to explore the<br />
value of using PCSs from multiple lanthanides, we rescored the structures using PCSs from both<br />
individual and multiple lanthanides. Linear regressions of PCS score versus rmsd had slopes<br />
ranging from 0.03 to 5.17 (average 2.26) for single data sets. Pairwise combination of PCS sets<br />
resulted in increased regression slopes ranging from 0.15 to 7.42 (average 3.84). Using all PCS sets<br />
resulted in a slope greater than 11, showing that PCSs from multiple metal ions greatly facilitate<br />
identification of native-like protein folds.<br />
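The slope comparison described above amounts to an ordinary least-squares fit of PCS score against rmsd. The following minimal sketch illustrates the calculation only: the function name `regression_slope` and the decoy values are hypothetical and are not taken from the calbindin data.<br />

```python
def regression_slope(rmsd, score):
    """Least-squares slope of `score` regressed on `rmsd` (score units per Å)."""
    n = len(rmsd)
    if n != len(score) or n < 2:
        raise ValueError("need two equally long sequences with at least two points")
    mean_x = sum(rmsd) / n
    mean_y = sum(score) / n
    sxx = sum((x - mean_x) ** 2 for x in rmsd)
    if sxx == 0.0:
        raise ValueError("rmsd values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rmsd, score))
    return sxy / sxx


# Hypothetical decoys: a steeper slope means that the score separates
# near-native from non-native decoys more sharply.
rmsd = [1.0, 2.0, 4.0, 8.0]
score_single_metal = [3.0, 5.0, 9.0, 17.0]    # constructed as 2 * rmsd + 1
score_all_metals = [12.0, 23.0, 45.0, 89.0]   # constructed as 11 * rmsd + 1
slope_single = regression_slope(rmsd, score_single_metal)
slope_all = regression_slope(rmsd, score_all_metals)
```

A larger slope for the combined data set than for any single data set is the pattern reported above for calbindin.<br />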
4.3.3 Comparison of PCS-ROSETTA with CS-ROSETTA
10000 decoys each were generated with CS-ROSETTA and PCS-ROSETTA. Both<br />
computations used the same fragment set, taking into account secondary structure information from<br />
chemical shift measurements. Figure 4.2 illustrates the ability of the PCS score to bias sampling<br />
towards the native structure. For seven out of the ten structure calculations, the PCSs dramatically<br />
increased the frequency with which decoys with low Cα rmsd to the reference structure were found.<br />
The effect was particularly pronounced for protein targets with larger PCS data sets. For example,<br />
more than a third of the decoys found for calmodulin had a Cα rmsd of less than 4 Å to the target<br />
structure, whereas fewer than 3% met this criterion in the absence of PCS data. Similar results were<br />
obtained for the θ subunit, protein G, and both ArgN repressor calculations. The PCS data did not<br />
significantly improve the results for thioredoxin and parvalbumin, for which only PCS data from a<br />
single paramagnetic metal ion were available. No native-like structures were found for ε186, which<br />
may be attributed to its larger size (186 residues). To evaluate the influence of the PCS score during<br />
the fragment assembly, we performed an additional calculation with the PCS score as the only<br />
energy term (SI Text S4.1).<br />
The low resolution models were subjected to full atom relaxation refinement in the last step<br />
of the calculation, using the full atom ROSETTA force field (without inclusion of the PCS score).<br />
The additional minimization step did not significantly change the overall shape of the distributions,<br />
but tended to improve the Cα rmsd of native-like decoys (SI Figure S4.2) and, most importantly,<br />
allowed recognition of the best models based on their energies.<br />
Rescoring full atom relaxed structures with a weighted combination of the ROSETTA and<br />
PCS scores further improved the recognition of near-native structures as measured by the Cα rmsd<br />
of the lowest energy structure [Table 4.1-f, PCS-ROSETTA run; Figure 4.3], with PCS-ROSETTA<br />
identifying low Cα rmsd (< 3 Å) structures in eight out of ten cases. With the exception of target C,<br />
the five lowest energy structures converge to less than 3 Å for all successful targets,<br />
while the two failed targets do not improve beyond 10 Å [Table 4.1-g]. Convergence is a signal<br />
that the protocol has found a topology that reliably satisfies the combined score, which in the case<br />
of PCS-ROSETTA clearly identifies the failed models as unreliable, allowing for their rejection<br />
(Shen et al., 2008). In the case of target C, large disordered termini prevent a clear identification of<br />
convergence, but convergence becomes apparent when only the core residues are considered (SI Table<br />
S4.2-g). Results with CS-ROSETTA and PCS-ROSETTA are compared in SI Figure S4.3.<br />
Figure 4.2 Improved conformational sampling by PCS-ROSETTA. 10000 independent low<br />
resolution trajectories were carried out with (black) or without (red) PCS information. The plots<br />
show the density of Cα rmsd values to the target structure after the fragment assembly step. The<br />
targets are labeled as in Table 4.1. Corresponding plots of structures calculated with full atom<br />
relaxation for positioning the amino acid side chains are shown in SI Figure S4.2. The library used<br />
for fragment selection explicitly excluded any protein with sequence similarity to the target protein.<br />
The figure shows that PCS scores efficiently guide fragment assembly towards the correct target<br />
structure.<br />
Figure 4.3 Energy landscapes generated by PCS-ROSETTA. Combined ROSETTA energy and PCS<br />
score (using the weighting factor w(c)) are plotted versus the Cα rmsd to the target structure for<br />
structures calculated using PCS-ROSETTA. The lowest energy structures are indicated in red. The<br />
targets are labeled as in Table 4.1. The results show that PCS-ROSETTA is likely to generate and<br />
identify the correct fold.
Agreement of the structures with the experimental data can also be directly assessed by the<br />
quality factor $Q = \mathrm{rms}(\mathrm{PCS}_i^{\mathrm{calc}} - \mathrm{PCS}_i^{\mathrm{exp}}) / \mathrm{rms}(\mathrm{PCS}_i^{\mathrm{exp}})$, where $\mathrm{PCS}_i^{\mathrm{calc}}$ and $\mathrm{PCS}_i^{\mathrm{exp}}$ are the calculated and experimental PCS<br />
values for the nuclear spin i (see footnote 7). A quality factor above 25% indicates failure to find a correct structure<br />
and a quality factor below 20% indicates that the computed structure is in good agreement with the<br />
experimental PCSs (Table 4.1), as in other definitions of quality factors (Cornilescu et al., 1998).<br />
The low quality factor of the θ subunit (7%) establishes the success of the calculation despite the<br />
lack of clear convergence.<br />
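As an illustration, the quality factor can be computed directly from paired lists of calculated and experimental PCSs. This is a minimal sketch: the function names `rms` and `quality_factor` and the PCS values are hypothetical.<br />

```python
import math


def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def quality_factor(pcs_calc, pcs_exp):
    """Q = rms(calc - exp) / rms(exp); 0.0 means perfect agreement."""
    if len(pcs_calc) != len(pcs_exp) or not pcs_exp:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [c - e for c, e in zip(pcs_calc, pcs_exp)]
    return rms(residuals) / rms(pcs_exp)


# Hypothetical PCS values in ppm for four nuclear spins.
pcs_exp = [1.0, -1.0, 2.0, -2.0]
pcs_calc = [1.1, -0.9, 2.1, -1.9]
q = quality_factor(pcs_calc, pcs_exp)  # about 0.063, i.e. roughly 6%
```

With the thresholds quoted above, this hypothetical value would fall in the range taken to indicate good agreement with the experimental PCSs.<br />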
4.3.4 Successes and Limits of PCS-ROSETTA Calculations<br />
The results of PCS-ROSETTA calculations are summarized in Table 4.1. The structures of<br />
small proteins (< 80 residues, targets A to F) are easily solved by PCS-ROSETTA: the structures of lowest<br />
PCS-ROSETTA energy are consistently below 2.4 Å in Cα rmsd relative to the native structure and have<br />
quality factors below 16%. For these proteins, the generation of 10000 models was ample (Figure<br />
4.2 A to F). The same number of decoys calculated with CS-ROSETTA did not lead to satisfactory<br />
convergence for targets B to F (Table 4.1-g), though targets C and D partially recover if<br />
flexible termini are removed at the full atom rescoring step (SI Text S4.2). The tag used to<br />
paramagnetically label ArgN (D) produced Δχ-tensor axes of significantly different orientation with<br />
different lanthanides (Su et al., 2008), which may explain why the PCS-ROSETTA calculations<br />
performed particularly well with these data.<br />
PCS-ROSETTA succeeded in calculating the structure of a protein with 146 residues and<br />
PCSs from multiple lanthanides (target I). More than 62% of calculated structures had a Cα rmsd<br />
below 5 Å, while only 6.2% met that criterion for the CS-ROSETTA calculation (Figure 4.2 I). This<br />
indicates that the PCS score can effectively guide the sampling towards the correct fold also<br />
for larger proteins. While calculations on target J (186 residues) did not converge despite a large<br />
PCS data set, this can be attributed to a sampling problem associated with large proteins of<br />
complex topology (Bradley et al., 2005), which may be overcome with a modified protocol.<br />
Importantly, the success of a calculation can be ascertained by calculating the quality factor Q.<br />
Combined with the convergence criterion (Shen et al., 2008), the quality factor is an effective way<br />
to assess the success of a calculation (SI Figure S4.4). For each of the eight targets for which the<br />
PCS-ROSETTA calculations converged, the structure with the lowest energy is shown<br />
superimposed with the native structure in Figure 4.4.<br />
7 Rms stands for root mean square, not to be confused with rmsd (root mean square deviation).<br />
Figure 4.4 Superimpositions of ribbon representations of the backbones of the lowest energy<br />
structures calculated with PCS-ROSETTA (blue) onto the corresponding target structures (red).<br />
The protein targets are (A) protein G, (B) calbindin, (C) the θ subunit of E. coli DNA polymerase<br />
III, (D) the N-terminal domain of the E. coli arginine repressor (ArgN; with covalent lanthanide<br />
tag), (E) ArgN with non-covalent lanthanide tag, (F) the N-terminal domain of calmodulin, (G)<br />
thioredoxin, (H) parvalbumin, (I) calmodulin and (J) the globular domain of the ε subunit of E. coli<br />
DNA polymerase III. Flexible termini were omitted as described in SI Table S4.1. Only the target<br />
structure is shown for parvalbumin (H) and the ε subunit (J), as the calculations could not<br />
reproduce the correct fold for these proteins.<br />
4.4 Discussion<br />
The structural information content of the PCS effect has long been recognized, but initial<br />
attempts to determine the 3D structures of biomolecules by the use of PCSs were hampered by the<br />
difficulty of determining the Δχ-tensor and the structure simultaneously (Barry et al., 1971). Subsequently,<br />
the first 3D structure determinations of proteins relied on nuclear Overhauser effect data (Wüthrich,<br />
1986). Full structure determination of proteins from PCS data alone continues to be regarded as<br />
difficult (Bertini et al., 2002a). Owing to its modeling capabilities, PCS-ROSETTA makes it<br />
possible, for the first time, to determine 3D structures using PCSs as the only restraints while<br />
simultaneously determining all Δχ-tensor parameters and integrating PCSs from different metal<br />
ions. In addition, a PCS quality factor can be calculated that is highly indicative of the correctness<br />
of the final structure. The effect of the PCSs on improving convergence of the calculations towards<br />
the correct target structures is particularly remarkable if one considers that PCS data were mostly<br />
available only for backbone amides.<br />
The success of PCS-ROSETTA is based on the fact that, in contrast to scoring functions<br />
using chemical shift data, the PCS score is much more sensitive to global than local structure.<br />
Therefore, PCS data can guide the search in the low resolution fragment assembly step, greatly<br />
increasing the yield of near-native structures compared to CS-ROSETTA. PCSs thus present an<br />
ideal complement to chemical shift information that is most important in the preceding fragment<br />
selection step. The improved convergence alleviates the need to compute large numbers of decoys.<br />
It would be possible to accelerate the computations further by using the PCS score to select decoys<br />
with low rmsd values to the target structure prior to the computationally expensive refinement of<br />
amino acid side chain conformations.<br />
Many protein-specific factors, including fold complexity, number and quality of PCS data,<br />
and metal site, play roles in the success of PCS-ROSETTA fragment assembly, and their relative<br />
importance is difficult to disentangle. In general, PCS data from two or more lanthanides are<br />
expected to assist identification of decoys with low rmsd to the target structure. While the structure<br />
of calmodulin, a protein with 146 residues, was successfully determined by PCS-ROSETTA, the<br />
structure of ε186 (186 residues) was not found by the program despite the availability of many<br />
PCSs overall (Table 4.1). The scarcity of PCS values for residues near the lanthanide binding site<br />
may have contributed to this failure. As the PCS-ROSETTA protocol did not sample structures<br />
below 10 Å rmsd (Figure 4.3 J) and the PCS scores of native-like ε186 structures only show a<br />
funnel-like energy landscape below 10 Å rmsd (Figure 4.1 J), this could also be a case where<br />
structures explored by the basic ROSETTA sampling protocol do not form enough native features<br />
for the PCS score to discriminate between them. An alternative sampling protocol, such as broken<br />
chain sampling (Bradley et al., 2006) or iterative refinement (Qian et al., 2007), may be the key to<br />
accurately modeling the structure of ε186 using PCS data.<br />
The present calculations were performed with proteins containing single metal binding sites.<br />
Clearly, data from multiple metal ions using different metal binding sites will greatly enhance the<br />
information content of PCS data. In particular, lanthanide ions display very different paramagnetic<br />
properties while their chemical similarity allows all lanthanides to bind at a given lanthanide
binding site. Several metal binding tags have recently been developed to tag proteins<br />
site-specifically with a paramagnetic lanthanide; for a recent review, see Su et al. (2009b). We note that<br />
PCSs were as useful for targets devoid of natural metal binding sites (targets A, C, D and E) as for<br />
metalloproteins (Figure 4.2).<br />
We propose a new approach to protein structure determination in which PCS data are<br />
collected from natural or engineered metal binding sites, and then used to guide ROSETTA<br />
conformational search along with backbone chemical shift data. The accuracy and reliability of the<br />
lowest energy models are assessed based on the convergence of the calculation and the PCS quality<br />
factor. With multiple independent lanthanide datasets and improved conformational search<br />
methods, the approach should be extendable to proteins larger than 150 amino acids.<br />
4.5 Materials and Methods<br />
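4.5.1 PCS-ROSETTA Score<br />
The PCS (in ppm) induced by a metal ion M on a nuclear spin i can be calculated as (Bertini<br />
et al., 2002b)<br />
$$\mathrm{PCS}_i^{\mathrm{calc}} = \frac{1}{4\pi r_i^5}\left[(x_i^2 - z_i^2)\,\Delta\chi_{xx} + (y_i^2 - z_i^2)\,\Delta\chi_{yy} + 2x_i y_i\,\Delta\chi_{xy} + 2x_i z_i\,\Delta\chi_{xz} + 2y_i z_i\,\Delta\chi_{yz}\right] \qquad (4.2)$$<br />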
where $r_i$ is the distance between the spin i and the paramagnetic centre M, $x_i$, $y_i$, $z_i$ are the<br />
Cartesian coordinates of the vector between the metal ion and the spin i in an arbitrary frame f and<br />
$\Delta\chi_{xx}$, $\Delta\chi_{yy}$, $\Delta\chi_{zz}$, $\Delta\chi_{xy}$, $\Delta\chi_{xz}$, $\Delta\chi_{yz}$ are the Δχ-tensor components in the frame f (as $\Delta\chi_{zz} = -\Delta\chi_{xx} - \Delta\chi_{yy}$,<br />
there are only five independent parameters). The Δχ-tensor components and the metal coordinates<br />
are initially unknown and must be redetermined each time the PCS score c is evaluated. c is<br />
calculated over all metal ions $M_j$ as<br />
(4.3)<br />
where $\mathrm{PCS}_i^{\mathrm{calc}}(M_j)$ and $\mathrm{PCS}_i^{\mathrm{exp}}(M_j)$ are the calculated and experimental PCS values of spin<br />
i induced by the metal ion $M_j$, respectively. The determination of the Δχ-tensor components and the<br />
metal coordinates presents a non-linear least-squares fitting problem. In order to avoid local minima<br />
and speed up the calculation, we split the problem into its linear and non-linear parts. Equation (4.2)<br />
shows that $\mathrm{PCS}_i^{\mathrm{calc}}$ is linear with respect to the five Δχ-tensor components. At each node of a<br />
three-dimensional grid over the Cartesian coordinates $x_M$, $y_M$, $z_M$ of the paramagnetic centre,<br />
singular value decomposition therefore optimizes the five Δχ-tensor parameters efficiently and without<br />
ambiguity for the lowest residual score c. The grid node with the lowest c score<br />
is then used as the starting point for optimization of the three metal coordinates along with the five<br />
Δχ-tensor components to reach the minimal cost c.<br />
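The split into a linear tensor fit and a grid search over metal positions can be sketched as follows. This is an illustration under stated assumptions, not the PCS-ROSETTA implementation: all function names are hypothetical, coordinates are taken to be in Å and tensor components in ppm·Å³, the PCS is evaluated with the standard traceless-tensor expression with prefactor 1/(4πr⁵), and the cost for a single metal ion is assumed to be the square root of the sum of squared residuals. To stay within the Python standard library, the linear step solves the normal equations by Gaussian elimination in place of the singular value decomposition named in the text, and the final continuous refinement of the metal position is omitted.<br />

```python
import math


def design_row(spin, metal):
    """Coefficients multiplying (dxx, dyy, dxy, dxz, dyz) for one spin."""
    x, y, z = (s - m for s, m in zip(spin, metal))
    r2 = x * x + y * y + z * z
    k = 1.0 / (4.0 * math.pi * r2 ** 2.5)
    return [k * (x * x - z * z), k * (y * y - z * z),
            k * 2 * x * y, k * 2 * x * z, k * 2 * y * z]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_tensor(spins, pcs_exp, metal):
    """Linear least-squares tensor for a fixed metal position: (tensor, cost)."""
    rows = [design_row(s, metal) for s in spins]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] * p for r, p in zip(rows, pcs_exp)) for i in range(5)]
    tensor = solve(ata, atb)
    calc = [sum(c * t for c, t in zip(r, tensor)) for r in rows]
    cost = math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, pcs_exp)))
    return tensor, cost


def grid_search(spins, pcs_exp, nodes):
    """Return (cost, node, tensor) for the node with the lowest residual cost."""
    best = None
    for node in nodes:
        tensor, cost = fit_tensor(spins, pcs_exp, node)
        if best is None or cost < best[0]:
            best = (cost, node, tensor)
    return best


# Synthetic check: PCSs generated from a known metal position and tensor.
spins = [(8.0, 1.0, 2.0), (-3.0, 7.0, 4.0), (2.0, -6.0, 9.0), (-5.0, -4.0, -7.0),
         (10.0, 3.0, -2.0), (1.0, 9.0, -5.0), (-7.0, 2.0, 6.0), (4.0, -8.0, -3.0),
         (6.0, 6.0, 6.0), (-2.0, -9.0, 1.0)]
true_metal = (0.0, 0.0, 0.0)
true_tensor = [2.0e4, -1.0e4, 5.0e3, 3.0e3, -4.0e3]
pcs_exp = [sum(c * t for c, t in zip(design_row(s, true_metal), true_tensor))
           for s in spins]
nodes = [(dx, dy, dz) for dx in (-2.0, 0.0, 2.0)
         for dy in (-2.0, 0.0, 2.0) for dz in (-2.0, 0.0, 2.0)]
best_cost, best_node, best_tensor = grid_search(spins, pcs_exp, nodes)
```

Because the tensor fit at each node is a closed-form linear solve, only the three metal coordinates need to be searched, which is what keeps the grid search tractable.<br />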
The PCS score was added to the ROSETTA low resolution energy function using a different<br />
weighting factor w(c) for each structure calculation. w(c) was determined by first generating 1000<br />
decoys with ROSETTA and calculating w(c) as<br />
$$w(c) = \frac{a_{\mathrm{high}} - a_{\mathrm{low}}}{c_{\mathrm{high}} - c_{\mathrm{low}}} \qquad (4.4)$$<br />
where $a_{\mathrm{high}}$ and $a_{\mathrm{low}}$ are the averages of the highest and lowest 10% of the values of the<br />
ROSETTA ab initio score, and $c_{\mathrm{high}}$ and $c_{\mathrm{low}}$ are the averages of the highest and lowest 10% of the<br />
values of the PCS score c upon rescoring each of the 1000 decoys with the PCS. The weights used<br />
for the ten structure calculations performed in the present work are given in SI Table S4.1.<br />
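A weight of this kind can be sketched in a few lines. The sketch assumes that w(c) is the ratio of the ROSETTA score range to the PCS score range, each range being the difference between the averages of the highest and lowest 10% of values; the function names and the decoy scores are hypothetical.<br />

```python
def mean_of_extremes(values, fraction=0.10):
    """Return (mean of lowest fraction, mean of highest fraction) of the values."""
    ordered = sorted(values)
    k = max(1, int(round(len(ordered) * fraction)))
    low = sum(ordered[:k]) / k
    high = sum(ordered[-k:]) / k
    return low, high


def pcs_weight(rosetta_scores, pcs_scores):
    """Assumed form: w(c) = (a_high - a_low) / (c_high - c_low)."""
    a_low, a_high = mean_of_extremes(rosetta_scores)
    c_low, c_high = mean_of_extremes(pcs_scores)
    if c_high == c_low:
        raise ValueError("PCS scores have no spread; weight is undefined")
    return (a_high - a_low) / (c_high - c_low)


# Hypothetical scores for 100 decoys; the PCS scores span twice the range
# of the ROSETTA scores, so the weight scales them down by a factor of two.
rosetta_scores = [float(v) for v in range(100)]
pcs_scores = [2.0 * v for v in rosetta_scores]
w = pcs_weight(rosetta_scores, pcs_scores)
```

Scaling the PCS score in this way puts its spread on the same footing as that of the ROSETTA score, so that neither term dominates the combined function by default.<br />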
4.5.2 PCS-ROSETTA Algorithm<br />
PCS-ROSETTA uses the ROSETTA de novo structure prediction methodology to build low<br />
resolution models, followed by all atom refinement using the ROSETTA high resolution Monte<br />
Carlo minimization protocol. The additions to the standard ROSETTA structure prediction methods<br />
are: the use of chemical shifts to guide fragment selection as in CS-ROSETTA, the use of PCS data<br />
to guide the initial low resolution search and the use of PCS data for final model selection. A flow<br />
diagram of the computational protocol of PCS-ROSETTA is shown in SI Figure S4.5.<br />
4.5.3 Input for PCS-ROSETTA<br />
The chemical shifts of all protein targets were taken from the literature or from the<br />
BioMagResBank. CS-ROSETTA was used for fragment selection. CS-ROSETTA reports the<br />
difference between experimental and expected chemical shifts. Chemical shifts with very large<br />
deviations from expectations (often attributable to errors in the deposited data) were removed from<br />
the input. CS-ROSETTA also suggests corrections in the chemical shift referencing. We only
corrected 13C chemical shifts, except for thioredoxin, where 15N chemical shifts were corrected (SI<br />
Table S4.1). CS-ROSETTA aims to generate 200 9-residue fragments and 200 3-residue fragments<br />
centered on each residue of the polypeptide chain for use in the ab initio fragment assembly<br />
protocol of ROSETTA. In cases where CS-ROSETTA failed to generate 200 fragments, we<br />
generated additional fragments using the conventional ROSETTA protocol in order to make 200<br />
fragments available. For each of the target proteins, we removed any protein with recognizable<br />
sequence similarity (BLAST E-value below 0.05) from the CS-ROSETTA protein database. (For<br />
example, the structure of ε186 is present in the CS-ROSETTA database, but was explicitly<br />
excluded when fragments were generated.) In order to accelerate the grid search for the metal<br />
position, PCS-ROSETTA allows a precise description of the space to be searched, including the<br />
center of the grid search (cg), the step size between two nodes (sg), an outer cutoff radius (co) to<br />
limit the search to a maximal distance from cg, and an inner cutoff radius (ci) to avoid a search too<br />
close to cg. A moderately large step size (sg) was chosen to speed up computations during low<br />
resolution sampling (SI Table S4.1), and reduced to 25% of its value during the final high resolution<br />
scoring step to ensure maximum accuracy. For each target, the grid parameters cg, co and ci were<br />
chosen in accordance with prior knowledge about the approximate metal binding site. For example,<br />
for a covalent tag attached to the protein, we used the known geometric information of the tag to set<br />
cg, co, and ci, whereas for proteins with a natural metal binding site, a highly conserved negatively<br />
charged residue was picked as a reference point for cg. In the absence of prior biochemical<br />
information, the nuclear spin with the largest absolute PCS value was chosen as the center of the<br />
grid. SI Table S4.1 summarizes the grid parameters used for the different protein targets. In order to<br />
assess the impact of the initial grid parameters on the structures calculated, a set of PCS-ROSETTA<br />
calculations was performed for each target, where cg was centered at the nuclear spin of the largest<br />
PCS observed and the cutoff radius co was set to 15 Å. No change in the quality of the results was<br />
observed but in most cases the calculations took longer.<br />
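The search space defined by these four parameters can be enumerated as in the sketch below. It is illustrative only: the function name `grid_nodes` and the numerical values are hypothetical, and co and ci are interpreted as the largest and smallest allowed distances from cg.<br />

```python
import math


def grid_nodes(center, step, outer_cutoff, inner_cutoff=0.0):
    """Cubic-lattice nodes whose distance d from `center` satisfies
    inner_cutoff <= d <= outer_cutoff (all lengths in the same unit)."""
    if step <= 0.0:
        raise ValueError("step must be positive")
    n = int(math.floor(outer_cutoff / step))
    nodes = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                dx, dy, dz = i * step, j * step, k * step
                d = math.sqrt(dx * dx + dy * dy + dz * dz)
                if inner_cutoff <= d <= outer_cutoff:
                    nodes.append((center[0] + dx, center[1] + dy, center[2] + dz))
    return nodes


# Hypothetical parameters in Å: a coarse grid for low resolution sampling and
# a grid with the step reduced to 25% for the final scoring step.
cg = (0.0, 0.0, 0.0)
coarse = grid_nodes(cg, step=2.0, outer_cutoff=4.0, inner_cutoff=1.0)
fine = grid_nodes(cg, step=0.5, outer_cutoff=4.0, inner_cutoff=1.0)
```

The number of nodes grows with the cube of co/sg, which is why a tighter outer cutoff or a larger step shortens the calculation.<br />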
4.5.4 PCS-ROSETTA Protocol for Protein Structure Determination<br />
Chemical shifts of the proteins were prepared in TALOS format (Cornilescu et al., 1999) and<br />
used by CS-ROSETTA for fragment selection. Chemical shift corrections, fragment selection, and<br />
determination of the weights w(c) were performed as described above. 10000 protein structures<br />
were computed with PCS-ROSETTA and subjected to the full atom relaxation protocol of<br />
ROSETTA to model the side chain conformations. The final structures were rescored using the<br />
ROSETTA full atom energy function combined with the PCS scores c, using the weighting factors<br />
w(c) (Equation (4.4)) with ahigh and alow calculated against the ROSETTA full atom energy, and
with a total weight multiplied by 2 to give a larger contribution to the PCS score than in the<br />
fragment assembly. The best scoring structures can be assessed by the PCS quality factor<br />
$Q = \mathrm{rms}(\mathrm{PCS}^{\mathrm{calc}} - \mathrm{PCS}^{\mathrm{exp}}) / \mathrm{rms}(\mathrm{PCS}^{\mathrm{exp}})$. Computation of 10000 PCS-ROSETTA structures took on<br />
average 137 CPU days per target and was run on a local cluster. SI Figure S4.6 shows a posteriori<br />
that 1000 structures per target would have been enough for convergence of the protocol.<br />
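The final rescoring step can be illustrated as follows. The sketch assumes that the combined score is the ROSETTA full atom energy plus twice the weighted PCS score, which is one reading of the doubled total weight described above; the function names, decoy labels and scores are hypothetical.<br />

```python
def combined_score(rosetta_energy, pcs_score, weight, pcs_multiplier=2.0):
    """Assumed form: full atom energy + pcs_multiplier * w(c) * c."""
    return rosetta_energy + pcs_multiplier * weight * pcs_score


def rank_decoys(decoys, weight, pcs_multiplier=2.0):
    """Sort (name, rosetta_energy, pcs_score) tuples, lowest combined score first."""
    return sorted(
        decoys,
        key=lambda d: combined_score(d[1], d[2], weight, pcs_multiplier),
    )


# Hypothetical decoys: decoy_a has the lowest ROSETTA energy but a poor PCS fit.
decoys = [
    ("decoy_a", -150.0, 12.0),
    ("decoy_b", -148.0, 2.0),
    ("decoy_c", -140.0, 1.0),
]
ranked = rank_decoys(decoys, weight=0.5)
best_name = ranked[0][0]
```

In this example the decoy with the lowest ROSETTA energy is ranked last once the PCS term is included, which is the kind of reordering the rescoring step is meant to achieve.<br />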
4.5.5 Computation of Structures to Evaluate the Effects of PCS Scoring<br />
3000 decoys with a wide range of rmsd values to the target structure were generated by<br />
including native fragments and limiting the number of alternative fragments in the fragment<br />
generation step of the ROSETTA calculations. 1000 decoys each were calculated using two, five<br />
and ten fragments per residue, respectively. The presence of the native fragments in a small pool of<br />
fragments ensured the generation of structures very similar to the target structure.<br />
4.6 Acknowledgments<br />
C.S. thanks the University of Queensland for a Graduate School Research Travel Grant to<br />
undertake this collaborative research project. T.H. thanks the Australian Research Council for a<br />
Future Fellowship. Financial support from the Australian Research Council for project grants to<br />
G.O. and T.H. is gratefully acknowledged. D.B. thanks the Howard Hughes Medical Institute.<br />
4.7 References<br />
Arnesano F, Banci L and Piccioli M (2005) NMR structures of paramagnetic metalloproteins. Q.<br />
Rev. Biophys. 38:167-219<br />
Baig I, Bertini I, Del Bianco C, Gupta YK, Lee YM, Luchinat C and Quattrone A (2004)<br />
Paramagnetism-based refinement strategy for the solution structure of human α-<br />
parvalbumin. Biochemistry 43:5562-5573<br />
Balayssac S, Jiménez B and Piccioli M (2006) Assignment strategy for fast relaxing signals:<br />
complete aminoacid identification in thulium substituted Calbindin D9K. J. Biomol. NMR<br />
34:63-73<br />
Banci L, Bertini I, Cavallaro G, Giachetti A, Luchinat C and Parigi G (2004) Paramagnetism-based<br />
restraints for Xplor-NIH. J. Biomol. NMR 28:249-261<br />
Banci L, Bertini I, Cremonini MA, Savellini GG, Luchinat C, Wüthrich K and Güntert P (1998)<br />
PSEUDYANA for NMR structure calculation of paramagnetic metalloproteins using<br />
torsion angle molecular dynamics. J. Biomol. NMR 12:553-557<br />
Barry CD, North ACT, Glasel JA, Williams RJP and Xavier AV (1971) Quantitative determination<br />
of mononucleotide conformations in solution using lanthanide ion shift and broadening<br />
NMR probes. Nature 232:236-245<br />
Bertini I, Del Bianco C, Gelis I, Katsaros N, Luchinat C, Parigi G, Peana M, Provenzani A and<br />
Zoroddu MA (2004) Experimentally exploring the conformational space sampled by<br />
domain reorientation in calmodulin. Proc. Natl. Acad. Sci. USA 101:6841-6846<br />
Bertini I, Donaire A, Jiménez B, Luchinat C, Parigi G, Piccioli M and Poggi L (2001)<br />
Paramagnetism-based versus classical constraints: An analysis of the solution structure of<br />
Ca Ln calbindin D9k. J. Biomol. NMR 21:85-98<br />
Bertini I, Kursula P, Luchinat C, Parigi G, Vahokoski J, Wilmanns M and Yuan J (2009) Accurate<br />
solution structures of proteins from X-ray data and a minimal set of NMR data:<br />
Calmodulin-peptide complexes as examples. J. Am. Chem. Soc. 131:5134-5144<br />
Bertini I, Longinetti M, Luchinat C, Parigi G and Sgheri L (2002a) Efficiency of paramagnetism-<br />
based constraints to determine the spatial arrangement of α-helical secondary structure<br />
elements. J. Biomol. NMR 22:123-136<br />
Bertini I, Luchinat C and Parigi G (2002b) Magnetic susceptibility in paramagnetic NMR. Prog.<br />
NMR Spectr. 40:249-273<br />
Bradley P and Baker D (2006) Improved beta-protein structure prediction by multilevel<br />
optimization of NonLocal strand pairings and local backbone conformation. Proteins<br />
65:922-929<br />
Bradley P, Misura KMS and Baker D (2005) Toward high-resolution de novo structure prediction<br />
for small proteins. Science 309:1868-1871<br />
Chandrasekhar K, Krause G, Holmgren A and Dyson HJ (1991) Assignment of the 15N NMR<br />
spectra of reduced and oxidized Escherichia coli thioredoxin. FEBS Lett. 284:178-183<br />
Cornilescu G, Delaglio F and Bax A (1999) Protein backbone angle restraints from searching a<br />
database for chemical shift and sequence homology. J. Biomol. NMR 13:289-302<br />
Cornilescu G, Marquardt JL, Ottiger M and Bax A (1998) Validation of protein structure from<br />
anisotropic carbonyl chemical shifts in a dilute liquid crystalline phase. J. Am. Chem. Soc.<br />
120:6836-6837<br />
DeRose EF, Li DW, Darden T, Harvey S, Perrino FW, Schaaper RM and London RE (2002) Model<br />
for the catalytic domain of the proofreading epsilon subunit of Escherichia coli DNA<br />
polymerase III based on NMR structural data. Biochemistry 41:94-110<br />
Gaponenko V, Sarma SP, Altieri AS, Horita DA, Li J and Byrd RA (2004) Improving the accuracy<br />
of NMR structures of large proteins using pseudocontact shifts as long-range restraints. J.<br />
Biomol. NMR 28:205-212<br />
Jensen MR and Led JJ (2006) Metal-protein interactions: Structure information from Ni2+-induced<br />
pseudocontact shifts in a native nonmetalloprotein. Biochemistry 45:8782-8787<br />
John M, Pintacuda G, Park AY, Dixon NE and Otting G (2006) Structure determination of protein-<br />
ligand complexes by transferred paramagnetic shifts. J. Am. Chem. Soc. 128:12910-12916<br />
Lemaster DM and Richards FM (1988) NMR sequential assignment of Escherichia coli<br />
thioredoxin utilizing random fractional deuteriation. Biochemistry 27:142-150<br />
Mueller GA, Kirby TW, DeRose EF, Li D, Schaaper RM and London RE (2005) Nuclear magnetic<br />
resonance solution structure of the Escherichia coli DNA polymerase III θ subunit. J.<br />
Bacteriol. 187:7081-7089<br />
Pintacuda G, John M, Su XC and Otting G (2007) NMR structure determination of protein-ligand<br />
complexes by lanthanide labeling. Acc. Chem. Res. 40:206-212<br />
Pintacuda G, Park AY, Keniry MA, Dixon NE and Otting G (2006) Lanthanide labeling offers fast<br />
NMR approach to 3D structure determinations of protein-protein complexes. J. Am. Chem.<br />
Soc. 128:3696-3702<br />
Qian B, Raman S, Das R, Bradley P, McCoy AJ, Read RJ and Baker D (2007) High-resolution<br />
structure prediction and the crystallographic phase problem. Nature 450:259-264<br />
Raman S, Vernon R, Thompson J, Tyka M, Sadreyev R, Pei J, Kim D, Kellogg E, DiMaio F, Lange<br />
O, Kinch L, Sheffler W, Kim BH, Das R, Grishin NV and Baker D (2009) Structure<br />
prediction for CASP8 with all-atom refinement using Rosetta. Proteins online:<br />
Saio T, Ogura K, Yokochi M, Kobashigawa Y and Inagaki F (2009) Two-point anchoring of a<br />
lanthanide-binding peptide to a target protein enhances the paramagnetic anisotropic effect.<br />
J. Biomol. NMR 44:157-166<br />
Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient χ-<br />
tensor determination and NH assignment of paramagnetic proteins. J. Biomol. NMR<br />
35:79-87<br />
Schmitz C, Stanton-Cook MJ, Su XC, Otting G and Huber T (2008) Numbat: an interactive<br />
software tool for fitting Δχ-tensors to molecular coordinates using pseudocontact shifts. J.<br />
Biomol. NMR 41:179-189<br />
118 Chapter 4. Protein Structure Determination from Pseudocontact Shifts using ROSETTA.<br />
Shen Y, Lange O, Delaglio F, Rossi P, Aramini JM, Liu GH, Eletsky A, Wu Y, Singarapu KK,<br />
Lemak A, Ignatchenko A, Arrowsmith CH, Szyperski T, Montelione GT, Baker D and Bax<br />
A (2008) Consistent blind protein structure generation from <strong>NMR</strong> chemical shift data. Proc.<br />
Natl. Acad. Sci. USA. 105:4685-4690<br />
Simons KT, Kooperberg C, Huang E and Baker D (1997) Assembly of protein tertiary structures<br />
from fragments with similar local sequences using simulated annealing and bayesian<br />
scoring functions. J. Mol. Biol. 268:209-225<br />
Su XC, Liang H, Loscha KV and Otting G (2009a) [Ln(DPA)3]3- is a convenient paramagnetic shift<br />
reagent for protein <strong>NMR</strong> studies. J. Am. Chem. Soc. 131:10352-10353<br />
Su XC, Man B, Beeren S, Liang H, Simonsen S, Schmitz C, Huber T, Messerle BA and Otting G<br />
(2008) A dipicolinic acid tag for rigid lanthanide tagging of proteins and paramagnetic<br />
<strong>NMR</strong> spectroscopy. J. Am. Chem. Soc. 130:10486-10487<br />
Su XC and Otting G (2009b) Paramagnetic labelling of proteins and oligonucleotides. J. Biomol.<br />
<strong>NMR</strong> in press:<br />
Ubbink M, Ejdebäck M, Karlsson BG and Bendall DS (1998) The structure of the complex of<br />
plastocyanin and cytochrome f, determined by paramagnetic <strong>NMR</strong> and restrained rigid-<br />
body molecular dynamics. Structure 6:323-335<br />
Wilton DJ, Tunnicliffe RB, Kamatari YO, Akasaka K and Williamson MP (2008) Pressure-induced<br />
changes in the solution structure of the GB1 domain of protein G. Proteins 71:1432-1440<br />
Wüthrich K (1986). <strong>NMR</strong> of proteins and nucleic acids. Wiley, New York.<br />
Zhuang T, Lee HS, Imperiali B and Prestegard JH (2008) Structure determination of a Galectin-3-<br />
carbohydrate complex using paramagnetism-based <strong>NMR</strong> constraints. Protein Sci. 17:1220-<br />
1231<br />
4.8 Supporting information
Figure S4.1 Fold identification by pseudocontact shift score and ROSETTA energy. 3000 decoys<br />
were generated using CS-ROSETTA. In order to ensure that some decoys with small rmsd to the<br />
target structure were obtained, the starting set of peptide fragments was reduced and included the<br />
fragments from the known target structures. A to J: ROSETTA energies plotted versus the Cα rmsd<br />
to the target structure. A’ to J’: PCS scores plotted versus the Cα rmsd to the target structure. The<br />
targets are labeled A-J as in Table 4.1.
Figure S4.2 Improved fragment assembly by PCS-ROSETTA. Fragments were assembled in 10000<br />
different runs of CS-ROSETTA (red), 10000 different runs of PCS-ROSETTA (black), and 10000
different runs using exclusively the PCS score of PCS-ROSETTA (blue). The plots show the<br />
frequency with which structures of different Cα rmsd values to the target structure were found. The<br />
red and black solid lines reproduce the data of Figure 4.2. The dashed lines show the<br />
corresponding data obtained in independent calculations that included the full atom refinement<br />
step. The same colors were used for calculations with and without the full atom refinement step.<br />
The full atom refinement step does not significantly change the Cα rmsd of the structures produced<br />
in the fragment assembly step with respect to the target structure. The targets are labeled A-J as in<br />
Table 4.1.<br />
Figure S4.3 Energy landscape generated by CS-ROSETTA and PCS-ROSETTA, with full atom<br />
ROSETTA energies and Cα rmsd values being calculated using only the core residues as defined in<br />
Table S4.1. A to J: full atom ROSETTA energies plotted versus the Cα rmsd to the target structure<br />
for structures calculated using CS-ROSETTA. A’ to J’: Combined ROSETTA energy and PCS score<br />
plotted versus the Cα rmsd to the target structure for structures calculated using PCS-ROSETTA.<br />
The lowest energy structures are indicated in red. The targets are labeled A-J as in Table 4.1.
Figure S4.4 Identification of successful calculations with PCS-ROSETTA. The quality factor Q<br />
reports on the agreement between the experimental and calculated PCS. A value below 20%<br />
usually indicates that the calculated structure satisfies the PCS restraints. Above 25%, the quality of<br />
the structure is poor. The y axis shows the average Cα rmsd between the lowest-scoring<br />
structure and the next four lowest-scoring structures. Rmsd values below 3 Å indicate<br />
convergence of the protocol. The convergence criterion and the quality factor can be combined to<br />
ascertain the success of the calculations for targets A, B, C, D, E, F, G and I, and to reject targets H and<br />
J. The targets are labeled A-J as in Table 4.1. The values are those of Table 4.1-h for the x axis<br />
and Table 4.1-g for the y axis.
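The two criteria of Figure S4.4 can be combined in a few lines. The sketch below assumes the commonly used definition of the quality factor, Q = sqrt(Σ(PCScalc − PCSexp)² / Σ PCSexp²), which may differ in detail from the definition used for Table 4.1. The function names and the example values are hypothetical and serve only as illustration.

```python
import math

def q_factor(pcs_exp, pcs_calc):
    """Quality factor (assumed definition): rms deviation between calculated
    and experimental PCS, normalised by the rms of the experimental PCS."""
    num = sum((c - e) ** 2 for e, c in zip(pcs_exp, pcs_calc))
    den = sum(e ** 2 for e in pcs_exp)
    return math.sqrt(num / den)

def is_successful(q, convergence_rmsd, q_max=0.20, rmsd_max=3.0):
    """Accept a calculation only if the Q factor is below 20% and the average
    C-alpha rmsd to the next four lowest-scoring structures is below 3 A."""
    return q < q_max and convergence_rmsd < rmsd_max

# Hypothetical PCS values in ppm, for illustration only.
exp = [1.0, -2.0, 0.5, 0.25]
calc = [1.1, -1.9, 0.4, 0.25]
q = q_factor(exp, calc)
accepted = is_successful(q, convergence_rmsd=2.1)
```

Requiring both criteria at once is the conservative choice: a low Q factor alone does not exclude a poorly converged ensemble, and a converged ensemble can still disagree with the data.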
Figure S4.5 Flow diagram of PCS-ROSETTA. (a) Fragments are selected by their chemical shifts<br />
using CS-ROSETTA. (b) The PCS weight is calculated using equation (4.4) on 1000 decoys<br />
generated with CS-ROSETTA. (c) Structures are produced by the classical fragment assembly of<br />
ROSETTA with addition of the PCS-score. (d) Side chains are added to the structures and<br />
subjected to a full atom minimization. (e) Resulting structures are rescored using a combination of<br />
the ROSETTA full atom energy score and the PCS score. (f) Best structures are selected by their<br />
lowest score.
Figure S4.6 Expected Cα rmsd of the lowest energy structure calculated with PCS-ROSETTA. A<br />
given number n of structures (x axis) was randomly chosen 5000 times from the total of 10000<br />
generated structures, and the Cα rmsd of the lowest energy structure, averaged over the 5000 trials, is<br />
graphed. The curves show a posteriori that 1000 structures per target would<br />
have been ample to ensure convergence of the PCS-ROSETTA calculations. The targets are labeled<br />
A-J as in Table 4.1. The curves for the targets parvalbumin (H) and ε (J) are not shown.
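The resampling procedure described in the caption of Figure S4.6 can be sketched as follows. This is a minimal illustration, not the script used for the figure: the function name, the fixed random seed and the toy decoy set are assumptions made here.

```python
import random

def expected_best_rmsd(scores, rmsds, n, trials=5000, seed=0):
    """Average C-alpha rmsd of the lowest-score structure when only n of the
    generated structures are available, estimated over random subsets."""
    rng = random.Random(seed)
    indices = range(len(scores))
    total = 0.0
    for _ in range(trials):
        subset = rng.sample(indices, n)               # draw n structures without replacement
        best = min(subset, key=lambda i: scores[i])   # lowest score within the subset
        total += rmsds[best]
    return total / trials

# Toy decoy set (hypothetical): lower score goes with lower rmsd.
scores = [float(i) for i in range(100)]
rmsds = [0.1 * i for i in range(100)]
curve = {n: expected_best_rmsd(scores, rmsds, n, trials=200) for n in (1, 10, 100)}
```

Plotting the returned value against n gives a curve of the kind shown in the figure. The value of n at which the curve flattens indicates how many structures would have been sufficient.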
Table S4.1 PCS data information and grid search parameters used.<br />
Protein name | Residues a | Metal ions used | Atom types | cs corr b | w(c) | cg c | sg d | co d | ci d<br />
protein G (A) | 1-56 | Tb3+ Tm3+ Er3+ | HN | 0.53 | 15.5 | E19 CA | 6 | 17 | 7<br />
calbindin (B) | 2-75 | Ce3+ Dy3+ Er3+ Eu3+ Ho3+ Nd3+ Pr3+ Sm3+ Tb3+ Tm3+ Yb3+ | HN, N, C’ | 2.72 | 48.9 | D54 CA | 6 | 8 | 4<br />
θ subunit (C) | 10-64 | Dy3+ Er3+ | HN | -0.16 | 7.1 | D14 CA | 6 | 25 | 15<br />
ArgN (D) | 8-70 | Tb3+ Tm3+ Yb3+ | HN, N | 2.09 | 13.5 | C68 CB | 6 | 10 | 4<br />
ArgN (E) | 8-70 | Tb3+ Tm3+ | HN | 2.09 | 48.9 | K12 CB | 6 | 15 | 0<br />
N-calmodulin (F) | 3-79 | Tb3+ Tm3+ | HN, CA, CB | 0.00 | 4.7 | D60 CA | 6 | 8 | 4<br />
thioredoxin (G) | 2-108 | Ni2+ | HN | 1.23 | 106.3 | S1 N | 3.8 | 4 | 0<br />
parvalbumin (H) | 2-109 | Dy3+ | HN, N | 2.65 | 2.86 | D93 CA | 6 | 8 | 4<br />
calmodulin (I) | 3-146 | Tb3+ Tm3+ Yb3+ Dy3+ | HN | 0.59 | 5.1 | D60 CA | 6 | 8 | 4<br />
ε186 (J) | 7-180 | Tb3+ Dy3+ Er3+ | HN, N, C’ | 0.53 | 8.2 | D12 CA | 6 | 8 | 4<br />
a Ordered residues<br />
b Uniform offset used for 13C chemical shifts (in ppm) compared to published values. In the case of<br />
thioredoxin, the offset was applied to 15N chemical shifts<br />
c Residue and atom name defining the center of the grid search to position the paramagnetic center.<br />
d In Ångstrom
Table S4.2 Protein structures used to evaluate the performance of PCS-ROSETTA.<br />
Target | PCS-ROSETTA run a: rmsd c | PCS-ROSETTA run a: convergence d | CS-ROSETTA run b: rmsd c | CS-ROSETTA run b: convergence d<br />
protein G (A) | 0.61 | 0.92 | 0.80 | 0.88<br />
calbindin (B) | 1.46 | 2.09 | 4.96 | 4.72<br />
θ subunit (C) | 1.30 | 0.55 | 1.56 | 2.25<br />
ArgN e (D) | 1.00 | 0.77 | 1.31 | 2.21<br />
ArgN f (E) | 0.83 | 0.94 | 1.65 | 5.43<br />
N-calmodulin (F) | 1.74 | 1.49 | 4.69 | 4.49<br />
thioredoxin (G) | 2.58 | 2.44 | 4.61 | 5.55<br />
parvalbumin (H) | 11.26 | 10.25 | 11.80 | 11.30<br />
calmodulin (I) | 2.80 | 2.12 | 6.35 | 2.94<br />
ε186 g (J) | 20.57 | 18.03 | 17.07 | 17.74<br />
a The structures used to calculate the rmsds were identified using the combined PCS-score and<br />
ROSETTA full atom energy across only the core residues defined in SI Table S4.1.<br />
b The structures used to calculate the rmsds were identified by the ROSETTA full-atom energy<br />
across only the core residues defined in SI Table S4.1.<br />
c Cα rmsd (with respect to the native structure) of the structure of lowest score, in Å. All Cα rmsd<br />
values were calculated using the core residues defined in SI Table S4.1.<br />
d Average Cα rmsd calculated between the lowest scoring structure and the next four lowest scoring<br />
structures, in Å. The rmsd values were calculated using the core residues defined in SI Table S4.1.<br />
e PCSs measured with a covalent tag attached to the N-terminal domain of the E. coli arginine<br />
repressor (ArgN).<br />
f PCSs measured with a non-covalent tag bound to ArgN.<br />
g N-terminal 186 residues of the ε subunit of the E. coli polymerase III.<br />
Text S4.1 Fragment Assembly Using PCSs Only.<br />
In order to gain a better understanding of the merit of PCS data, we generated 10000 decoys<br />
per protein with all ROSETTA force field components turned off except for the PCS score. In<br />
seven of the ten protein structure calculations, the PCS score alone produced decoys with a Cα rmsd<br />
of less than 2.5 Å to the target structure (Figure S4.2, solid blue line). Control calculations without
any scoring function produced not a single useful decoy. This highlights the power of PCS data to<br />
define the overall topology of a protein at the fragment assembly stage. The effect was particularly<br />
pronounced for the target proteins θ and ArgN (Figure S4.2 C and D).<br />
The second set of PCS data of ArgN (Table 4.1; structure E) yielded worse decoys in the<br />
PCS-only computations with PCS-ROSETTA than in the CS-ROSETTA computations. Remarkably, however,<br />
the PCS score in combination with the ROSETTA force field yielded much better structures than<br />
either did when used separately (Figure S4.3 E). This shows that the PCS score adds information that is not<br />
captured by the ROSETTA energy score alone.<br />
Text S4.2 Scoring over Core Residues.<br />
Disordered residues can add noise to the ROSETTA energy, and this noise can prevent<br />
identification of low rmsd structures. Notably, three of the targets that succeeded under the PCS-<br />
ROSETTA protocol and failed under the CS-ROSETTA protocol have disordered termini<br />
accounting for ten or more residues each (Table S4.2-d: Targets C, D & E). In practice it is possible<br />
to experimentally determine the disordered character of a residue, so to compare the effect of<br />
disorder on the two protocols we produced an additional set of structures by removing disordered<br />
residues during the final rescoring step (cores defined in Table S4.1). When the core residues are<br />
perfectly defined, in this case by observation of the solved structures, the CS-ROSETTA protocol<br />
identifies low rmsd structures in four of the ten cases (including C, D & E), and shows convergence<br />
to a low rmsd structure in three of the ten cases [Table S4.2-c, d, CS-ROSETTA run]. In contrast,<br />
removing the disordered residues has little effect on PCS-ROSETTA’s rmsd values, suggesting that<br />
the combined PCS and ROSETTA score is less sensitive to disorder. The remaining targets had<br />
little disorder and removal of disordered terminal residues had little effect on the results.
Chapter 5<br />
Conclusion and perspectives<br />
5.1 The use of PCS for structure determination<br />
Structure determination of proteins remains a major challenge of the post-genomic era.<br />
Conventional techniques such as X-ray crystallography and <strong>NMR</strong> spectroscopy are slow. Hence,<br />
the gap between the number of known protein sequences and the number of known protein structures<br />
remains large. Alternative experimental methods are required to speed up the process. Such methods<br />
have to offer an attractive compromise between the effort required to measure the data and the value<br />
that the data bring to the de novo structure determination of proteins.<br />
In Chapter 4, it was demonstrated that PCS data are a promising candidate. Combining<br />
a molecular fragment approach with the PCS score led to the correct<br />
fold for proteins smaller than 146 residues. A benchmark of ten data sets was compiled for<br />
this work, and the PCS-ROSETTA approach succeeded in eight of the ten cases. The first<br />
case in which PCS did not lead to the correct fold concerned a protein with only a single<br />
PCS data set. The importance of measuring multiple data sets with different lanthanides was already<br />
shown in Chapter 2 (to increase the quality of the resonance assignment) and Chapter 3 (to increase<br />
the quality of the fitted Δχ-tensor). It is therefore not surprising that PCS-ROSETTA had difficulties<br />
with a single data set. The second case in which the fold was incorrect was the largest<br />
protein of the benchmark, the ε subunit (186 residues). The size of the protein might be a limiting<br />
factor for the current approach. It is well known that for proteins larger than 150 residues, the size<br />
of the conformational space explodes in the molecular fragment replacement protocol of<br />
ROSETTA. It is legitimate to ask whether PCS-ROSETTA faces the same problem. Some<br />
elements of an answer are presented in the following sections.<br />
5.1.1 Folding of proteins using only pseudocontact shifts<br />
In order to gain a better understanding of the merit of the PCS, structure calculations with<br />
PCS-ROSETTA were performed with all energy terms switched off except the PCS score. The<br />
fragment assembly within ROSETTA is hence guided purely by the PCS. Although turning off the<br />
normal force field of ROSETTA is of no practical interest, the results described<br />
below are instructive.<br />
The protocol generated and identified (by the lowest score) attractive folds for four of<br />
the ten structure calculations (Figure 5.1). The term “attractive fold” has to be qualified in this<br />
context. Obviously, without energy terms such as van der Waals terms or hydrogen bonding, the<br />
resulting structures can exhibit steric clashes (Figure 5.1 a, c, d) or poor β-sheet pairing (Figure 5.1
a and c). However, the C α rmsd against the native structure was found to be reasonably low (2.25 Å<br />
for protein G, 1.89 Å for θ, 2.39 Å for ArgN, 5.03 Å for calmodulin). This provides proof of<br />
principle that the PCS alone can direct the correct folding of a protein at the fragment assembly<br />
level. The conditions of success remain unclear and should be further analyzed. It could be a<br />
combination of the complexity or size of the protein, the number and quality of data sets, the<br />
relative orientation of each Δχ-tensor, and the location of the paramagnetic center.<br />
Figure 5.1 Capacity of the PCS score, as the only energy term, to fold the<br />
protein. The lowest PCS energy structure (blue) is superimposed onto the<br />
native structure (white). (a) protein G, (b) θ subunit, (c) ArgN, (d) calmodulin.<br />
Figure 5.2 The intersection of isosurfaces defines the position and orientation of peptide<br />
fragments in the protein structure. (a) Three PCS of the spin (black) measured with three
different lanthanides can be depicted as three isosurfaces (red, blue and yellow) where the<br />
spin must be located. In order to fulfill all three PCS data simultaneously, the spin must be<br />
located at the intersection of the three isosurfaces, helping to define the orientation and<br />
position of the fragment (purple) in the protein structure (white). (b) The same principle<br />
holds, if PCS from different lanthanide binding sites are available, in which case the<br />
intersection of the three isosurfaces is even better defined.<br />
A direct consequence of these results is the demonstration, in principle, that it is possible to<br />
apply the PCS score in a “divide and conquer” approach in order to overcome the sampling<br />
problem typically encountered with proteins larger than 120 residues. If a large protein were<br />
composed of the four proteins shown in Figure 5.1, a protocol to obtain the fold of the large<br />
protein could be (i) to cut the protein into four pieces (Figure 5.1 a-b-c-d), (ii) to assemble each of<br />
the pieces separately, and (iii) to reconstitute the whole protein by superimposition of the four<br />
separately determined (sets of) Δχ-tensors. The proof of principle holds because no energy term (that<br />
would report on interactions between the four parts of the protein) is used. A more efficient way,<br />
however, could be to work with smaller overlapping fragments. The fragments would<br />
need to be large enough to make it possible to optimize the Δχ-tensor (larger than 20 residues), but<br />
small enough to avoid running into the sampling problem (smaller than 80 residues).<br />
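As an illustration of the overlapping-fragment idea, the following sketch splits a chain into overlapping segments. The segment size of 60 residues and the overlap of 20 residues are assumptions chosen to fall within the 20 to 80 residue range given above. The function name is hypothetical and no such routine is part of PCS-ROSETTA.

```python
def overlapping_segments(n_residues, size=60, overlap=20):
    """Return (first, last) residue numbers (1-based, inclusive) of segments of
    at most `size` residues; consecutive segments share `overlap` residues."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than the segment size")
    segments = []
    start = 1
    while True:
        end = min(start + size - 1, n_residues)
        segments.append((start, end))
        if end == n_residues:
            break
        start = end - overlap + 1   # step back so that the segments overlap
    return segments

# Example: a chain of 186 residues, the size of the largest benchmark target.
segments = overlapping_segments(186)
```

By construction, every segment after the first contains more than `overlap` residues, so with an overlap of 20 no segment falls below the size assumed necessary for fitting a Δχ-tensor.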
Clearly, the PCS score would benefit from additional energy terms, starting with<br />
the van der Waals term, which would strongly penalize conformations with steric clashes and thus<br />
favor the sampling of the correct fold. In particular, the symmetric shape of the isosurfaces<br />
implies the existence of symmetrical folds that could be discriminated with the help of the van der<br />
Waals term, as theoretically demonstrated by Bertini et al. (2002).<br />
5.1.2 Uses of multiple lanthanide binding sites<br />
Formerly limited to metalloproteins, paramagnetic <strong>NMR</strong> has found much wider<br />
application since the arrival of lanthanide binding tags. These tags can be attached to the protein<br />
via a disulfide bond or at the termini of the protein. A non-covalent tag can also be used, although<br />
the disadvantage is the loss of control over whether (and where) the tag binds. Attaching metal tags<br />
to a protein of interest site-specifically is one of the current challenges of paramagnetic <strong>NMR</strong>.<br />
Several groups are developing new tags; for a recent review, see Su and Otting (2009). While those tags<br />
are engineered to simplify the process of tag attachment as much as possible, a consequence of this
quest will hopefully be the possibility to attach them at different positions on the surface of the<br />
target protein.<br />
All work outlined in this thesis has greatly benefited from the availability of multiple data<br />
sets measured with different lanthanides. The magnitude of the paramagnetic dipole moment differs<br />
between paramagnetic metal ions, and the orientation of the Δχ-tensor varies too. These<br />
differences provide additional information that can be used to improve the<br />
quality of the fitted Δχ-tensor (Chapter 3) or the quality of the automated assignment (Chapter 2).<br />
PCS-ROSETTA calculations can take advantage of this fact. In particular, the calculations on<br />
calbindin improved greatly when all available lanthanide data were used, compared to test<br />
calculations using PCS from only one or two lanthanides. It can be expected that<br />
the use of tags attached at different locations on the protein will enhance the PCS-ROSETTA<br />
calculations further. In particular, the location and the orientation of fragments with respect to the<br />
rest of the protein structure would be defined more accurately by isosurfaces intersecting at steeper<br />
angles (Figure 5.2). With the current implementation of PCS-ROSETTA, it would be straightforward<br />
to design such a protocol. The benefits of using two different lanthanide binding sites could<br />
already be explored using the arginine repressor as a test case, for which PCS data measured for two<br />
different lanthanide binding sites are available.<br />
5.1.3 Development of a new PCS-ROSETTA protocol<br />
The way structures are calculated by PCS-ROSETTA is similar to the standard protocol of<br />
ROSETTA. The only difference is the calculation of a PCS score during the fragment assembly<br />
stage (which requires fitting of a Δχ-tensor and metal position). The weight of the PCS score<br />
relative to the standard centroid score of ROSETTA is chosen so that both have an equal<br />
influence. While ROSETTA benefits from additional experimental restraints such as PCSs, it is<br />
important to bear in mind that the original goal of ROSETTA is to generate a wide variety of<br />
protein-like structures in a first step (fragment assembly) and to identify the native one in a second<br />
step (full atom score). Considering that the PCS have proven to drive the sampling towards the<br />
native structure with great efficiency, it can be questioned whether it is necessary to enforce<br />
diversity in the generated structures. Several thousand structures (using a large amount of CPU<br />
time) are usually generated by ROSETTA to cover a wide range of possible structures. It may be<br />
more profitable to generate, at equal CPU time, a smaller number of structures and to spend more<br />
time on the fragment assembly of each.<br />
Clearly, a larger benchmark containing proteins of different topology and size is necessary<br />
to gain a better understanding of the merit of the pseudocontact shift. We are in the process of<br />
creating an artificial one, since PCS can easily be predicted. The parameters that may affect the<br />
success of the calculations, and that we have to explore, are the level of noise in the data that<br />
PCS-ROSETTA can tolerate, the influence of the position of the paramagnetic center relative to the<br />
protein, and the influence of the relative orientations of different Δχ-tensors.<br />
5.2 The use of PCS for chemical shift assignment<br />
The work of Chapter 2 provides proof of principle that PCS can be used for automatic<br />
chemical shift assignment when a 3D structure is available. Chemical shifts are sensitive to the<br />
local environment. Small variations in the surrounding electronic configuration can have a large<br />
impact on the chemical shift values. This makes it extremely difficult to predict chemical shifts<br />
accurately. In contrast, the PCS of a spin i is much less affected by similar variations, as the PCS<br />
only depends on the spherical coordinates of i with respect to the Δχ-tensor frame.<br />
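The dependence of the PCS on the spherical coordinates can be made concrete with the usual expression PCS = [Δχax (3cos²θ − 1) + 1.5 Δχrh sin²θ cos 2φ] / (12π r³). The sketch below evaluates this expression in the principal axis frame of the tensor. The function name and the unit conventions (coordinates in Å, tensor components in 10⁻³² m³, PCS in ppm) are assumptions made for this example.

```python
import math

def pcs_ppm(x, y, z, dchi_ax, dchi_rh):
    """PCS in ppm of a spin at (x, y, z), in Angstrom, given in the principal
    axis frame of the tensor with the metal ion at the origin.
    dchi_ax and dchi_rh are in units of 1e-32 m^3."""
    r = math.sqrt(x * x + y * y + z * z)
    cos_theta = z / r                      # polar angle from the z axis
    sin2_theta = 1.0 - cos_theta ** 2
    phi = math.atan2(y, x)                 # azimuth in the xy plane
    angular = (dchi_ax * (3.0 * cos_theta ** 2 - 1.0)
               + 1.5 * dchi_rh * sin2_theta * math.cos(2.0 * phi))
    # 1e-32 m^3 / (1e-10 m)^3 = 1e-2; multiplied by 1e6 for ppm this gives 1e4
    return 1e4 * angular / (12.0 * math.pi * r ** 3)

# Hypothetical tensor: 30e-32 m^3 axial and 10e-32 m^3 rhombic component.
on_z_axis = pcs_ppm(0.0, 0.0, 10.0, 30.0, 10.0)
on_x_axis = pcs_ppm(10.0, 0.0, 0.0, 30.0, 10.0)
```

Only the position of the spin relative to the tensor frame enters the calculation, which is why the PCS can be predicted far more reliably than the chemical shift itself.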
While the program Possum presented in Chapter 2 is limited to methyl groups, the approach<br />
used to automatically assign the chemical shifts could be applied to any atom type for which PCS<br />
can be measured. The simulated annealing protocol used to solve the multi-dimensional assignment<br />
problem has proven to be highly efficient at sampling the space of possible assignments and at finding<br />
a solution of lower energy than the manual assignment. The scoring scheme used to optimize the<br />
assignment is also efficient; only a few misassignments were present when multiple lanthanide data<br />
sets were available. Even more interesting may be to obtain the Δχ-tensor parameters directly from<br />
unassigned chemical shifts. The program Echidna (Schmitz et al., 2006) is capable of<br />
simultaneously determining the Δχ-tensor parameters and assigning the paramagnetic <strong>NMR</strong> spectrum,<br />
provided that the diamagnetic spectrum is already assigned. This raises the question whether both<br />
the diamagnetic and the paramagnetic spectrum can be assigned, while determining the Δχ-tensor at<br />
the same time. A software package for this purpose is currently under development. The idea is to<br />
handle various kinds of input information (partial assignment for some residues in the diamagnetic<br />
or paramagnetic state, measurement of some unassigned pseudocontact shifts, selective isotope<br />
labeling of some amino acids) and use this partial information in a simulated annealing protocol to<br />
optimize the assignment while determining the Δχ-tensor parameters. The challenge of this<br />
approach is to use the right combination of methods: simulated annealing for the assignment, grid
search for the coordinates of the paramagnetic center, and singular value decomposition for the<br />
determination of the remaining Δχ-tensor parameters.<br />
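Two of the three ingredients named above, the grid search for the metal position and the linear fit of the remaining Δχ-tensor parameters, can be sketched as follows. At a fixed metal position the PCS is linear in the five independent components of the traceless tensor, so each grid point requires only a linear least-squares fit. This is a simplified illustration under stated assumptions: the normal equations are solved by Gaussian elimination as a stand-in for the singular value decomposition, the simulated annealing of the assignment is omitted, and all function names and test data are hypothetical.

```python
import math

def design_row(spin, metal):
    """Coefficients multiplying (xx, yy, xy, xz, yz) of a traceless tensor.
    Coordinates in Angstrom, tensor in 1e-32 m^3, PCS in ppm."""
    x, y, z = (s - m for s, m in zip(spin, metal))
    r2 = x * x + y * y + z * z
    k = 1e4 / (4.0 * math.pi * r2 ** 2.5)
    return [k * (x * x - z * z), k * (y * y - z * z),
            2 * k * x * y, 2 * k * x * z, 2 * k * y * z]

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_tensor(spins, pcs, metal):
    """Least-squares fit of the five tensor components at a fixed metal position."""
    rows = [design_row(s, metal) for s in spins]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] * d for r, d in zip(rows, pcs)) for i in range(5)]
    tensor = solve(ata, atb)
    residual = sum((sum(c * t for c, t in zip(r, tensor)) - d) ** 2
                   for r, d in zip(rows, pcs))
    return tensor, residual

def grid_search(spins, pcs, center, half_width, step):
    """Fit the tensor at every grid point around `center`; keep the best fit."""
    n = int(round(half_width / step))
    best = None
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                metal = (center[0] + i * step, center[1] + j * step, center[2] + k * step)
                tensor, residual = fit_tensor(spins, pcs, metal)
                if best is None or residual < best[0]:
                    best = (residual, metal, tensor)
    return best

# Hypothetical test case: synthetic PCS from a known tensor and metal position.
spins = [(10, 2, 3), (-8, 5, 1), (3, -9, 4), (2, 6, -10), (-5, -7, -6),
         (7, 8, 9), (-9, -2, 8), (4, -6, -7), (12, -3, -2), (-3, 11, 5)]
true_metal = (1.0, -1.0, 2.0)
true_tensor = [10.0, -20.0, 5.0, -3.0, 8.0]
pcs = [sum(c * t for c, t in zip(design_row(s, true_metal), true_tensor)) for s in spins]
residual, metal, tensor = grid_search(spins, pcs, (0.0, 0.0, 0.0), half_width=2.0, step=1.0)
```

Separating the nonlinear parameters (the three metal coordinates) from the linear ones keeps each grid point cheap, which matters when the fit has to be repeated inside an assignment loop.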
5.3 The use of PCS for protein docking<br />
At present, structural biology groups are investing much effort in producing models of<br />
proteins by <strong>NMR</strong> or X-ray crystallography. Currently more than 48000 crystal structures and<br />
almost 7000 <strong>NMR</strong> structures of proteins have been deposited in the Protein Data Bank. In contrast,<br />
the number of protein-protein complexes solved by either of those methods remains low (fewer than<br />
2500). For X-ray crystallography, co-crystallizing a complex is much more difficult than<br />
obtaining crystals of the individual proteins. For <strong>NMR</strong> spectroscopy, the larger molecular<br />
weight of complexes presents a problem, making it more difficult to obtain and analyze data.<br />
Additionally, the most useful information, from intermolecular NOEs, often involves amino acid side<br />
chains, the resonances of which are much harder to assign than the backbone resonances of the<br />
protein.<br />
To gain better insight into the molecular basis of life, the challenge is to understand how<br />
individual macromolecules come together to fulfill their tasks in DNA replication, gene expression<br />
and regulation, etc. Construction of models for protein-protein, protein-DNA and protein-ligand<br />
complexes is necessary. One way to tackle the problem is the use of a docking program that<br />
predicts the binding mode of the complex given the structures of the individual components. The<br />
major difficulty of this approach is to comprehensively explore the 6-dimensional space that<br />
describes the relative orientation and position of two rigid bodies in space. This presents a<br />
challenging sampling problem. A shortage of experimental information to support any model<br />
generated is another drawback of the docking approach. Therefore, alternative techniques that<br />
provide more experimental information are important. Measurements of residual dipolar<br />
couplings are an efficient way to obtain orientational information between two macromolecules: the<br />
macromolecules are weakly aligned using an alignment medium. Independent determination of the<br />
alignment tensor with respect to the two macromolecules gives direct access to the relative<br />
orientation of the two rigid bodies. More interestingly, PCS measurements provide, in addition to<br />
orientational information, information on the distance between the two bodies. Determination of the<br />
Δχ-tensor with respect to the two molecules and superimposition of the two Δχ-tensor frames yields<br />
the relative orientation and position of the two macromolecules, as illustrated in Figure 1.8
(Chapter 1). Pseudocontact shifts therefore present particularly powerful experimental data to<br />
construct a model of the complex between two molecules by rigid body docking, and to shortcut<br />
the computationally expensive task of sampling the conformational space. PCS cannot, however,<br />
provide the high-resolution accuracy that crystallography achieves, and more detailed models will<br />
still require structural refinement software that optimizes the intermolecular packing and makes any<br />
other structural adjustments. Nevertheless, the computational effort of protein docking can focus on<br />
the atomic details when starting from a valid rigid body model.<br />
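The superimposition of two Δχ-tensor frames amounts to a simple change of coordinates, sketched below. Each frame is assumed to be given by the metal position and three orthonormal principal axes expressed in the coordinate system of the respective molecule. The function names and the example frames are hypothetical. The second function lists the four right-handed frames that describe the same tensor, because 180° rotations about the principal axes leave the PCS unchanged; additional data (for example a second lanthanide or steric considerations) are needed to choose between the resulting placements.

```python
def dock_point(point_b, metal_b, axes_b, metal_a, axes_a):
    """Map a point from the coordinate system of molecule B into that of
    molecule A by superimposing the two tensor frames."""
    rel = [p - m for p, m in zip(point_b, metal_b)]
    # coordinates of the point in the common tensor frame
    t = [sum(a * r for a, r in zip(axis, rel)) for axis in axes_b]
    # the same tensor-frame coordinates expressed in the system of molecule A
    return [m + sum(t[i] * axes_a[i][k] for i in range(3))
            for k, m in enumerate(metal_a)]

def equivalent_frames(axes):
    """Four right-handed frames that describe the same tensor: the original one
    and those obtained by 180-degree rotations about each principal axis."""
    signs = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
    return [[[s * c for c in axis] for s, axis in zip(sign, axes)] for sign in signs]

# Hypothetical frames: tensor axes along x, y, z in molecule A, and rotated by
# 90 degrees about z in molecule B.
metal_a = (1.0, 2.0, 3.0)
axes_a = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
metal_b = (5.0, 0.0, 0.0)
axes_b = [(0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
docked = dock_point((5.0, 3.0, 0.0), metal_b, axes_b, metal_a, axes_a)
candidates = [dock_point((5.0, 3.0, 0.0), metal_b, frame, metal_a, axes_a)
              for frame in equivalent_frames(axes_b)]
```

Applying `dock_point` to every atom of molecule B yields a rigid body model of the complex for each of the four candidate frames, which can then be passed to refinement.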
Another challenge facing the modeling of a protein-protein complex arises when one of the<br />
proteins undergoes a large conformational change. Protein docking software can tackle the problem<br />
to some extent by adapting the conformations of the side chains and the backbone close to the<br />
complex interface. As this approach greatly increases the conformational space to search, it is<br />
computationally challenging. Paramagnetic <strong>NMR</strong> and PCS could assist the modeling of large<br />
conformational changes by directing the conformational alterations toward the correct<br />
conformation. A protein that undergoes a large conformational change due to motions about a<br />
hinge that separates two rigid domains would present an attractive example. In a situation where<br />
such a hinge motion occurs as a result of association with a binding partner, rigid<br />
body docking with the help of PCS, as presented in this thesis, could be applied particularly<br />
fruitfully to assist the modeling of the complex.<br />
5.4 References<br />
Bertini I, Longinetti M, Luchinat C, Parigi G and Sgheri L (2002) Efficiency of paramagnetism-<br />
based constraints to determine the spatial arrangement of α-helical secondary structure<br />
elements. J. Biomol. <strong>NMR</strong> 22:123-136<br />
Schmitz C, John M, Park AY, Dixon NE, Otting G, Pintacuda G and Huber T (2006) Efficient<br />
χ-tensor determination and NH assignment of paramagnetic proteins. J. Biomol. <strong>NMR</strong><br />
35:79-87<br />
Su XC and Otting G (2009) Paramagnetic labelling of proteins and oligonucleotides. J. Biomol.<br />
<strong>NMR</strong> in press