02.03.2013 Views

198 Topics in Current Chemistry Editorial Board: A. de Meijere KN ...


J.P. Glusker

2.1 Sources of Crystallographic Data

Two major requirements for meaningful comparisons of three-dimensional structures for statistical analysis are:

1. ready access to the three-dimensional coordinates of all the appropriate crystal structure determinations that contain the features chosen for comparison, and

2. effective and easy-to-use analysis techniques.

Fortunately, efficient computer-based crystallographic databases containing three-dimensional coordinates of atoms in crystal structures reported in the scientific literature are now available to the scientific community. It is therefore no longer necessary to type large sets of numbers into a computer, because they can be accessed in the correct format from the databases. In addition, many excellent computer-graphics, geometrical, and statistical programs are available for application to the three-dimensional coordinates obtained from the databases. Together, the databases and these programs make it practical to compare structures. It is prudent, however, for the investigator to build molecular models, as required, of the ball-and-stick or space-filling variety, in order to obtain chemical or biochemical insight from any comparisons that have been made.
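As a rough illustration of the kind of geometrical and statistical calculation that can be applied to coordinates taken from a database, the sketch below computes one bond length per structure and summarises the sample. The coordinates are invented for illustration and are assumed to be Cartesian, in Å; fractional coordinates from a crystallographic entry would first have to be converted using the unit-cell dimensions. The names `bond_length` and `carbonyl_pairs` are hypothetical and do not come from any database software.

```python
import math
import statistics


def bond_length(atom_a, atom_b):
    """Euclidean distance between two atoms given as (x, y, z) tuples in Å."""
    return math.dist(atom_a, atom_b)


# Invented Cartesian coordinates (Å) of a C and an O atom in three
# hypothetical structures; real values would come from a database search.
carbonyl_pairs = [
    ((0.000, 0.000, 0.000), (1.210, 0.000, 0.000)),
    ((0.000, 0.000, 0.000), (0.000, 1.230, 0.000)),
    ((1.000, 1.000, 1.000), (1.000, 1.000, 2.220)),
]

lengths = [bond_length(c, o) for c, o in carbonyl_pairs]

# Sample mean and sample standard deviation of the bond lengths.
mean_length = statistics.mean(lengths)
spread = statistics.stdev(lengths)

print(f"mean = {mean_length:.3f} Å, sample s.d. = {spread:.3f} Å")
```

With only three invented values the statistics carry no chemical meaning; the point is the workflow of extracting coordinates, deriving a geometrical parameter, and summarising it.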

The Cambridge Structural Database (CSD) [4] contains information, including unit-cell dimensions, on approximately 170,000 three-dimensional crystal structure determinations that have been studied by X-ray or neutron diffraction. Each crystal structure is identified by a unique six-letter code, called its REFCODE. Duplicate structures and remeasurements of the same crystal structure are identified by an additional two digits after the REFCODE. The CSD may be searched in several ways. It is possible to find a list of bibliographic references to reported data on all compounds with given chemical characteristics, such as steroids or peptides. More importantly, the computer software provided with the CSD allows one to search for a small group of atoms (either a full molecule or a fragment of a chemical structure) whose bonding is precisely defined by the codes that the user provides as input to the search. When groups of atoms that meet the required specifications have been extracted from the CSD, tables of the required geometrical data may be generated and statistical methods applied to the results. In analyses of hydrogen bonding, the information on crystal structures obtained by neutron diffraction (in which hydrogen atoms are located with more precision than is possible by X-ray diffraction) is important.
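The REFCODE convention described above (six letters, optionally followed by two digits for duplicates and remeasurements) can be sketched as a small parser. This is an illustration of the naming scheme only, not part of the CSD software; the function names and the example codes `ABCDEF`, `ABCDEF01`, and `GHIJKL` are made up.

```python
import re
from collections import defaultdict

# Six letters, optionally followed by two digits, as described in the text.
REFCODE_PATTERN = re.compile(r"([A-Z]{6})(\d{2})?")


def split_refcode(refcode):
    """Return (six_letter_code, two_digit_suffix_or_None) for a REFCODE."""
    match = REFCODE_PATTERN.fullmatch(refcode.strip().upper())
    if match is None:
        raise ValueError(f"not a valid REFCODE: {refcode!r}")
    return match.group(1), match.group(2)


def group_by_family(refcodes):
    """Group full REFCODEs under their shared six-letter code."""
    families = defaultdict(list)
    for refcode in refcodes:
        family, _suffix = split_refcode(refcode)
        families[family].append(refcode.strip().upper())
    return dict(families)


# Made-up codes: one structure with a remeasurement, and one without.
example = ["ABCDEF", "ABCDEF01", "GHIJKL"]
print(group_by_family(example))
```

Grouping by the six-letter code is one simple way to avoid counting remeasurements of the same crystal structure as independent observations in a statistical analysis.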

The Protein Data Bank (PDB) [5] is a computerized archive for the three-dimensional structural data on biological macromolecules (proteins and nucleic acids). Each protein structure reported has an identifying code (IDCODE), a header record containing useful information on the protein such as the name and source of the protein and the resolution of the structure, together with a series of references to published articles on the protein. Data are included on the refinement methods used, such as the programs used, the R value, the number of Bragg reflections, the root-mean-square deviations of the bond lengths and
