198 Topics in Current Chemistry Editorial Board: A. de Meijere KN ...
6 J.P. Glusker<br />
2.1 Sources of Crystallographic Data
Two major requirements for meaningful comparisons of three-dimensional
structures for statistical analysis are:
1. ready access to the three-dimensional coordinates of all the appropriate
crystal structure determinations that contain the features chosen for comparison,
and
2. effective and easy-to-use analysis techniques.
Fortunately, efficient computer-based crystallographic databases containing
three-dimensional coordinates of atoms in crystal structures reported in the
scientific literature are now available to the scientific community. It is therefore
no longer necessary to type large sets of numbers into a computer, because they
can be retrieved from the databases in the correct format. In addition, many
excellent computer-graphics, geometrical, and statistical programs are available
for application to the three-dimensional coordinates obtained from the
databases. Together, these resources make systematic comparison of structures
practical. It is prudent, however, for the investigator to build molecular models,
as required, of the ball-and-stick or space-filling variety, in order to obtain
chemical or biochemical insight from any comparisons that have been made.
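As a minimal sketch of the kind of geometrical and statistical analysis mentioned above, the following Python computes interatomic distances from three-dimensional coordinates and summarizes them. The coordinates and the function names `distance` and `summarize` are hypothetical and are not taken from any database. The sketch also assumes that the coordinates are already Cartesian and in ångströms, not the fractional coordinates in which crystal structures are usually reported.

```python
import math
import statistics

def distance(a, b):
    """Euclidean distance between two (x, y, z) points, in the input units."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def summarize(values):
    """Return (mean, sample standard deviation) of a list of measurements."""
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical atom pairs with made-up Cartesian coordinates (angstroms).
pairs = [
    ((0.0, 0.0, 0.0), (1.2, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (0.0, 1.3, 0.0)),
    ((1.0, 1.0, 1.0), (1.0, 1.0, 2.4)),
]

lengths = [distance(a, b) for a, b in pairs]
mean_length, spread = summarize(lengths)
print(f"mean = {mean_length:.3f}, sample s.d. = {spread:.3f}")
```

In practice the list of measurements would come from the fragments retrieved from a database, and fractional coordinates would first have to be converted to Cartesian coordinates using the unit-cell dimensions.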
The Cambridge Structural Database (CSD) [4] contains information, including
unit-cell dimensions, on approximately 170,000 three-dimensional crystal structure
determinations that have been studied by X-ray or neutron diffraction. Each
crystal structure is identified by a unique six-letter code, called its REFCODE.
Duplicate structures and remeasurements of the same crystal structure are
identified by an additional two digits after the REFCODE. The CSD may be searched
in several ways. It is possible to find a list of bibliographic references to
reported data on all compounds with given chemical characteristics, such as
steroids or peptides. More importantly, the computer software provided
with the CSD allows one to search for a small group of atoms (either a full molecule
or a fragment of a chemical structure) whose bonding is precisely defined
by codes that the user supplies as input to the search. When
groups of atoms that meet the required specifications have been extracted from
the CSD, tables of the required geometrical data may be generated and statistical
methods applied to the results. In analyses of hydrogen bonding, the information
on crystal structures obtained by neutron diffraction (in which hydrogen atoms
are located with more precision than is possible with X-ray diffraction) is
important.
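The REFCODE convention described above (six letters, optionally followed by two digits for duplicates and remeasurements) can be illustrated with a short Python sketch. The function names `parse_refcode` and `group_by_family` and the example identifiers are hypothetical and invented for illustration. They are not part of the CSD software, and the pattern encodes only the rule stated in the text.

```python
import re
from collections import defaultdict

# Six letters, optionally followed by two digits, as described in the text.
REFCODE_PATTERN = re.compile(r"([A-Z]{6})(\d{2})?")

def parse_refcode(code):
    """Split a REFCODE into (six-letter family, two-digit suffix or None)."""
    match = REFCODE_PATTERN.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"not a valid REFCODE: {code!r}")
    return match.group(1), match.group(2)

def group_by_family(codes):
    """Group REFCODEs so that redeterminations sit with the original entry."""
    families = defaultdict(list)
    for code in codes:
        family, _suffix = parse_refcode(code)
        families[family].append(code)
    return dict(families)

# Hypothetical identifiers, invented for illustration only.
codes = ["ABCDEF", "ABCDEF01", "ABCDEF10", "GHIJKL"]
grouped = group_by_family(codes)
print(grouped)
```

Grouping entries this way is one simple means of avoiding counting the same crystal structure more than once in a statistical survey.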
The Protein Data Bank (PDB) [5] is a computerized archive for the three-dimensional
structural data on biological macromolecules – proteins and nucleic
acids. Each protein structure reported has an identifying code (IDCODE), a
header record containing useful information on the protein such as the name
and source of the protein and the resolution of the structure, together with a
series of references to published articles on the protein. Data are included on the
refinement methods used, such as the programs used, the R value, the number
of Bragg reflections, the root-mean-square deviations of the bond lengths and