Bioinformatics Biocomputing - Ercim
SPECIAL THEME: BIOINFORMATICS
Searching for New Drugs in Virtual Molecule Databases
by Matthias Rarey and Thomas Lengauer

The rapid progress in sequencing the human genome opens up the possibility, in the near future, of understanding many diseases better at the molecular level and of obtaining so-called target proteins for pharmaceutical research. Once such a target protein has been identified, the search begins for molecules that specifically influence the protein’s activity and are therefore considered potential drugs against the disease. At GMD, computer-based approaches to the search for new drugs (virtual screening) are being developed, some of which are already in use in industry.

Searching for New Lead Structures

The development process of a new medicine can be divided into three phases. In the first phase, the search for target proteins, the disease must be understood at the molecular-biological level well enough to identify individual proteins and their significance for the symptoms. Proteins are the essential functional units of our organism and perform a wide range of tasks, from the chemical transformation of substances to the transmission of information. Their function is always linked to the specific binding of other molecules. As early as 100 years ago, Emil Fischer recognised the lock-and-key principle: molecules that bind to each other are complementary both spatially and chemically, just as only a specific key fits a given lock (see Figure 1). If a relationship is recognised between the suppression (or reinforcement) of a protein function and the symptoms, the protein is declared a target protein. In the second phase, the actual drug is developed. The aim is to find a molecule that, on the one hand, binds to the target protein and thereby inhibits its function and, on the other, has the further properties demanded of a drug, for example, that it is well tolerated and accumulates in high concentration at the site of action. The first step is the search for a lead structure: a molecule that binds well to the target protein and serves as a first proposal for the drug. Ideally, the lead structure binds very well to the target protein and can be modified such that the resulting molecule is suitable as a drug. In the third phase, the drug is turned into a medicine and tested in several steps to establish whether it is well tolerated and effective. This article discusses the first step of the second phase, ie computer-based methods of searching for new lead structures.

New Approaches to Screening Molecule Databases

Methods of searching for drug molecules can be classified according to two criteria: whether a three-dimensional structural model of the target protein exists, and the size of the data set to be searched. If a structural model of the protein is available, it can be used directly to search for suitable drugs (structure-based virtual screening); ie we search for a key that fits a given lock. If no structural model is available, similarity to molecules known to bind to the target protein is used as a measure of suitability as a drug (similarity-based virtual screening). Here we use a given key to search for other fitting keys without knowing the lock. Finally, the size of the data set determines how much computing time can be spent analysing each individual molecule. Data sets range from a few hundred preselected molecules, through large databases of several million molecules, to virtual combinatorial libraries from which, in theory, up to billions of molecules could be synthesised out of a few hundred building blocks.

The key problem in structure-based virtual screening is predicting the relative orientation of the target protein and a potential drug molecule, the so-called docking problem. To solve this problem we have developed the software tool FlexX [1] in co-operation with Merck KGaA, Darmstadt, and BASF AG, Ludwigshafen. The difficulty of the docking problem arises, on the one hand, from estimating the free energy of a molecular complex in aqueous solution and, on the other, from the flexibility of the molecules involved. An adequate description of protein flexibility will presumably not be possible even in the near future. FlexX therefore takes into account the more important flexibility of the ligand (the potential drug molecule). In a set of benchmark tests, FlexX produces predictions sufficiently similar to the experimental structure for about 70 percent of the protein-ligand complexes. With about 90 seconds of computing time per prediction, the software is among the fastest docking tools currently available. FlexX has been marketed since 1998 and is currently used by about 100 pharmaceutical companies, universities and research institutes.
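
To make the docking problem concrete, here is a deliberately simplified sketch. It is not the FlexX algorithm. The ligand is rigid, and the protein pocket is reduced to three hypothetical atoms. A crude contact score stands in for the free-energy estimate: it rewards complementary donor/acceptor pairs and penalises steric clashes. The search simply tries every pose on a small grid of rotations and translations. All coordinates, atom types, distance thresholds and penalty values below are illustrative assumptions.

```python
import math
from itertools import product

# Toy atoms as (x, y, z, kind); coordinates in angstroms, all hypothetical.
POCKET = [(0.0, 0.0, 0.0, "acceptor"), (4.0, 0.0, 0.0, "donor"), (0.0, 4.0, 0.0, "acceptor")]
LIGAND = [(0.0, 0.0, 0.0, "donor"), (4.0, 0.0, 0.0, "acceptor"), (0.0, 4.0, 0.0, "donor")]

COMPLEMENTARY = {("donor", "acceptor"), ("acceptor", "donor")}
CONTACT_MIN, CONTACT_MAX = 2.5, 3.5    # assumed "good interaction" distance window
CLASH_DIST, CLASH_PENALTY = 2.0, 10.0  # assumed steric-clash threshold and penalty


def place(ligand, theta, shift):
    """Rotate the ligand by theta (radians) about the z axis through its
    centroid, then translate it by shift = (dx, dy, dz)."""
    n = len(ligand)
    cx = sum(a[0] for a in ligand) / n
    cy = sum(a[1] for a in ligand) / n
    c, s = math.cos(theta), math.sin(theta)
    placed = []
    for x, y, z, kind in ligand:
        dx, dy = x - cx, y - cy
        placed.append((cx + c * dx - s * dy + shift[0],
                       cy + s * dx + c * dy + shift[1],
                       z + shift[2], kind))
    return placed


def score(placed, pocket):
    """Crude stand-in for a binding free-energy estimate: +1 per complementary
    pair inside the contact window, minus a penalty per clashing pair."""
    total = 0.0
    for lx, ly, lz, lkind in placed:
        for px, py, pz, pkind in pocket:
            d = math.dist((lx, ly, lz), (px, py, pz))
            if d < CLASH_DIST:
                total -= CLASH_PENALTY
            elif CONTACT_MIN <= d <= CONTACT_MAX and (lkind, pkind) in COMPLEMENTARY:
                total += 1.0
    return total


def dock(ligand, pocket, steps=range(-2, 5), n_angles=4):
    """Exhaustive search over n_angles rotations and an integer translation
    grid; returns (best_score, theta, shift)."""
    best = None
    for k, sx, sy, sz in product(range(n_angles), steps, steps, steps):
        theta = 2 * math.pi * k / n_angles
        shift = (sx, sy, sz)
        sc = score(place(ligand, theta, shift), pocket)
        if best is None or sc > best[0]:
            best = (sc, theta, shift)
    return best


if __name__ == "__main__":
    best_score, theta, shift = dock(LIGAND, POCKET)
    print(f"best score {best_score} at {math.degrees(theta):.0f} deg, shift {shift}")
```

The number of poses grows as (angles) × (grid points per axis)³, so brute-force enumeration quickly becomes impractical once ligand flexibility is added. FlexX itself follows a different strategy: it places a base fragment first and then builds up the ligand incrementally.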

If the three-dimensional structure of the target protein is not available, similarity-based virtual screening methods are applied instead. These start from molecules with known binding properties, called reference molecules. The main problem here is the structural alignment problem, which is closely related to the docking problem described above. A potential drug molecule has to be superimposed onto the reference molecule so that as many functional groups as possible are oriented to form the same interactions with the protein. Along the lines of FlexX, we have developed the software tool FlexS [2,3] for predicting structural alignments, with approximately the same performance in terms of computing time and prediction quality.
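
A full alignment method must decide which functional groups correspond and, for flexible molecules, which conformation to use. The sketch below isolates only the final sub-step: measuring how well two molecules overlay once those correspondences are fixed. It assumes the correspondences are already known and simplifies to two dimensions. It computes the rotation and translation that best superimpose the paired points in the least-squares sense (a textbook Procrustes fit) and reports the RMSD. The function name `superimpose_2d` and all coordinates are hypothetical; this is not the FlexS algorithm.

```python
import math


def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def superimpose_2d(mobile, reference):
    """Least-squares rigid superposition of paired 2D points (rotation plus
    translation, no reflection). Returns (moved_points, rmsd)."""
    if len(mobile) != len(reference) or not mobile:
        raise ValueError("need equally long, non-empty point lists")
    mx, my = centroid(mobile)
    rx, ry = centroid(reference)
    # The optimal angle maximises sum(b . R a) = cos(t)*s_dot + sin(t)*s_cross.
    s_dot = s_cross = 0.0
    for (ax, ay), (bx, by) in zip(mobile, reference):
        ax, ay, bx, by = ax - mx, ay - my, bx - rx, by - ry
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate about the mobile centroid, then move it onto the reference centroid.
    moved = [(rx + c * (x - mx) - s * (y - my), ry + s * (x - mx) + c * (y - my))
             for x, y in mobile]
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(moved, reference))
    return moved, math.sqrt(sq / len(moved))


if __name__ == "__main__":
    reference = [(0.0, 0.0), (1.5, 0.0), (1.5, 2.0), (-0.5, 1.0)]
    candidate = [(5.0, -2.0), (6.3, -1.25), (5.3, 0.5), (4.1, -1.4)]
    print(superimpose_2d(candidate, reference)[1])
```

A low RMSD means the paired groups can occupy nearly the same positions. Because only proper rotations are allowed, a mirror-image arrangement cannot be overlaid, just as a left-handed key does not fit a right-handed lock.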

If very large data sets are to be searched for similar molecules, alignment-based screening is not yet fast enough. The aim is comparison operations that each take far less than one second. Today, linear descriptors (bit strings or integer vectors) are usually applied to solve this problem.
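
The following sketch shows why bit-string descriptors are so cheap to compare. Each bit records whether one feature is present, so two descriptors can be compared with a few integer operations. It uses Python integers as bit strings and the Tanimoto coefficient, a widely used similarity measure for such fingerprints. The feature vocabulary, the similarity threshold, and the example "molecules" are hypothetical. The molecules are given directly as sets of fragment labels rather than being derived from real structures.

```python
# Hypothetical feature vocabulary: each fragment label owns one bit position.
FEATURES = ["C=O", "OH", "NH2", "c1ccccc1", "C-N", "C-O", "S", "Cl"]
BIT = {name: i for i, name in enumerate(FEATURES)}


def fingerprint(fragments):
    """Encode a set of fragment labels as an integer bit string."""
    fp = 0
    for frag in fragments:
        fp |= 1 << BIT[frag]
    return fp


def tanimoto(a, b):
    """Tanimoto coefficient: shared bits divided by bits set in either string."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


def screen(query_fp, database, threshold=0.5):
    """Return (name, similarity) pairs at or above threshold, most similar first."""
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


if __name__ == "__main__":
    reference = fingerprint({"C=O", "OH", "c1ccccc1", "C-O"})
    database = {
        "mol_a": fingerprint({"C=O", "OH", "c1ccccc1"}),
        "mol_b": fingerprint({"NH2", "S", "Cl"}),
        "mol_c": fingerprint({"C=O", "c1ccccc1", "C-O", "C-N"}),
    }
    print(screen(reference, database))  # [('mol_a', 0.75), ('mol_c', 0.6)]
```

Each comparison costs only a bitwise AND, a bitwise OR and two bit counts, independent of molecule size. This is what makes scanning millions of precomputed descriptors feasible.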

Such descriptors store the occurrence or absence of characteristic properties of the molecules, such as specific chemical fragments or short paths in the molecule. Once such a descriptor has been determined, the linear
10 ERCIM News No. 43, October 2000