Bioinformatics Biocomputing - Ercim

SPECIAL THEME: BIOINFORMATICS

Searching for New Drugs in Virtual Molecule Databases

by Matthias Rarey and Thomas Lengauer

The rapid progress in sequencing the human genome opens the possibility, in the near future, of understanding many diseases better at the molecular level and of obtaining so-called target proteins for pharmaceutical research. Once such a target protein is identified, the search begins for those molecules which influence the protein's activity specifically and which are therefore considered to be potential drugs against the disease. At GMD, approaches to the computer-based search for new drugs (virtual screening) are being developed, parts of which have already been used by industry.

Searching for New Lead Structures

The development process of a new medicine can be divided into three phases. In the first phase, the search for target proteins, the disease must be understood at the molecular-biological level well enough to know the individual proteins involved and their importance to the symptoms. Proteins are the essential functional units in our organism and can perform various tasks, ranging from the chemical transformation of materials to the transportation of information. Their function is always linked with the specific binding of other molecules. As early as 100 years ago, Emil Fischer recognised the lock-and-key principle: molecules that bind to each other are complementary to each other both spatially and chemically, just as only a specific key fits a given lock (see Figure 1). If a relationship between the suppression (or reinforcement) of a protein function and the symptoms is recognised, the protein is declared to be a target protein.

In the second phase, the actual drug is developed. The aim is to find a molecule that, on the one hand, binds to the target protein, thus hindering its function, and that, on the other, has the further properties demanded of drugs, for example that it is well tolerated and accumulates in high concentration at the place of action. The first step is the search for a lead structure: a molecule that binds well to the target protein and serves as a first proposal for the drug. Ideally, the lead structure binds very well to the target protein and can be modified such that the resulting molecule is suitable as a drug.

In the third phase, the drug is transformed into a medicine and is tested in several steps to see if it is well tolerated and effective. This article discusses the first step of the second phase, ie the computer-based methods of searching for new lead structures.

New Approaches to Screening Molecule Databases

The methods of searching for drug molecules can be classified according to two criteria: the existence of a three-dimensional structural model of the target protein and the size of the data set to be searched. If a structural model of the protein is available, it can be used directly to search for suitable drugs (structure-based virtual screening); ie we search for a key fitting a given lock. If a structural model is missing, the similarity to molecules known to bind to the target protein is used as a measure of suitability as a drug (similarity-based virtual screening). Here we use a given key to search for fitting keys without knowing the lock. Finally, the size of the data set to be searched determines how much time can be spent on the analysis of an individual molecule. The size ranges from a few hundred preselected molecules, via large databases of several million molecules, to virtual combinatorial molecule libraries that theoretically allow the synthesis of up to billions of molecules from some hundred molecular building blocks.

The key problem in structure-based virtual screening is the prediction of the relative orientation of the target protein and a potential drug molecule, the so-called docking problem. To solve this problem we have developed the software tool FlexX [1] in co-operation with Merck KGaA, Darmstadt, and BASF AG, Ludwigshafen. The difficulty of the docking problem arises, on the one hand, from the estimation of the free energy of a molecular complex in aqueous solution and, on the other, from the flexibility of the molecules involved. While a sufficient description of the flexibility of the protein will presumably not be possible even in the near future, the more important flexibility of the ligand is considered during a FlexX prediction. In a set of benchmark tests, FlexX is able to predict about 70 percent of the protein-ligand complexes with sufficient similarity to the experimental structure. With about 90 seconds of computing time per prediction, the software is among the fastest docking tools currently available. FlexX has been marketed since 1998 and is currently being used by about 100 pharmaceutical companies, universities and research institutes.
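How "sufficiently similar to the experimental structure" is measured is not spelled out here. A common way to compare a predicted ligand placement with an experimental one is the root-mean-square deviation (RMSD) of the atom positions. The following is a minimal sketch under that assumption; the function names and the 2.0 Å cutoff are illustrative choices, not values taken from FlexX or from this article.

```python
import math


def rmsd(predicted, experimental):
    """Root-mean-square deviation between two poses.

    Each pose is a list of (x, y, z) tuples that lists the same atoms
    in the same order (an assumption of this sketch).
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("poses must be non-empty and list the same atoms")
    squared = sum(
        (px - ex) ** 2 + (py - ey) ** 2 + (pz - ez) ** 2
        for (px, py, pz), (ex, ey, ez) in zip(predicted, experimental)
    )
    return math.sqrt(squared / len(predicted))


def success_rate(cases, threshold=2.0):
    """Fraction of (predicted, experimental) pairs with RMSD <= threshold.

    The default threshold of 2.0 (in Angstrom) is an assumed cutoff.
    """
    hits = sum(1 for pred, ref in cases if rmsd(pred, ref) <= threshold)
    return hits / len(cases)
```

Applied to a benchmark set, `success_rate` yields a figure of the same kind as the 70 percent quoted above, although the exact criterion used for that figure may differ.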

If the three-dimensional structure of the target protein is not available, similarity-based virtual screening methods are applied, starting from a molecule with known binding properties, called the reference molecule. The main problem here is the structural alignment problem, which is closely related to the docking problem described above. Here, we have to superimpose a potential drug molecule on the reference molecule so that a maximum number of functional groups are oriented such that they can form the same interactions with the protein. Along the lines of FlexX, we have developed the software tool FlexS [2,3] for the prediction of structural alignments, with approximately the same performance with respect to computing time and prediction quality.
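To make the alignment objective concrete, the sketch below scores one given superposition by the fraction of the reference molecule's functional groups that have a group of the same type nearby in the candidate molecule. This is a deliberately simplified illustration and not the FlexS scoring function; the group labels, the coordinate format and the 1.0 Å tolerance are all assumptions made for the example.

```python
import math


def covered_fraction(reference_groups, candidate_groups, tolerance=1.0):
    """Fraction of reference groups matched by a same-type candidate group.

    Each group is a (type, (x, y, z)) pair, with both molecules already
    superimposed in one coordinate frame. A reference group counts as
    covered if any candidate group of the same type lies within
    `tolerance` of it (matches are not required to be one-to-one).
    """
    if not reference_groups:
        raise ValueError("reference molecule has no functional groups")
    covered = 0
    for ref_type, ref_pos in reference_groups:
        if any(
            cand_type == ref_type and math.dist(ref_pos, cand_pos) <= tolerance
            for cand_type, cand_pos in candidate_groups
        ):
            covered += 1
    return covered / len(reference_groups)


# Hypothetical example data: three reference groups, two candidate groups.
reference = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("hydrophobic", (0.0, 4.0, 0.0)),
]
candidate = [
    ("donor", (0.2, 0.1, 0.0)),
    ("acceptor", (3.0, 2.5, 0.0)),
]
score = covered_fraction(reference, candidate)
```

An alignment method then has to search over orientations and conformations of the candidate for the superposition that makes such a score as large as possible, which is where the computational cost arises.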

If very large data sets are to be searched for similar molecules, the speed of alignment-based screening does not yet suffice. The aim is to have comparison operations whose computation takes far less than one second. Today linear descriptors (bit strings or integer vectors) are usually applied to solve this problem. They store the presence or absence of characteristic properties of the molecules, such as specific chemical fragments or short paths in the molecule. Once such a descriptor has been determined, the linear

10 ERCIM News No. 43, October 2000
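The bit-string descriptors described above can be illustrated with a short sketch. It encodes the presence or absence of features from a fixed vocabulary as bits of an integer and compares two descriptors with the Tanimoto coefficient. The Tanimoto coefficient is a widely used similarity measure for such bit strings, but it is not named in this article; it and the feature vocabulary below are assumptions made for the example.

```python
def fingerprint(features, vocabulary):
    """Encode a molecule as a bit string stored in an integer.

    Bit i is set if vocabulary[i] occurs in the molecule's feature set.
    """
    bits = 0
    for i, feature in enumerate(vocabulary):
        if feature in features:
            bits |= 1 << i
    return bits


def tanimoto(a, b):
    """Shared set bits divided by the bits set in either descriptor."""
    union = bin(a | b).count("1")
    if union == 0:
        # Convention chosen here: two empty descriptors count as identical.
        return 1.0
    return bin(a & b).count("1") / union


# Hypothetical vocabulary of chemical fragments.
vocabulary = ["carboxyl", "hydroxyl", "amine", "phenyl"]
fp_reference = fingerprint({"carboxyl", "phenyl"}, vocabulary)
fp_candidate = fingerprint({"carboxyl", "amine", "phenyl"}, vocabulary)
similarity = tanimoto(fp_reference, fp_candidate)
```

Because the comparison reduces to a few bitwise operations, each one takes far less than a second, which is what makes descriptors of this kind suitable for very large data sets.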
