Bioinformatics Biocomputing - Ercim

SPECIAL THEME: BIOINFORMATICS 

Searching for New Drugs in Virtual Molecule Databases 

by Matthias Rarey and Thomas Lengauer 

The rapid progress in sequencing the human genome opens up the possibility of understanding many diseases better at the molecular level in the near future and of obtaining so-called target proteins for pharmaceutical research. Once such a target protein has been identified, the search begins for molecules which specifically influence the protein's activity and which are therefore considered to be potential drugs against the disease. At GMD, approaches to the computer-based search for new drugs (virtual screening) are being developed, parts of which have already been used by industry.

Searching for New Lead Structures

The development process of a new medicine can be divided into three phases. In the first phase, the search for target proteins, the disease must be understood at the molecular-biological level well enough to know the individual proteins involved and their importance to the symptoms. Proteins are the essential functional units in our organism and can perform various tasks, ranging from the chemical transformation of materials to the transmission of information. Their function is always linked with the specific binding of other molecules. As early as 100 years ago, Emil Fischer recognised the lock-and-key principle: molecules that bind to each other are complementary to each other both spatially and chemically, just as only a specific key fits a given lock (see Figure 1). If a relationship between the suppression (or reinforcement) of a protein function and the symptoms is recognised, the protein is declared to be a target protein. In the second phase, the actual drug is developed. The aim is to find a molecule that, on the one hand, binds to the target protein and thus hinders its function and, on the other, has the further properties demanded of drugs, for example that it is well tolerated and accumulates in high concentration at the site of action. The first step is the search for a lead structure, a molecule that binds well to the target protein and serves as a first proposal for the drug. Ideally, the lead structure binds very well to the target protein and can be modified such that the resulting molecule is suitable as a drug. In the third phase, the drug is transformed into a medicine and is tested in several steps to see whether it is well tolerated and effective. This article discusses the first step of the second phase, i.e. the computer-based methods of searching for new lead structures.

New Approaches to Screening Molecule Databases

The methods of searching for drug molecules can be classified according to two criteria: the existence of a three-dimensional structural model of the target protein and the size of the data set to be searched. If a structural model of the protein is available, it can be used directly to search for suitable drugs (structure-based virtual screening), i.e. we search for a key fitting a given lock. If no structural model is available, the similarity to molecules known to bind to the target protein is used as a measure of suitability as a drug (similarity-based virtual screening). Here we use a given key to search for other fitting keys without knowing the lock. Finally, the size of the data set to be searched determines how much time can be spent on the analysis of an individual molecule. The size ranges from a few hundred preselected molecules, via large databases of several million molecules, to virtual combinatorial molecule libraries which theoretically allow the synthesis of up to billions of molecules from a few hundred molecular building blocks.

The key problem in structure-based virtual screening is the prediction of the relative orientation of the target protein and a potential drug molecule, the so-called docking problem. To solve this problem we have developed the software tool FlexX [1] in co-operation with Merck KGaA, Darmstadt, and BASF AG, Ludwigshafen. The difficulty of the docking problem arises, on the one hand, from the estimation of the free energy of a molecular complex in aqueous solution and, on the other, from the flexibility of the molecules involved. While a sufficient description of the flexibility of the protein will presumably not be possible even in the near future, the more important flexibility of the ligand is taken into account during a FlexX prediction. In a set of benchmark tests, FlexX predicts about 70 percent of the protein-ligand complexes with sufficient similarity to the experimental structure. With about 90 seconds of computing time per prediction, the software is among the fastest docking tools currently available. FlexX has been marketed since 1998 and is currently used by about 100 pharmaceutical companies, universities and research institutes.
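To make the link between data set size and time per molecule concrete, the following minimal sketch estimates how long it would take to dock every molecule of a data set once. Only the figure of about 90 seconds per prediction comes from the text; the concrete data set sizes, the CPU counts, the assumption of perfect parallel scaling and the function name `screening_days` are illustrative assumptions.

```python
SECONDS_PER_DOCKING = 90   # approximate time per prediction quoted in the text
SECONDS_PER_DAY = 86_400


def screening_days(num_molecules, num_cpus=1, seconds_per_molecule=SECONDS_PER_DOCKING):
    """Wall-clock days needed to dock every molecule once.

    Assumes every molecule takes the same time and that the work
    parallelises perfectly over num_cpus (both are simplifications).
    """
    total_seconds = num_molecules * seconds_per_molecule / num_cpus
    return total_seconds / SECONDS_PER_DAY


# Illustrative sizes for the three kinds of data set (assumed values).
data_sets = {
    "preselected set": 500,
    "large database": 2_000_000,
    "combinatorial library": 1_000_000_000,
}

for name, size in data_sets.items():
    single = screening_days(size)
    cluster = screening_days(size, num_cpus=100)
    print(f"{name:>22}: {single:12.1f} days on 1 CPU, {cluster:10.1f} days on 100 CPUs")
```

Under these assumptions a preselected set is docked within hours, whereas a database of millions of molecules already needs weeks even on a hundred processors, which is why faster comparison methods matter for the largest data sets.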

If the three-dimensional structure of the target protein is not available, similarity-based virtual screening methods are applied, starting from a molecule with known binding properties, called the reference molecule. The main problem here is the structural alignment problem, which is closely related to the docking problem described above. Here, we have to superimpose a potential drug molecule onto the reference molecule so that as many functional groups as possible are oriented such that they can form the same interactions with the protein. Along the lines of FlexX, we have developed the software tool FlexS [2,3] for the prediction of structural alignments, with approximately the same performance with respect to computing time and prediction quality.
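The objective of the alignment problem can be illustrated with a small scoring function. The sketch below is not the FlexS algorithm: it only scores one given superposition by counting how many functional groups of the candidate coincide with a group of the same type in the reference molecule. The group types, the coordinates, the distance tolerance (nominally in ångström) and the greedy matching strategy are all assumptions made for illustration.

```python
import math


def count_matched_groups(reference, candidate, tolerance=1.0):
    """Count candidate groups that coincide with a reference group of the same type.

    Each molecule is a list of (group_type, (x, y, z)) tuples and is assumed
    to be superimposed already. Matching is greedy and one-to-one: simple,
    but not guaranteed to find the best possible assignment.
    """
    used = set()   # indices of reference groups that are already matched
    matched = 0
    for c_type, c_pos in candidate:
        best_index, best_dist = None, tolerance
        for i, (r_type, r_pos) in enumerate(reference):
            if i in used or r_type != c_type:
                continue
            dist = math.dist(c_pos, r_pos)
            if dist <= best_dist:
                best_index, best_dist = i, dist
        if best_index is not None:
            used.add(best_index)
            matched += 1
    return matched


# Hypothetical functional groups of a reference molecule and a candidate.
reference = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("hydrophobic", (0.0, 4.0, 0.0)),
]
candidate = [
    ("donor", (0.3, 0.1, 0.0)),
    ("acceptor", (3.2, -0.4, 0.1)),
    ("hydrophobic", (5.0, 5.0, 5.0)),
]

print(count_matched_groups(reference, candidate))
```

An alignment tool would search over orientations and conformations of the candidate for the superposition that scores best; the function above evaluates only a single, fixed superposition.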

If very large data sets are to be searched for similar molecules, alignment-based screening is not yet fast enough. The aim is to have comparison operations that take far less than one second to compute. Today, linear descriptors (bit strings or integer vectors) are usually applied to solve this problem. They store the presence or absence of characteristic properties of the molecules, such as specific chemical fragments or short paths in the molecule. Once such a descriptor has been determined, the linear
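A bit-string descriptor of this kind can be compared very cheaply. The sketch below uses the Tanimoto coefficient, a commonly used similarity measure for bit strings; the text does not say which measure the authors use, so this choice, the 8-bit example descriptors and the molecule names are assumptions for illustration only.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two bit-string descriptors stored as Python ints.

    Number of bits set in both descriptors divided by the number of bits set
    in at least one of them. Returns 0.0 if neither has any bit set.
    """
    common = bin(fp_a & fp_b).count("1")
    total = bin(fp_a | fp_b).count("1")
    return common / total if total else 0.0


def rank_by_similarity(reference_fp, library):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(reference_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 8-bit descriptors: each bit marks the presence of one fragment.
reference_fp = 0b10110110
library = {
    "mol_a": 0b10110100,
    "mol_b": 0b00000110,
    "mol_c": 0b01001001,
}

for name, score in rank_by_similarity(reference_fp, library):
    print(f"{name}: {score:.2f}")
```

Each comparison costs only a few bitwise operations, which is what makes descriptors of this type suitable for data sets that are too large for alignment-based screening.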

10 ERCIM News No. 43, October 2000
