A comparison of heuristic search algorithms for molecular docking


By David Westhead, David Clark and Christopher Murray
Published in: Journal of Computer-Aided Molecular Design, 11 (1997) 209-228

Seminar talk 17.10.2005
Kerstin Kunz (K.Kunz@bioinf.uni-sb.de)

Abstract

This work describes the implementation and comparison of four different heuristic search algorithms and a random search procedure w.r.t. their performance on five test cases for molecular docking.

Introduction

Docking tries to find the energetically most favorable three-dimensional arrangement of two molecules in close contact with each other. Molecular docking is of great interest in the development of pharmaceutical agents: the selective recognition of a drug molecule by its target receptor is crucial for the safe and effective action of the drug within the body. Determining the binding geometry and affinity of two molecules without expensive and time-consuming experimental techniques such as synthesis, co-crystallization and assays could significantly speed up the early stages of drug discovery. Several algorithmic approaches exist.
Rigid-body docking considers only the translational and rotational degrees of freedom of the ligand, so it is not appropriate for receptor-ligand complexes that undergo conformational changes during binding. Flexible docking algorithms provide a more sophisticated solution by allowing internal conformational flexibility of the ligand. But as the number of degrees of freedom increases, the size of the search space grows exponentially, so fast and effective search algorithms are needed. So far, only heuristic search algorithms offer acceptable solutions. A heuristic is an optimization technique that will sometimes work, but not always: it may help to find solutions that are good, but not necessarily optimal. The basic idea of heuristic search is that, rather than trying all possible search paths, you focus on the paths that seem to take you nearer to the goal.

Problem representation

To restrict the search space, the degrees of freedom must be simplified. Neither receptor nor ligand can be treated as fully flexible. For the ligand, only rotations about rotatable bonds are allowed. The receptor has some limited flexibility at the active site, where the search is restricted to a box within which the ligand is moved w.r.t. translation, orientation and internal flexibility. The docking variables represent the relative position of ligand and receptor as well as their internal conformations, which are represented by an internal coordinate tree attached to the ligand.
At the root of the tree is a triplet of virtual atoms that serves as the reference for bond lengths, valence angles and torsion angles. The variables are stored as a string of real numbers, to which alterations are applied to change the ligand's orientation.

The quality of the docked complexes must be measured somehow. Because docking is coupled with a minimization problem (the ligand always tends toward a state of lowest energy), an energy function is used to score the results. The idea is that minimum values should correspond to the preferred binding mode of the ligand and that the values of the energy function correlate with the binding affinity of the ligand.

Search algorithms

Evolutionary programming (EP) and genetic algorithms (GA) imitate the natural process of evolution to solve a problem. The two are similar: both start with a population of individuals whose properties are represented by a string of variables, and this string is modified until no better solutions are found or no significant change is observed. The difference is that evolutionary programming models only the behavioral linkage between parents and their offspring, while genetic algorithms also apply recombination between different parents to create new offspring. The disadvantage of these algorithms is the risk of getting trapped in local minima.

Simulated annealing (SA) is a method for solving minimization problems with a very large number of free parameters, typically with an objective function that can be evaluated quickly. It imitates the growing of a crystal.
To grow a crystal, one begins by heating the raw materials to a liquid state. This molten material is then slowly cooled until the crystal structure is frozen in. If the temperature is decreased too quickly, defects develop in the crystal structure. A slow temperature decrease allows these defects to "work themselves out", forming a much better crystal. This process is called annealing. A perfect crystal has the lowest energy of all possible final states.

Like a local search, simulated annealing examines neighboring solutions and always accepts a better one. The moves are generated randomly, so they may cause the objective function to increase, decrease or remain unchanged. By analogy with the physical process, the temperature is initially high, and so is the probability of accepting a move that increases the objective function, so energy barriers can be overcome easily. The temperature is gradually decreased as the search progresses: the system is cooled slowly, and in the end the probability of accepting a move that increases the objective function becomes vanishingly small. In general, the temperature is lowered according to an annealing schedule. Under the Metropolis criterion, moves that decrease the objective function are always accepted, and moves that increase it by $\Delta E$ are accepted with probability $e^{-\Delta E / T}$.

Tabu search (TS) was previously known from operations research and combinatorial optimization. Here, tabu search is applied to the docking problem for the first time. Other algorithms, such as the genetic algorithm, risk getting trapped in local minima. Tabu search avoids this by preventing repeated visits to already known, energetically favorable conformations. This is realized with a "tabu list" that stores previously found solutions. Only one current solution is maintained during the course of a search. An initial solution is chosen at the start of the run, evaluated and added to the tabu list. Then a user-defined number of moves is generated, where a move is a mutation-like procedure applied to the docking variables.
The moves are scored with the energy function and examined in rank order. A move is considered tabu if it generates a solution that is not sufficiently different from the solutions in the tabu list. The highest-ranking move is always accepted, tabu or not, if the solution it produces has a lower energy than the best solution found so far. Otherwise, the best non-tabu move is chosen. The algorithm terminates if neither of these criteria can be met. New current solutions are appended to the tabu list until it is full; after that, they replace existing entries. Each iteration of the search starts from the solution found in the previous iteration. If no significant change to the best solution occurs during a defined number of iterations, the algorithm restarts from a random position; otherwise, it continues and finally terminates with the best solution found.

As part of the basis for comparison, the authors implemented a random search (RS) procedure as a control, which was expected to perform worse than the more sophisticated search algorithms. Because the start variables are randomized, it was important to evaluate the statistical performance over a sufficiently large number of independent trials. An important factor is the energy distribution of the results, characterized by the median energy of a solution and the width of the distribution around this median.

The ideal case is a single energy minimum, for which the algorithm produces median energies close to the value of this minimum and a narrow distribution of results around this value.
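
The random-search control and the median-and-spread summary described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The Rastrigin test function stands in for the docking energy function. The names `toy_energy`, `random_search` and `summarize`, the search box and the evaluation budget are all hypothetical. Each independent trial keeps the lowest energy among a fixed number of uniformly sampled points. The trials are then summarized by their median and their semi-interquartile range (SIQR), i.e. half the distance between the first and third quartiles.

```python
import math
import random
import statistics


def toy_energy(x):
    """Hypothetical stand-in for the docking energy: the Rastrigin test
    function, which has many local minima and a global minimum of 0 at x = 0."""
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)


def random_search(energy, n_vars, n_evals, rng, lo=-5.12, hi=5.12):
    """Random-search control: sample n_evals uniform points in the box
    [lo, hi]^n_vars and return the lowest energy seen."""
    best = math.inf
    for _ in range(n_evals):
        x = [rng.uniform(lo, hi) for _ in range(n_vars)]
        best = min(best, energy(x))
    return best


def summarize(energies):
    """Median and semi-interquartile range, SIQR = (Q3 - Q1) / 2."""
    q1, _, q3 = statistics.quantiles(energies, n=4)
    return statistics.median(energies), (q3 - q1) / 2.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Many independent trials, because each trial starts from random variables.
    trials = [random_search(toy_energy, n_vars=6, n_evals=1000, rng=rng)
              for _ in range(30)]
    median, siqr = summarize(trials)
    print(f"median best energy: {median:.2f}, SIQR: {siqr:.2f}")
```

In these terms, the "ideal case" is a low median with a small SIQR. Comparing both numbers across search algorithms mirrors the median-energy and semi-interquartile-range comparisons in the Results.
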
Real cases are rarely this simple, and there is often more than one minimum. A second criterion is therefore introduced: the rms distance of the docking result from the known crystal structure. This helps to classify competing minima when there is more than one possible solution. Solutions within 1.5 Å rms of the crystallographic ligand conformation are counted towards the success rate. The test cases describe relevant problems in computer-aided molecular design; they are of varying difficulty and reflect different aspects of molecular recognition.

Results

DHFR-MTX is a standard test case for docking and was also used in the parameterization of the energy function. What makes it an easy case is that there is a single deep minimum near the crystal structure. It was shown that success rate and median energy were not correlated. Here, the GA was trapped in local minima farther than 1.5 Å from the crystal structure.

Neuraminidase-DANA was chosen because electrostatic interactions and hydrogen bonds are considered to dominate its recognition. The success rates were much lower than in the first example. The reason is a more complicated energy surface, possibly with more than one minimum. TS found two dominant clusters of low-energy solutions and proved to be more effective at global search. It was also noticed that the energy function has deficiencies, in particular a strong bias towards steric fit.

HIV-1 protease and XK263 form lipophilic interactions, for which a good steric fit is particularly important for binding.
Here, SA performed best in both median energy and success rate, so a correlation between the two was observed for the first time. The algorithms showed no big differences in median energy, but the semi-interquartile range suggested a statistical difference. This is due to a greater spread of good solutions around the crystal conformation, because the molecule is large and binds in the active site in a less directional way. For example, the orientation of the ring systems has a greater effect on the calculated rms than on the value of the energy function.

The next two examples both have strong lipophilic and electrostatic interactions in different binding pockets. Thus the thrombin examples provide a good test of the ability of the docking potential function to differentiate between the two types of interaction.

Thrombin-NAPAP produced the lowest rate of successful solutions. In terms of the authors' criterion for success, however, this example is an exception: the lowest energy does not coincide with the lowest rms. Three clusters of solutions were found. The first is close to the crystallographic minimum. The second is farther from the crystal structure but favored by the scoring function. Its energies are lower than those of the first cluster, and this misled TS away from the correct solution more often than the other algorithms. The third cluster again reflects a weakness of the energy function, which favors steric fit over salt bridges.

Thrombin-argatroban showed some correlation between median energy and success rate, although the success rates are relatively low compared to the other examples. One reason may be competing minima on the energy surface. GA, EP and SA tend to get trapped in a deep local minimum, while TS can escape and search globally, which may be due to its random restart procedure. Because this example turned out to be difficult, with no single low-energy minimum that was easy to locate, the authors reran the docking procedure with an increasing number of rotatable bonds. The first five, rather terminal, bonds had no great effect, with the exception of RS, which was seriously compromised.
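
A back-of-the-envelope count shows why random search degrades first as rotatable bonds are added. The sketch below assumes, purely for illustration, that each torsion is sampled on a 30° grid; the paper's docking variables are continuous. The helpers `torsion_grid_size` and `hit_probability` and the sample budget are hypothetical. Under this assumption the number of torsional states grows as 12^n, so a fixed budget of random samples becomes ever less likely to hit any particular state. The count measures only the size of the search space, not how rugged the energy surface is.

```python
def torsion_grid_size(n_torsions, step_deg=30):
    """Number of torsional states if every rotatable bond is sampled every
    step_deg degrees (hypothetical discretization, for illustration only)."""
    if 360 % step_deg:
        raise ValueError("step_deg must divide 360")
    return (360 // step_deg) ** n_torsions


def hit_probability(n_samples, grid_size):
    """Probability that n_samples independent uniform draws land at least
    once in one particular grid cell."""
    return 1.0 - (1.0 - 1.0 / grid_size) ** n_samples


if __name__ == "__main__":
    budget = 100_000  # hypothetical number of random samples
    for n in range(8):
        size = torsion_grid_size(n)
        print(f"{n} torsions: {size:>10} states, "
              f"P(hit a given state) = {hit_probability(budget, size):.4f}")
```

Under these assumptions, 100,000 samples almost surely reach any given state of a three-torsion grid (1,728 states). For a seven-torsion grid (35,831,808 states), the chance of reaching a given state drops to roughly 0.3%.
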
This indicates that the number of rotatable bonds is an indicator of the size of the search space but not of its difficulty. Introducing bonds 6 and 7, which rotate bigger parts of the molecule, makes it evident that the difficulty depends on the presence and character of competing low-energy minima.

Discussion

TS and GA performed best in different respects. TS headed the table in docking success rate, which is based on the rms distance from the "correct answer" known from the original complex. GA was best w.r.t. the median energy of a solution. It must be said that in these examples, the lowest energies emerging from successful docking runs are generally much better (lower) than that obtained for the original crystallographic conformation. This again points to deficiencies of the energy function: it is biased towards steric fit and gives insufficient weight to salt bridges, or it contains some energy terms that are too large. Here, the docking was performed with known structures. In the future, when the procedures are applied to docking ligands whose bound conformation is unknown, a reliable energy function will be crucial. An investigation of various docking energy functions is therefore needed to learn about their ability to predict binding mode and affinity correctly. It also turned out that not the number but the position of the rotatable bonds is crucial. A rotatable bond in a hinge position causes more difficulties than one in a terminal position, because it leads to competing minima that make a docking case difficult.

A critical point is that the authors used the lowest energy as the scoring criterion. The realistic position of a ligand is not necessarily at the global energy minimum, e.g. if this minimum lies in the core of the protein. In general, it is difficult to define a criterion by which docking results can be evaluated correctly.

Summary

It was obvious that random search performed very poorly because of the size of the search space. All heuristic methods were effective and gave satisfactory performance. GA is a very effective local search algorithm but tends to get trapped in low-energy local minima. TS samples the global minimum more frequently, but explores local regions less deeply. The energy function has a big influence on the results.

Outlook

Possible future directions are:
1. An improved GA that does not get trapped in local minima, e.g. by introducing greater genetic diversity.
2. TS could be adapted to a more local search behavior towards the end of the search run, e.g. by scaling down the tabu threshold or concomitantly reducing the length of the tabu list during the docking run.
3. A combination of heuristic search algorithms could join global and local search qualities.
4. As tabu search turned out to be very effective, it could be applied in other search and optimization procedures.
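
The tabu search loop from the Search algorithms section can be sketched in Python, with the adaptation from outlook item 2 added: the tabu list length and the tabu threshold shrink linearly over the run. This is a minimal sketch, not the authors' program, and every concrete choice is an assumption for illustration:
- the Rastrigin function stands in for the docking energy;
- a move perturbs one variable with Gaussian noise;
- "sufficiently different" means a Euclidean distance of at least the threshold;
- the oldest tabu entry is replaced first;
- all parameter names and defaults (`n_moves`, `list_len`, `threshold`, `patience`, ...) are hypothetical.

```python
import math
import random
from collections import deque


def energy(x):
    """Hypothetical stand-in for the docking energy (Rastrigin test function)."""
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)


def tabu_search(f, n_vars, n_iter=500, n_moves=20, step=1.0,
                list_len=(50, 5), threshold=(0.5, 0.05), patience=50,
                bounds=(-5.12, 5.12), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds

    def random_point():
        return [rng.uniform(lo, hi) for _ in range(n_vars)]

    # Initial solution: chosen at random, evaluated and added to the tabu list.
    current = random_point()
    best, best_e = current, f(current)
    tabu = deque([current], maxlen=list_len[0])
    stale = 0  # iterations without significant improvement of the best solution

    for it in range(n_iter):
        # Outlook item 2 (hypothetical linear schedule): shrink length and threshold.
        frac = it / max(1, n_iter - 1)
        length = round(list_len[0] + frac * (list_len[1] - list_len[0]))
        thr = threshold[0] + frac * (threshold[1] - threshold[0])
        if tabu.maxlen != length:
            tabu = deque(tabu, maxlen=length)  # keeps the most recent entries

        # Generate mutation-like moves from the single current solution.
        moves = []
        for _ in range(n_moves):
            cand = list(current)
            k = rng.randrange(n_vars)
            cand[k] = min(hi, max(lo, cand[k] + rng.gauss(0.0, step)))
            moves.append((f(cand), cand))
        moves.sort(key=lambda m: m[0])  # rank order: lowest energy first

        if moves[0][0] < best_e:
            # Aspiration: a new overall best is accepted even if it is tabu.
            cur_e, current = moves[0]
        else:
            allowed = [(e, c) for e, c in moves
                       if all(math.dist(c, t) >= thr for t in tabu)]
            if not allowed:
                break  # neither criterion can be met: terminate
            cur_e, current = allowed[0]  # best non-tabu move
        tabu.append(current)  # drops the oldest entry once the list is full

        improvement = best_e - cur_e
        if improvement > 0:
            best, best_e = current, cur_e
        stale = 0 if improvement > 1e-6 else stale + 1
        if stale >= patience:  # no significant change: restart at random
            current = random_point()
            tabu.append(current)
            stale = 0
    return best, best_e


if __name__ == "__main__":
    b, e = tabu_search(energy, n_vars=4)
    print(f"best energy {e:.3f} at {[round(v, 2) for v in b]}")
```

A shrinking `thr` lets the search accept moves ever closer to recently visited solutions, which gives the "more local" end-of-run behavior that item 2 has in mind. The random restart after `patience` stale iterations preserves the global character that the Results attribute to TS.
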
