Protein Folding in the Hydrophobic-Hydrophilic (HP) Model is NP ...

that the H’s can be packed into a perfect cube (which is the minimum energy configuration for a collection of H’s without any string constraints) is NP-complete. The proof uses a transformation from a special form of the BIN PACKING problem [11], which we call MODIFIED BIN PACKING. In particular, given an instance B of the MODIFIED BIN PACKING problem, we show how to construct a sequence S with $n^3$ H’s so that there is a conformation of S where the H’s form an $n \times n \times n$ cube if and only if there is a solution to B.

Our NP-completeness proof relies heavily on the fact that nodes along the edges of a perfect cube have two or three neighbors not in the cube (three at a corner, two elsewhere on an edge), whereas nodes on a face of a cube (but not on an edge) have just one neighbor not in the cube. The proof does not rely on parity arguments derived from the fact that the cubic lattice is bipartite. The NP-completeness proof does not immediately extend to general lattices, although our methods may be helpful in understanding the complexity of the protein folding problem in other models.
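The boundary structure described above is easy to confirm by enumeration. The following Python sketch is our own illustration (the name `outside_neighbor_profile` is hypothetical, not from the paper): it places the cube at $\{0, \ldots, n-1\}^3$ and counts, for each node, how many of its six lattice neighbors fall outside the cube.

```python
from collections import Counter
from itertools import product

# The six unit steps of the cubic lattice Z^3.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def outside_neighbor_profile(n: int) -> Counter:
    """Map k to the number of nodes of the cube {0, ..., n-1}^3 that have
    exactly k lattice neighbors outside the cube."""
    def inside(p):
        return all(0 <= c < n for c in p)

    profile = Counter()
    for x, y, z in product(range(n), repeat=3):
        k = sum(not inside((x + dx, y + dy, z + dz)) for dx, dy, dz in STEPS)
        profile[k] += 1
    return profile


if __name__ == "__main__":
    profile = outside_neighbor_profile(4)
    print(dict(sorted(profile.items())))  # → {0: 8, 1: 24, 2: 24, 3: 8}
    # Total (cube node, outside neighbor) pairs equals the surface area 6n^2.
    print(sum(k * count for k, count in profile.items()))  # → 96
```

Interior nodes have 0 outside neighbors, face nodes 1, edge nodes 2 and corners 3; the weighted total $6n^2$ is the surface area of the cube.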

The discovery that protein folding in the HP model is NP-complete lends further importance to the study of approximation algorithms for protein folding. Hart and Istrail [12] have developed approximation algorithms for the HP model on the cubic and square lattices. Their algorithms obtain an approximation factor of 3/8 for the cubic lattice and 1/4 for the square lattice. Recently, Hart and Istrail [15] have presented approximation algorithms for off-lattice and side chain variants of the HP model. Agarwala et al. [1] have shown that performance guarantees of roughly 60% can be achieved for the HP model on the hexagonal close packed lattice. This model has the advantage that there are no parity constraints imposed by the lattice.

The journal version of this paper appears in [3].

2 Problem Statement

The protein folding problem consists of embedding a given finite polypeptide sequence S of length N into a given fixed infinite graph G. In this paper, the graph G will primarily be the 3-dimensional cubic lattice $\mathbb{Z}^3$. A fold of S in G is an injective mapping from $[1, \ldots, N]$ to G such that adjacent integers map to adjacent nodes of G. Each node of $\mathbb{Z}^3$ has six neighbors. The energy of a fold of S in G, which is to be minimized, is the negation of the number of H-H bonds in the fold, where a bond is a pair of symbols mapped to adjacent nodes.
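These definitions translate directly into code. In the Python sketch below (function names are our own and hypothetical), a fold is a list of integer coordinates in $\mathbb{Z}^3$, one per symbol of S; the code checks injectivity and adjacency and counts H-H bonds under this paper's convention, in which successive H’s also count.

```python
Point = tuple[int, int, int]

# Positive unit steps; checking only these counts each adjacent pair once.
POSITIVE_STEPS = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]


def is_fold(coords: list[Point]) -> bool:
    """True if coords is injective and consecutive positions are adjacent
    lattice nodes, i.e. a self-avoiding walk in Z^3."""
    if len(set(coords)) != len(coords):
        return False
    return all(
        sum(abs(a - b) for a, b in zip(p, q)) == 1
        for p, q in zip(coords, coords[1:])
    )


def energy(seq: str, coords: list[Point]) -> int:
    """Negated number of H-H bonds (pairs of H's on adjacent nodes)."""
    if len(seq) != len(coords) or not is_fold(coords):
        raise ValueError("coords is not a fold of seq in Z^3")
    h_sites = {c for s, c in zip(seq, coords) if s == "H"}
    bonds = sum(
        (x + dx, y + dy, z + dz) in h_sites
        for x, y, z in h_sites
        for dx, dy, dz in POSITIVE_STEPS
    )
    return -bonds


if __name__ == "__main__":
    square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    line = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
    print(energy("HHHH", square))  # → -4
    print(energy("HHHH", line))    # → -3
```

Storing H positions in a set makes the bond count linear in N rather than quadratic.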

In the traditional way of looking at this problem, successive pairs of H’s in S are not counted as forming a bond when computing energy; thus, a given H has at most four neighboring H’s. For simplicity, we will include all H-H bonds in the energy calculation in this paper. Our choice of convention does not make any difference in terms of the NP-completeness result, since the number of H-H bonds between successive symbols of the sequence does not depend in any way on the fold.
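The claim that the convention does not matter can be checked on a small example: for any fold, the all-bonds count and the traditional count differ by the number of adjacent H-H pairs in S, which is fixed by the sequence alone. The Python sketch below is illustrative only; `bonds`, `successive_hh`, and the two sample folds are our own.

```python
# Positive unit steps; checking only these counts each adjacent pair once.
POSITIVE_STEPS = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]


def bonds(seq, coords, skip_successive):
    """Count H-H bonds in a fold; optionally skip pairs that are
    neighbors in the sequence (the traditional convention)."""
    index = {c: i for i, c in enumerate(coords)}
    count = 0
    for i, (s, (x, y, z)) in enumerate(zip(seq, coords)):
        if s != "H":
            continue
        for dx, dy, dz in POSITIVE_STEPS:
            j = index.get((x + dx, y + dy, z + dz))
            if j is None or seq[j] != "H":
                continue
            if skip_successive and abs(i - j) == 1:
                continue
            count += 1
    return count


def successive_hh(seq):
    """H-H pairs adjacent in the sequence; every fold places them on
    adjacent nodes, so this count does not depend on the fold."""
    return sum(a == b == "H" for a, b in zip(seq, seq[1:]))


if __name__ == "__main__":
    seq = "HHPHH"
    folds = {
        "straight": [(i, 0, 0) for i in range(5)],
        "bent": [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (-1, 1, 0)],
    }
    # prints "straight 2 0 2" then "bent 3 1 2"
    for name, coords in folds.items():
        all_bonds = bonds(seq, coords, skip_successive=False)
        traditional = bonds(seq, coords, skip_successive=True)
        print(name, all_bonds, traditional, all_bonds - traditional)
    print(successive_hh(seq))  # → 2
```

Both folds give different energies, but the gap between the two conventions is the same constant, so maximizing either count selects the same folds.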

Another equivalent way of looking at the traditional problem is to minimize the number of non-H’s among the four neighbors of each H. In this framework, our main result is that the problem of determining whether or not there is a fold in the HP model on the 3-d lattice with a total of S non-H neighbors (the absolute minimum)
is NP-hard. We achieve hardness of approximation results only for this latter problem. The hardness of $(9/8 - \epsilon)$-approximation is straightforward. (We believe it is possible to extend our results to hardness of N’ times optimal also, but we do not address this issue in the current paper.)
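The equivalence between maximizing H-H bonds and minimizing non-H neighbors follows from a simple counting identity. We state it under this paper's convention (all six lattice neighbors, successive H’s included); the symbols $N_H$, $B$, and $C$ are our own notation. Let $N_H$ be the number of H’s in S, $B$ the number of H-H bonds in a fold, and $C$ the total, over all H’s, of lattice neighbors not occupied by an H. Summing the six neighbors of every H counts each H-H bond twice:

```latex
\begin{aligned}
6N_H &= 2B + C
  &&\Longrightarrow\quad C = 6N_H - 2B,\\[4pt]
\text{perfect cube } (N_H = n^3):\quad C &= 6n^2
  &&\Longrightarrow\quad B = \tfrac{6n^3 - 6n^2}{2} = 3n^2(n-1).
\end{aligned}
```

Since $N_H$ is fixed by S, a fold minimizes $C$ exactly when it maximizes $B$. For the cube, $C = 6n^2$ is its surface area: every node on a face contributes one outside neighbor for each face it lies on.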

We define the following decision version of the protein folding problem.

HP STRING-FOLD

Instance: A finite sequence S over the alphabet {H, P}, an integer m, and a graph G.

Question: Is there a fold (i.e., a self-avoiding walk) of S in G where the number of H-H bonds is at least m?
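HP STRING-FOLD is in NP because a proposed fold can be checked in polynomial time. The sketch below assumes a hypothetical certificate encoding that is not taken from the paper: the fold is a string of $N-1$ unit moves, with `x`, `y`, `z` for positive steps and `X`, `Y`, `Z` for negative ones, starting at the origin of $\mathbb{Z}^3$. Building the walk from moves makes consecutive symbols adjacent automatically, so only self-avoidance and the bond count remain to be checked.

```python
# Hypothetical move alphabet: lowercase = positive step, uppercase = negative.
MOVES = {
    "x": (1, 0, 0), "X": (-1, 0, 0),
    "y": (0, 1, 0), "Y": (0, -1, 0),
    "z": (0, 0, 1), "Z": (0, 0, -1),
}


def walk(moves: str) -> list[tuple[int, int, int]]:
    """Coordinates visited by the move string, starting at the origin."""
    pos = (0, 0, 0)
    coords = [pos]
    for step in moves:
        dx, dy, dz = MOVES[step]
        pos = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
        coords.append(pos)
    return coords


def verify_string_fold(seq: str, m: int, moves: str) -> bool:
    """Check an HP STRING-FOLD certificate on Z^3: one node per symbol,
    a self-avoiding walk, and at least m H-H bonds."""
    if len(moves) != len(seq) - 1 or any(s not in MOVES for s in moves):
        return False
    coords = walk(moves)
    if len(set(coords)) != len(coords):  # the walk revisits a node
        return False
    h_sites = {c for s, c in zip(seq, coords) if s == "H"}
    bonds = sum(
        (x + dx, y + dy, z + dz) in h_sites
        for x, y, z in h_sites
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1))
    )
    return bonds >= m


if __name__ == "__main__":
    print(verify_string_fold("HHHH", 4, "xyX"))  # → True
    print(verify_string_fold("HHHH", 5, "xyX"))  # → False
    print(verify_string_fold("HHH", 0, "xX"))    # → False (not self-avoiding)
```

The check runs in time linear in N, well within the polynomial bound that NP membership needs.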

Our main result is that HP STRING-FOLD is NP-complete when G is $\mathbb{Z}^3$. The proof follows by showing that the following folding problem is NP-hard.

PERFECT HP STRING-FOLD

Instance: An integer n and a finite sequence S over the alphabet {H, P} which contains $n^3$ H’s.

Question: Is there a fold of S in $\mathbb{Z}^3$ for which the H’s are perfectly packed into an $n \times n \times n$ cube?
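Checking whether a given fold witnesses PERFECT HP STRING-FOLD amounts to testing whether the H positions are exactly the nodes of some axis-aligned $n \times n \times n$ cube. A minimal Python sketch, assuming `coords` is already a valid fold (the name `h_sites_form_cube` is our own):

```python
from itertools import product


def h_sites_form_cube(seq: str, coords: list[tuple[int, int, int]], n: int) -> bool:
    """True if the H's of the fold occupy exactly the n^3 nodes of an
    axis-aligned n x n x n cube in Z^3 (n >= 1 assumed)."""
    h_sites = {c for s, c in zip(seq, coords) if s == "H"}
    if n < 1 or len(h_sites) != n ** 3:
        return False
    # The only candidate cube has its minimum corner at the coordinate-wise minimum.
    corner = tuple(min(p[i] for p in h_sites) for i in range(3))
    cube = {
        (corner[0] + a, corner[1] + b, corner[2] + c)
        for a, b, c in product(range(n), repeat=3)
    }
    return h_sites == cube


if __name__ == "__main__":
    # A path through the 2 x 2 x 2 cube, with one P at each end outside it.
    seq = "P" + "H" * 8 + "P"
    coords = [(-1, 0, 0),
              (0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
              (0, 1, 1), (1, 1, 1), (1, 0, 1), (0, 0, 1),
              (-1, 0, 1)]
    print(h_sites_form_cube(seq, coords, 2))  # → True
```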

The proof that PERFECT HP STRING-FOLD is NP-hard involves a transformation from the (strongly) NP-complete problem of BIN PACKING, which is defined as follows by Garey and Johnson [11].

BIN PACKING

Instance: A finite set U of items, a size $s(u) \in \mathbb{Z}^+$ for each $u \in U$, a positive integer bin capacity B, and a positive integer K.

Question: Is there a partition of U into disjoint sets $U_1, U_2, \ldots, U_K$ such that the sum of the sizes of the items in each $U_i$ is B or less?
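A BIN PACKING certificate is simply an assignment of items to bins, so membership in NP is immediate. For concreteness, here is a Python sketch of the check; representing items as a dict from names to sizes and bins as integers $0, \ldots, K-1$ is our own choice.

```python
def verify_bin_packing(sizes: dict[str, int], B: int, K: int,
                       assignment: dict[str, int]) -> bool:
    """True if `assignment` maps every item to a bin in range(K) and no
    bin's total size exceeds the capacity B."""
    if set(assignment) != set(sizes):
        return False
    loads = [0] * K
    for item, bin_index in assignment.items():
        if not 0 <= bin_index < K:
            return False
        loads[bin_index] += sizes[item]
    return all(load <= B for load in loads)


if __name__ == "__main__":
    sizes = {"a": 3, "b": 2, "c": 2, "d": 1}
    print(verify_bin_packing(sizes, 4, 2, {"a": 0, "d": 0, "b": 1, "c": 1}))  # → True
    print(verify_bin_packing(sizes, 4, 2, {"a": 0, "b": 0, "c": 1, "d": 1}))  # → False
```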

To simplify matters, we will use the following variation of BIN PACKING, which is easily shown to be strongly NP-complete.

MODIFIED BIN PACKING

Instance: A finite set U of items, a size s(u) that is a positive even integer for each $u \in U$, a positive integer bin capacity B, and a positive integer K, where $\sum_{u \in U} s(u) = BK$.

Question: Is there a partition of U into disjoint sets $U_1, U_2, \ldots, U_K$ such that the sum of the sizes of the items in each $U_i$ is precisely B?
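The paper does not spell out why MODIFIED BIN PACKING is strongly NP-complete. One padding argument that works (our own sketch, not necessarily the authors'): double every size and the capacity, so all sizes become even, then add enough filler items of size 2 to make the total exactly $B'K$, where $B' = 2B$. Any packing of the original instance leaves an even amount of slack in each doubled bin, which the fillers fill exactly; conversely, removing the fillers from an exact packing gives a valid packing of the original. The number of fillers is at most $BK$, so the construction is polynomial whenever all numbers in the instance are polynomially bounded, which is the setting of strong NP-completeness. The function name below is hypothetical.

```python
def to_modified_bin_packing(sizes: list[int], B: int, K: int):
    """Hypothetical padding reduction from BIN PACKING to MODIFIED BIN PACKING.

    Returns (new_sizes, new_B, K) where every size is even and
    sum(new_sizes) == new_B * K.
    """
    slack = B * K - sum(sizes)
    if slack < 0:
        # Total size exceeds total capacity, so the original answer is "no".
        # Map it to a fixed no-instance that still satisfies the constraints
        # (an item of size 6 cannot fit in a bin of capacity 4).
        return [6, 2], 4, 2
    # Doubled items keep every bin's slack even; size-2 fillers absorb it.
    new_sizes = [2 * s for s in sizes] + [2] * slack
    return new_sizes, 2 * B, K


if __name__ == "__main__":
    print(to_modified_bin_packing([3, 1], 4, 2))  # → ([6, 2, 2, 2, 2, 2], 8, 2)
```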

3 HP STRING-FOLD is NP-Complete

The proof that HP STRING-FOLD is NP-complete for cubic lattices consists of two main parts. In the first part (Section 3.1), we show that PERFECT HP STRING-FOLD is just a special case of HP STRING-FOLD. The proof is not difficult; it follows quickly from the fact that the minimum energy configuration of $n^3$ unrestricted H’s in the cubic lattice is an $n \times n \times n$ cube.
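As a quick sanity check on the cube claim (an illustration only, not a proof, since it compares rectangular boxes and nothing else), the number of H-H bonds in a solid $a \times b \times c$ box of H’s is $(a-1)bc + a(b-1)c + ab(c-1) = 3abc - (ab + bc + ca)$, so for a fixed volume $abc = n^3$ the bond count is largest when the surface term is smallest. The Python sketch below (function names are ours) enumerates all boxes of volume $n^3$ and picks the one with the most bonds.

```python
def box_bonds(a: int, b: int, c: int) -> int:
    """Lattice edges between H's in a solid a x b x c box."""
    return (a - 1) * b * c + a * (b - 1) * c + a * b * (c - 1)


def best_box(n: int) -> tuple[int, int, int]:
    """Among boxes a <= b <= c with a * b * c == n**3, the one with most bonds."""
    volume = n ** 3
    boxes = [
        (a, b, volume // (a * b))
        for a in range(1, volume + 1) if volume % a == 0
        for b in range(a, volume // a + 1)
        if (volume // a) % b == 0 and volume // (a * b) >= b
    ]
    return max(boxes, key=lambda box: box_bonds(*box))


if __name__ == "__main__":
    # prints "2 (2, 2, 2) 12", "3 (3, 3, 3) 54", "4 (4, 4, 4) 144"
    for n in (2, 3, 4):
        box = best_box(n)
        print(n, box, box_bonds(*box))
```

In each case the winner is the cube, with $3n^2(n-1)$ bonds.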

In the second part (Section 3.2), we prove that PERFECT HP STRING-FOLD is NP-complete. This proof comprises the main result of the paper and is somewhat complicated. In order to aid the reader, we have endeavored to supply intuition where it may be helpful.

When the two parts are combined, we find that HP STRING-FOLD in the cubic lattice is NP-complete.¹

¹To be precise, we prove that the STRING-FOLD problems are NP-hard. NP-completeness follows from the additional simple fact that the problems are in NP.
