Protein Folding in the Hydrophobic-Hydrophilic (HP) Model is NP ...
that the H's can be packed into a perfect cube (which is the minimum
energy configuration for a collection of H's without any string
constraints) is NP-complete. The proof uses a transformation from a
special form of the BIN PACKING problem [11], which we call MODIFIED
BIN PACKING. In particular, given an instance B of the MODIFIED BIN
PACKING problem, we show how to construct a sequence S with $n^3$ H's
so that there is a conformation of S where the H's form an
$n \times n \times n$ cube if and only if there is a solution to B.

Our NP-completeness proof relies heavily on the fact that nodes along
the edges of a perfect cube have two or three neighbors not in the
cube, whereas nodes on a face of a cube (but not on an edge) have just
one neighbor not in the cube. The proof does not rely on parity
arguments that are derived from the fact that the cubic lattice is
bipartite. The NP-completeness proof does not immediately extend to
general lattices, although our methods may be helpful in understanding
the complexity of the protein folding problem in other models.

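The neighbor counts quoted above can be checked directly for small cubes. The following is a minimal sketch, not part of the original proof; the function name `outside_neighbor_counts` and the choice of coordinates 0..n-1 are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Six unit steps to the neighbors of a node in the cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def outside_neighbor_counts(n):
    """Map k -> number of nodes of an n x n x n cube with exactly k
    lattice neighbors lying outside the cube (hypothetical helper)."""
    counts = Counter()
    for x, y, z in product(range(n), repeat=3):
        outside = sum(
            not all(0 <= c < n for c in (x + dx, y + dy, z + dz))
            for dx, dy, dz in STEPS
        )
        counts[outside] += 1
    return dict(counts)
```

For n = 4 this separates the nodes into interior nodes (no outside neighbor), face nodes (one), edge nodes (two), and corner nodes (three), which is the distinction the proof relies on.
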
The discovery that protein folding in the HP model is NP-complete
lends further importance to the study of approximation algorithms for
protein folding. Hart and Istrail [12] have achieved approximation
algorithms for the HP model on the cubic and square lattices. Their
algorithms obtain an approximation factor of 3/8 for the cubic lattice
and 1/4 for the square lattice. Recently, Hart and Istrail [15] have
presented approximation algorithms for off-lattice and side chain
variants of the HP model. Agarwala et al. [1] have shown that
performance guarantees of roughly 60% can be achieved for the HP model
on the hexagonal close packed lattice. This model has the advantage
that there are no parity constraints imposed by the lattice.

The journal version of this paper appears in [3].

2 Problem Statement

The protein folding problem consists of embedding a given finite
polypeptide sequence S of length N into a given fixed infinite graph
G. In this paper, the graph G will primarily be the 3-dimensional
cubic lattice $\mathbb{Z}^3$. A fold of S in G is an injective mapping
from $\{1, \ldots, N\}$ to G such that adjacent integers map to
adjacent nodes of G. Each node of $\mathbb{Z}^3$ has six neighbors.
The energy of a fold of S in G, which is to be minimized, is the
negation of the number of H-H bonds in the fold, where a bond is a
pair of symbols mapped to adjacent nodes.

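The definitions above translate almost directly into code. The sketch below assumes a fold is given as a list of integer coordinate triples, one per symbol of S; the names `is_fold`, `hh_bonds`, and `energy` are illustrative and do not come from the paper.

```python
def is_fold(points):
    """True if `points` is a self-avoiding walk in Z^3: no node is
    visited twice and consecutive nodes are lattice neighbors."""
    if len(set(points)) != len(points):
        return False
    return all(
        sum(abs(a - b) for a, b in zip(p, q)) == 1
        for p, q in zip(points, points[1:])
    )


def hh_bonds(seq, points):
    """Count unordered pairs of H's placed on adjacent lattice nodes.
    Pairs that are adjacent in the sequence are included."""
    h_nodes = {p for s, p in zip(seq, points) if s == "H"}
    bonds = 0
    for x, y, z in h_nodes:
        # Look only in the +x, +y, +z directions so each pair counts once.
        for q in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
            if q in h_nodes:
                bonds += 1
    return bonds


def energy(seq, points):
    """Energy of a fold: the negated number of H-H bonds."""
    return -hh_bonds(seq, points)
```

For example, folding HPPH around a unit square places the two H's on adjacent nodes, giving one bond and energy -1.
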
In the traditional way of looking at this problem, successive pairs of
H's in S are not counted as forming a bond when computing energy;
thus, a given H has at most four neighboring H's. For simplicity, we
will include all H-H bonds in the energy calculation in this paper.
Our choice of convention does not make any difference in terms of the
NP-completeness result since the number of H-H bonds in the sequence
does not depend in any way on the fold.

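To see why the two conventions differ only by a constant, note that the number of H-H pairs that are adjacent in the sequence can be computed from S alone. A minimal sketch (the name `chain_hh_pairs` is an assumption made for illustration):

```python
def chain_hh_pairs(seq):
    """Number of positions i where seq[i] and seq[i+1] are both H.
    This depends only on the sequence, never on the fold."""
    return sum(a == "H" and b == "H" for a, b in zip(seq, seq[1:]))
```

Since consecutive symbols always occupy adjacent nodes, every fold of S has exactly this many sequence-adjacent bonds; subtracting it from the all-bonds count gives the traditional count, so a threshold under one convention is just a shifted threshold under the other.
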
Another equivalent way of looking at the traditional problem is to
minimize the number of non-H's among the four neighbors of each H. In
this framework, our main result is that the problem of determining
whether or not there is a fold in the HP model on the 3-d lattice with
a total of S non-H neighbors (the absolute minimum) is NP-hard. We
achieve hardness of approximation results only for this latter
problem. The hardness of (9/S - c)-approximation is straightforward.
(We believe it is possible to extend our results to hardness of N'
times optimal also, but we do not address this issue in the current
paper.)

We define the following decision version of the protein folding
problem.

HP STRING-FOLD
Instance: A finite sequence S over the alphabet {H, P}, an integer m,
and a graph G.
Question: Is there a fold (i.e., a self-avoiding walk) of S in G where
the number of H-H bonds is at least m?

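For very short sequences the decision problem can be settled by exhaustive search over all self-avoiding walks in the cubic lattice. The sketch below is only an illustration of the problem statement, with G fixed to the cubic lattice; it takes exponential time, and the names `max_hh_bonds` and `hp_string_fold` are assumptions rather than anything defined in the paper.

```python
# Six unit steps to the neighbors of a node in the cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def max_hh_bonds(seq):
    """Maximum number of H-H bonds over all folds of `seq` in Z^3,
    counting sequence-adjacent pairs too. Exhaustive and exponential."""
    best = 0
    occupied = {}  # lattice node -> symbol placed there

    def extend(i, pos, bonds):
        nonlocal best
        if i == len(seq):
            best = max(best, bonds)
            return
        x, y, z = pos
        for dx, dy, dz in STEPS:
            q = (x + dx, y + dy, z + dz)
            if q in occupied:
                continue  # keep the walk self-avoiding
            gained = 0
            if seq[i] == "H":
                # Bonds formed with H's that were placed earlier.
                gained = sum(
                    occupied.get((q[0] + ex, q[1] + ey, q[2] + ez)) == "H"
                    for ex, ey, ez in STEPS
                )
            occupied[q] = seq[i]
            extend(i + 1, q, bonds + gained)
            del occupied[q]

    if seq:
        occupied[(0, 0, 0)] = seq[0]
        extend(1, (0, 0, 0), 0)
    return best


def hp_string_fold(seq, m):
    """Decide HP STRING-FOLD on the cubic lattice by brute force."""
    return max_hh_bonds(seq) >= m
```

The sequence HPH illustrates the parity constraint of the bipartite lattice: its two H's are always at even distance, so no fold produces a bond.
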
Our main result is that HP STRING-FOLD is NP-complete when G is
$\mathbb{Z}^3$. The proof follows by showing that the following
folding problem is NP-hard.

PERFECT HP STRING-FOLD
Instance: An integer n and a finite sequence S over the alphabet
{H, P} which contains $n^3$ H's.
Question: Is there a fold of S in $\mathbb{Z}^3$ for which the H's are
perfectly packed into an $n \times n \times n$ cube?

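Checking the "perfectly packed" condition for a given fold is simple, which is also why the problem lies in NP. The following sketch assumes the fold is given as a list of coordinate triples and that it has already been verified to be a self-avoiding walk; the name `h_nodes_form_cube` is hypothetical.

```python
from itertools import product


def h_nodes_form_cube(seq, points, n):
    """True if the nodes holding H's are exactly the n^3 nodes of an
    axis-aligned n x n x n cube. Fold validity is not checked here."""
    h = {p for s, p in zip(seq, points) if s == "H"}
    if n < 1 or len(h) != n ** 3:
        return False
    # The cube, if it exists, must start at the minimum coordinates.
    x0, y0, z0 = (min(p[i] for p in h) for i in range(3))
    cube = {
        (x0 + a, y0 + b, z0 + c)
        for a, b, c in product(range(n), repeat=3)
    }
    return h == cube
```

For n = 2, a sequence of eight H's laid along a Hamiltonian path of the unit cube passes the check, while the same sequence laid out in a straight line does not.
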
The proof that PERFECT HP STRING-FOLD is NP-hard involves a
transformation from the (strongly) NP-complete problem of BIN PACKING,
which is defined as follows by Garey and Johnson [11].

BIN PACKING
Instance: A finite set U of items, a size $s(u) \in \mathbb{Z}^+$ for
each $u \in U$, a positive integer bin capacity B, and a positive
integer K.
Question: Is there a partition of U into disjoint sets
$U_1, U_2, \ldots, U_K$ such that the sum of the sizes of the items in
each $U_i$ is B or less?

To simplify matters, we will use the following variation of BIN
PACKING, which is easily shown to be strongly NP-complete.

MODIFIED BIN PACKING
Instance: A finite set U of items, a size s(u) that is a positive even
integer for each $u \in U$, a positive integer bin capacity B, and a
positive integer K, where $\sum_{u \in U} s(u) = BK$.
Question: Is there a partition of U into disjoint sets
$U_1, U_2, \ldots, U_K$ such that the sum of the sizes of the items in
each $U_i$ is precisely B?

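A small backtracking search makes the MODIFIED BIN PACKING question concrete. This is a sketch for tiny instances only, written under the assumption that the item sizes must total BK; the function name `modified_bin_packing` and its list-based interface are illustrative choices.

```python
def modified_bin_packing(sizes, B, K):
    """Return K bins (lists of sizes), each summing to exactly B,
    or None if no such partition exists. Exponential in the worst case."""
    # Instance conditions: positive even sizes whose total is B * K.
    if any(s <= 0 or s % 2 for s in sizes) or sum(sizes) != B * K:
        return None
    bins = [[] for _ in range(K)]
    loads = [0] * K
    items = sorted(sizes, reverse=True)  # place large items first

    def place(i):
        if i == len(items):
            # Total is B * K and no bin exceeds B, so every bin holds B.
            return True
        tried = set()
        for b in range(K):
            # Skip bins that would overflow or that have the same load
            # as a bin already tried for this item.
            if loads[b] + items[i] > B or loads[b] in tried:
                continue
            tried.add(loads[b])
            bins[b].append(items[i])
            loads[b] += items[i]
            if place(i + 1):
                return True
            bins[b].pop()
            loads[b] -= items[i]
        return False

    return bins if place(0) else None
```

With sizes 2, 4, 6, 4, 2, 6 and B = 12, K = 2, the search finds two bins of total 12 each; with sizes 8, 8, 8 and the same B and K it reports that no exact packing exists even though the totals match.
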
3 HP STRING-FOLD is NP-Complete

The proof that HP STRING-FOLD is NP-complete for cubic lattices
consists of two main parts. In the first part (Section 3.1), we show
that PERFECT HP STRING-FOLD is just a special case of HP STRING-FOLD.
The proof is not difficult; it quickly follows from the fact that the
minimum energy configuration of $n^3$ unrestricted H's in the cubic
lattice is an $n \times n \times n$ cube. In the second part (Section
3.2), we prove that PERFECT HP STRING-FOLD is NP-complete. This proof
constitutes the main result of the paper and is somewhat complicated.
In order to aid the reader, we have endeavored to supply intuition
where it may be helpful. When the two parts are combined, we find that
HP STRING-FOLD in the cubic lattice is NP-complete.¹

¹ To be precise, we prove that the STRING-FOLD problems are NP-hard.
NP-completeness follows from the additional simple fact that the
problems are in NP.
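
As a rough illustration of how the first part could connect the two problems, the number of H-H bonds inside a perfect cube can be counted by hand. This worked count is an added sketch, assuming all H-H bonds are counted (the convention adopted in Section 2); the threshold it suggests is an assumption, not a statement taken from Section 3.1.

```latex
% Each of the n^3 cube nodes has six neighbors, giving 6n^3 directed
% neighbor slots. Exactly n^2 slots per face (6n^2 in total) point out
% of the cube; the remaining slots pair up into bonds.
\[
  \#\{\text{H-H bonds in an } n \times n \times n \text{ cube}\}
  \;=\; \frac{6n^3 - 6n^2}{2} \;=\; 3n^2(n-1).
\]
```

For n = 2 this gives 12, the number of edges of the unit cube. Under these assumptions, an instance (n, S) of PERFECT HP STRING-FOLD would correspond to the HP STRING-FOLD instance with the same S and m = $3n^2(n-1)$.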