
Independence and chromatic number (and random k-SAT): Sparse Case

Dimitris Achlioptas

Microsoft

Random graphs

W.h.p.: with probability that tends to 1 as n → ∞.

Hamiltonian cycle

• Let τ2 be the moment all vertices have degree ≥ 2

• W.h.p. G(n, m = τ2) has a Hamiltonian cycle

• W.h.p. it can be found in time O(n^3 log n)

[Ajtai, Komlós, Szemerédi 85] [Bollobás, Fenner, Frieze 87]

• In G(n, 1/2) Hamiltonicity can be decided in O(n) expected time

[Gurevich, Shelah 84]

Cliques in random graphs

• The largest clique in G(n, 1/2) has size 2 log_2 n − 2 log_2 log_2 n ± 1

[Bollobás, Erdős 75] [Matula 76]

• No maximal clique of size < log_2 n

• Can we find a clique of size (1 + ε) log_2 n ?

[Karp 76]

• What if we “hide” a clique of size n^(1/2 − ε) ?
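The log_2 n fact above says greedy search cannot get stuck early: any maximal clique in G(n, 1/2) already has size about log_2 n. A minimal sketch (not from the talk; `gnp` and `greedy_clique` are illustrative names):

```python
import random

def gnp(n, p=0.5, seed=0):
    """Adjacency sets of an Erdos-Renyi graph G(n, p)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def greedy_clique(adj):
    """Scan vertices once, adding any vertex adjacent to all chosen ones.
    A vertex rejected early stays incompatible, so the result is maximal."""
    clique = []
    for v in adj:
        if all(v in adj[u] for u in clique):
            clique.append(v)
    return clique

adj = gnp(512)
C = greedy_clique(adj)   # size typically near log2(512) = 9
```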

Two problems for which we know much less

• Chromatic number of sparse random graphs

• Random k-SAT

• Canonical for random constraint satisfaction:

– Binary constraints over k-ary domain

– k-ary constraints over binary domain

• Studied in: AI, Math, Optimization, Physics, …

A factor-graph representation of k-coloring

• Each vertex is a variable with domain {1, 2, …, k}.

• Each edge is a constraint on two variables.

• All constraints are “not-equal”.

• Random graph = each constraint picks two variables at random.

[Figure: bipartite factor graph with vertex nodes v1, v2, … and edge nodes e1, e2, …]

SAT via factor-graphs

(x12 ∨ x5 ∨ x9) ∧ (x34 ∨ x21 ∨ x5) ∧ ··· ∧ (x21 ∨ x9 ∨ x13)

• Edge between x and c iff x occurs in clause c.

• Edges are labeled +/− to indicate whether the literal is negated.

• Constraints are “at least one literal must be satisfied”.

• Random k-SAT = constraints pick k literals at random.

[Figure: bipartite factor graph with variable nodes x, … and clause nodes c, …]

Diluted mean-field spin glasses

• Small, discrete domains: spins

• Conflicting, fixed constraints: quenched disorder

• Random bipartite graph: lack of geometry, mean field

• Sparse: diluted

• Hypergraph coloring, random XOR-SAT, error-correcting codes, …

[Figure: bipartite graph with variables x, … and constraints c, …]

Random graph coloring: Background

A trivial lower bound

• For any graph, the chromatic number is at least

(number of vertices) / (size of maximum independent set)

• For random graphs, use the first-moment upper bound for the largest independent set:

(n choose s) × (1 − p)^(s choose 2) → 0
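The bound is easy to evaluate numerically. A sketch (mine, not the talk's): find the smallest s at which the expectation C(n, s)(1 − p)^C(s,2) drops below 1, and divide.

```python
from math import comb

def expected_independent_sets(n, s, p):
    """First moment: E[# independent s-sets] = C(n, s) * (1 - p)^C(s, 2)."""
    return comb(n, s) * (1 - p) ** comb(s, 2)

def first_moment_bound(n, p):
    """Smallest s with expectation < 1: w.h.p. no independent set this large."""
    s = 1
    while expected_independent_sets(n, s, p) >= 1:
        s += 1
    return s

n, p = 1000, 0.5
s_star = first_moment_bound(n, p)   # about 2 log2(n)
chromatic_lb = n // s_star          # chromatic number >= n / independence number
```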

An algorithmic upper bound

Repeat:

• Pick a random uncolored vertex

• Assign it the lowest allowed number (color)

Uses 2 × the trivial lower bound number of colors

• No algorithm is known to do better
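The procedure above is ordinary greedy coloring in a random vertex order; a minimal sketch (illustrative, not the talk's code):

```python
import random

def greedy_coloring(adj, seed=0):
    """Color vertices in random order; each vertex gets the lowest
    color (number) not used by an already-colored neighbor."""
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)
    color = {}
    for v in order:
        taken = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c
    return color

# 5-cycle: max degree 2, so greedy uses at most 3 colors.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
coloring = greedy_coloring(cycle5)
```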

The lower bound is asymptotically tight

As d grows, G(n, d/n) can be colored using independent sets of essentially maximum size

[Bollobás 89] [Łuczak 91]

Only two possible values

Theorem. For every d > 0, there exists an integer k = k(d) such that w.h.p. the chromatic number of G(n, p = d/n) is either k or k + 1.

[Łuczak 91]

“The Values”

Theorem. For every d > 0, there exists an integer k = k(d) such that w.h.p. the chromatic number of G(n, p = d/n) is either k or k + 1, where k is the smallest integer s.t. d < 2k log k.

Examples

• If d = 7, w.h.p. the chromatic number is 4 or 5.

• If d = 10^60, w.h.p. the chromatic number is

3771455490672260758090142394938336005516126417647650681575

or

3771455490672260758090142394938336005516126417647650681576
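Taking k(d) to be the smallest integer with d < 2k log k (natural log), as in the theorem, a few lines reproduce the d = 7 example (sketch, mine):

```python
from math import log

def k_of_d(d):
    """Smallest integer k >= 2 with d < 2*k*log(k) (natural log)."""
    k = 2
    while d >= 2 * k * log(k):
        k += 1
    return k

# d = 7: 2*3*log(3) ~ 6.59 <= 7 < 2*4*log(4) ~ 11.09, so k = 4,
# matching the example "chromatic number is 4 or 5".
k7 = k_of_d(7)
```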

One value

Theorem. If (2k − 1) log k < d < 2k log k, then w.h.p. the chromatic number of G(n, p = d/n) is k + 1.

Random k-SAT: Background

• Fix a set of n variables X = {x1, x2, ..., xn}

• Among all 2^k (n choose k) possible k-clauses, select m uniformly and independently. Typically m = rn.

• Example (k = 3):

(x12 ∨ x5 ∨ x9) ∧ (x34 ∨ x21 ∨ x5) ∧ ··· ∧ (x21 ∨ x9 ∨ x13)
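A sketch of the sampling model (illustrative code, mine): each clause picks k distinct variables and independent random signs; clauses are drawn independently with replacement.

```python
import random

def random_ksat(n, k, m, seed=0):
    """m clauses, each over k distinct variables with independent random
    signs. A literal is +v or -v for variable v in 1..n."""
    rng = random.Random(seed)
    formula = []
    for _ in range(m):
        vars_ = rng.sample(range(1, n + 1), k)
        formula.append(tuple(v if rng.random() < 0.5 else -v for v in vars_))
    return formula

F = random_ksat(n=20, k=3, m=int(4.2 * 20))   # density r = 4.2
```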

Generating hard 3-SAT instances

[Mitchell, Selman, Levesque 92]

• The critical point appears to be around r ≈ 4.2

The satisfiability threshold conjecture

• For every k ≥ 3, there is a constant rk such that

lim_{n→∞} Pr[Fk(n, rn) is satisfiable] =
  1 if r = rk − ε
  0 if r = rk + ε

• For every k ≥ 3,

2^k / k < rk < 2^k ln 2

Unit-clause propagation

Repeat:

– Pick a random unset variable and set it to 1

– While there are unit-clauses, satisfy them

– If a 0-clause is generated, fail

• UC finds a satisfying truth assignment if r < 2^k / k

[Chao, Franco 86]
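A sketch of UC (my simplification: clauses kept as sets of literals, clauses containing only unset variables; an emptied, unsatisfied clause is the 0-clause):

```python
import random

def unit_clause(formula, n, seed=0):
    """UC: while there are unit clauses, satisfy them; otherwise pick a
    random unset variable and set it to 1. Fail (None) on a 0-clause."""
    rng = random.Random(seed)
    assign = {}
    clauses = [set(c) for c in formula]

    def set_literal(lit):
        """Record lit as true and simplify; return False on a 0-clause."""
        nonlocal clauses
        assign[abs(lit)] = lit > 0
        new = []
        for c in clauses:
            if lit in c:
                continue          # clause satisfied, drop it
            c = c - {-lit}        # falsified literal removed
            if not c:
                return False      # 0-clause generated
            new.append(c)
        clauses = new
        return True

    order = list(range(1, n + 1))
    rng.shuffle(order)
    for v in order:
        while clauses and (unit := next((c for c in clauses if len(c) == 1), None)):
            if not set_literal(next(iter(unit))):
                return None
        if v not in assign and not set_literal(v):
            return None
    return assign

F = [(1, 2, 3), (-1, 2, 3), (1, -2, 3)]
a = unit_clause(F, 3)
```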

An asymptotic gap

• The probability of satisfiability is at most

2^n (1 − 1/2^k)^m = [ 2 (1 − 1/2^k)^r ]^n → 0 for r ≥ 2^k ln 2

• Since the mid-80s, no asymptotic progress over

2^k / k < rk < 2^k ln 2
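The n-th root of the bound, 2(1 − 1/2^k)^r, is a one-liner to check numerically (sketch, mine); its crossing of 1 sits just below 2^k ln 2:

```python
from math import log

def per_variable_factor(k, r):
    """n-th root of the first-moment bound: 2 * (1 - 2^-k)^r."""
    return 2 * (1 - 2.0 ** -k) ** r

def crossing(k):
    """Exact r where the factor equals 1: ln 2 / -ln(1 - 2^-k)."""
    return log(2) / -log(1 - 2.0 ** -k)

r_star = crossing(3)             # ~5.19 for k = 3
gap = 2 ** 3 * log(2) - r_star   # small positive gap below 2^k ln 2
```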

Getting to within a factor of 2

Theorem: For all k ≥ 3 and r ≤ 2^(k−1) ln 2 − O(1), a random k-CNF formula with m = rn clauses is w.h.p. satisfiable.

The trivial upper bound is the truth!

Theorem: For all k ≥ 3, a random k-CNF formula with m = rn clauses is w.h.p. satisfiable if

r ≤ 2^k ln 2 − k/2 − 1

Some explicit bounds for the k-SAT threshold

k                         3      4      5      7      20        21
Upper bound               4.51   10.23  21.33  87.88  726,817   1,453,635
Our lower bound           2.68   7.91   18.79  84.82  726,809   1,453,626
Algorithmic lower bound   3.52   5.54   9.63   33.23  95,263    181,453

The second moment method

For any non-negative r.v. X,

Pr[X > 0] ≥ E[X]^2 / E[X^2]

Proof: Let Y = 1 if X > 0, and Y = 0 otherwise. By Cauchy–Schwarz,

E[X]^2 = E[XY]^2 ≤ E[X^2] E[Y^2] = E[X^2] Pr[X > 0].
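A toy numerical check of the inequality (my example): a “needle in a haystack” variable, X = 100 with probability 0.01 and 0 otherwise, where the bound happens to be tight.

```python
def smm_lower_bound(dist):
    """dist: list of (value, prob). Returns (E[X]^2 / E[X^2], Pr[X > 0])."""
    ex = sum(v * p for v, p in dist)
    ex2 = sum(v * v * p for v, p in dist)
    return ex * ex / ex2, sum(p for v, p in dist if v > 0)

# X = 100 w.p. 0.01, else 0: E[X] = 1, E[X^2] = 100, bound = 0.01.
bound, truth = smm_lower_bound([(100, 0.01), (0, 0.99)])
```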

Ideal for sums

If X = X1 + X2 + ··· then

E[X]^2 = Σ_{i,j} E[Xi] E[Xj]

E[X^2] = Σ_{i,j} E[Xi Xj]

Example: The Xi correspond to the (n choose q) potential q-cliques in G(n, 1/2). Dominant contribution from non-overlapping cliques.

General observations

• Method works well when the Xi are like “needles in a haystack”

• Lack of correlations ⇒ rapid drop in influence around solutions

• Algorithms get no “hints”

The second moment method for random k-SAT

• Let X be the # of satisfying truth assignments

For every clause-density r > 0, there is β = β(r) > 0 such that

E[X]^2 / E[X^2] < (1 − β)^n

• The number of satisfying truth assignments has huge variance.

• The satisfying truth assignments do not form a “uniformly random mist” in {0, 1}^n

To prove 2^k ln 2 − k/2 − 1

• Let H(σ, F) be the number of satisfied literal occurrences in F under σ

• Let X = X(F) be defined as

X(F) = Σ_σ 1{σ ⊨ F} γ^H(σ,F) = Σ_σ Π_c 1{σ ⊨ c} γ^H(σ,c)

where γ < 1 satisfies (1 − γ)(1 + γ)^(k−1) = 1.

General functions

• Given any t.a. σ and any k-clause c, let

v = v(σ, c) ∈ {−1, +1}^k

be the values of the literals in c under σ.

• We will study random variables of the form

X = Σ_σ Π_c f(v(σ, c))

where f : {−1, +1}^k → R is an arbitrary function.

X = Σ_σ Π_c f(v(σ, c))

• f(v) = 1 for all v ⇒ 2^n

• f(v) = 0 if v = (−1, −1, ..., −1), 1 otherwise ⇒ # of satisfying truth assignments

• f(v) = 0 if v = (−1, −1, ..., −1) or v = (+1, +1, ..., +1), 1 otherwise ⇒ # of “Not All Equal” (NAE) truth assignments, i.e. satisfying truth assignments whose complement is also satisfying

Overlap parameter = distance

• Overlap parameter is Hamming distance

• For any f, if σ, τ agree on z = n/2 variables,

E[ f(v(σ,c)) f(v(τ,c)) ] = E[ f(v(σ,c)) ] E[ f(v(τ,c)) ]

• For any f, if σ, τ agree on z variables, let

Cf(z/n) ≡ E[ f(v(σ,c)) f(v(τ,c)) ]
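For small k, Cf(α) can be computed by direct enumeration (a sketch, with f the SAT indicator): σ's literal values are uniform, and each flips independently for τ with probability 1 − α.

```python
from itertools import product

def C_f(f, k, alpha):
    """Cf(alpha) = E[f(v(sigma,c)) f(v(tau,c))] when sigma, tau agree on an
    alpha-fraction of variables: v(sigma,c) is uniform on {-1,+1}^k and each
    coordinate flips independently for tau with probability 1 - alpha."""
    total = 0.0
    for v in product((-1, 1), repeat=k):
        for flips in product((0, 1), repeat=k):
            w = tuple(x if s == 0 else -x for x, s in zip(v, flips))
            p_flip = alpha ** (k - sum(flips)) * (1 - alpha) ** sum(flips)
            total += f(v) * f(w) * p_flip
    return total / 2 ** k

def f_sat(v):
    """Indicator that the clause is satisfied (not all literals false)."""
    return 0 if all(x == -1 for x in v) else 1

# Sanity checks for k = 3, where E[f_sat] = 7/8:
c_half = C_f(f_sat, 3, 0.5)   # independence: equals E[f]^2
c_one = C_f(f_sat, 3, 1.0)    # tau = sigma: equals E[f^2] = E[f]
```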

Contribution according to distance

Entropy vs. correlation. For every function f:

E[X^2] = 2^n Σ_{z=0..n} (n choose z) Cf(z/n)^m

E[X]^2 = 2^n Σ_{z=0..n} (n choose z) Cf(1/2)^m

Recall: (n choose αn) = [ α^−α (1 − α)^−(1−α) ]^n × poly(n)

[Figure: contribution as a function of α = z/n, with independence at α = 1/2]

The importance of being balanced

• An analytic condition:

C′f(1/2) ≠ 0 ⇒ the s.m.m. fails

• A geometric criterion:

C′f(1/2) = 0 ⟺ Σ_{v ∈ {−1,+1}^k} f(v) v = 0

[Figures: contribution vs. α = z/n for NAE 5-SAT; vector diagrams labeled “Constant” (SAT, excluding (−1,…,−1)) and “Complementary” (excluding both (−1,…,−1) and (+1,…,+1))]
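The geometric criterion is a finite check (sketch, mine): the SAT indicator's weighted vectors sum to a nonzero constant vector, while the NAE indicator's sum vanishes.

```python
from itertools import product

def vector_sum(f, k):
    """Sum of f(v) * v over v in {-1,+1}^k, computed coordinate-wise."""
    total = [0] * k
    for v in product((-1, 1), repeat=k):
        for i in range(k):
            total[i] += f(v) * v[i]
    return total

f_sat = lambda v: 0 if all(x == -1 for x in v) else 1   # excludes (-1,...,-1)
f_nae = lambda v: 0 if len(set(v)) == 1 else 1          # excludes both extremes

sat_sum = vector_sum(f_sat, 4)   # unbalanced: the s.m.m. fails
nae_sum = vector_sum(f_nae, 4)   # balanced
```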

Balance & Information Theory

• Want to balance vectors in “optimal” way

• Information theory ⇒ maximize the entropy of the f(v) subject to

Σ_{v ∈ {−1,+1}^k} f(v) v = 0 and f(−1, −1, ..., −1) = 0

• Lagrange multipliers ⇒ the optimal f is

f(v) = γ^(# of +1s in v)

for the unique γ that satisfies the constraints
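For f(v) = γ^(# of +1s in v) with f(−1, …, −1) = 0, the balance equation reduces coordinate-wise to (1 − γ)(1 + γ)^(k−1) = 1 (my derivation); γ is then easy to find by bisection:

```python
def gamma_balance(k, tol=1e-12):
    """Find gamma in (0, 1) with (1 - g)*(1 + g)**(k-1) = 1 by bisection.
    For k >= 3 the function is > 1 strictly below the root and < 1 above
    it on the bracket, so the bisection invariant holds."""
    h = lambda x: (1 - x) * (1 + x) ** (k - 1)
    lo, hi = 0.1, 1.0 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

g = gamma_balance(5)
```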

Random graph coloring

Threshold formulation

Theorem. A random graph with n vertices and m = cn edges is w.h.p. k-colorable if

c ≤ k log k − log k − 1,

and w.h.p. non-k-colorable if

c ≥ k log k − (1/2) log k.

Main points

• Non-k-colorability: Proof. The probability that there exists any k-coloring is at most

k^n (1 − 1/k)^(cn) → 0

• k-colorability: Proof. Apply the second moment method to the number of balanced k-colorings of G(n, m).

Setup

• Let Xσ be the indicator that the balanced k-partition σ is a proper k-coloring, and let X = Σ_σ Xσ.

• We will prove that if c ≤ k log k − log k − 1 then there is a constant D = D(k) > 0 such that E[X^2] ≤ D · E[X]^2.

• E[X^2] = sum over all pairs σ, τ of E[Xσ Xτ].

• For any pair of balanced k-partitions σ, τ, let aij n be the # of vertices having color i in σ and color j in τ. Then

Pr[σ and τ are proper] = ( 1 − 2/k + Σ_{ij} aij^2 )^(cn)

Examples

• When σ, τ are uncorrelated, A is the flat 1/k matrix.

• Balance ⇒ A is doubly stochastic.

• As σ, τ align, A tends to the identity matrix I.

A matrix-valued overlap

So,

E[X^2] = Σ_{A ∈ B_k} (n choose An) ( 1 − 2/k + (1/k^2) Σ aij^2 )^(cn)

which is controlled by the maximizer of

− Σ aij log aij + c log( 1 − 2/k + (1/k^2) Σ aij^2 )

over k × k doubly-stochastic matrices A = (aij).

− Σ_{i,j} aij log aij + c log( 1 − 2/k + (1/k^2) Σ_{i,j} aij^2 )

• Entropy decreases away from the flat matrix

– For small c, this loss overwhelms the sum-of-squares gain

– But for large enough c, ….

• The maximizer jumps instantaneously from flat to a matrix where k entries capture the majority of mass

• Proved this jump happens only for c > k log k − log k − 1
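The tradeoff in the bullets can be seen numerically (a sketch with an arbitrary doubly-stochastic perturbation toward the identity): entropy is maximized at the flat matrix, while the sum of squares grows as σ and τ align.

```python
from math import log

def entropy_term(A):
    """- sum a_ij log a_ij over the matrix entries (0 log 0 = 0)."""
    return -sum(a * log(a) for row in A for a in row if a > 0)

def squares_term(A, k):
    """(1/k^2) * sum a_ij^2, the correlation gain in the functional."""
    return sum(a * a for row in A for a in row) / k ** 2

k, eps = 4, 0.2
flat = [[1.0 / k] * k for _ in range(k)]
# doubly-stochastic perturbation of the flat matrix toward the identity
pert = [[(1 - eps) / k + (eps if i == j else 0) for j in range(k)]
        for i in range(k)]
```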

Proof overview

Proof. Compare the value at the flat matrix with an upper bound for everywhere else, derived by:

1. Relax to singly stochastic matrices.

2. Prescribe the L2 norm of each row ρi.

3. Find max-entropy, f(ρi), of each row given ρi.

4. Prove that f ’’’ > 0.

5. Use (4) to determine the optimal distribution of the ρi given their total ρ.

6. Optimize over ρ.

Random regular graphs

Theorem. For every integer d > 0, w.h.p. the chromatic number of a random d-regular graph is either k, k + 1, or k + 2, where k is the smallest integer s.t. d < 2k log k.

A vector analogue (optimizing a single row)

Maximize

− Σ_{i=1..k} ai log ai

subject to

Σ_{i=1..k} ai = 1,  Σ_{i=1..k} ai^2 = ρ

for some 1/k < ρ < 1.

• For k = 3 the maximizer is (x, y, y) where x > y.

• For k > 3 the maximizer is (x, y, . . . , y).
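For k = 3 the (x, y, y) critical points can be compared in closed form (sketch, mine): x + 2y = 1 and x^2 + 2y^2 = ρ give two branches, and the x > y branch has the higher entropy.

```python
from math import log, sqrt

def entropy(a):
    """- sum a_i log a_i, with 0 log 0 = 0."""
    return -sum(x * log(x) for x in a if x > 0)

def branches(rho):
    """Critical points (x, y, y) of the k = 3 problem: x + 2y = 1 and
    x^2 + 2y^2 = rho give y = (2 -/+ s)/6 with s = sqrt(6*rho - 2)."""
    s = sqrt(6 * rho - 2)
    hi = ((1 + s) / 3, (2 - s) / 6, (2 - s) / 6)   # x > y branch
    lo = ((1 - s) / 3, (2 + s) / 6, (2 + s) / 6)   # x < y branch
    return hi, lo

hi, lo = branches(0.4)
# both satisfy the constraints, but the x > y branch has more entropy
```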

Maximum entropy image restoration

• Create a composite image of an object that:

– Minimizes “empirical error”

  Typically, least-squares error over luminance

– Maximizes “plausibility”

  Typically, maximum entropy

Structure of maximizer helps detect stars in astronomy

The End