13.07.2015 Views

Efficient Evaluation of Minimal Perfect Hash Functions

Efficient Evaluation of Minimal Perfect Hash Functions

Efficient Evaluation of Minimal Perfect Hash Functions

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Lemma 5 Assume a ≥ c f n/4r and b ≥ 2c g rn. For any S ∈ ( Un), a randomly chosen pair (f, g) ∈H a × H b is r-good for S with positive probability, namely more than (1 − 2cgrnb)(1 − c f n4ra ).Pro<strong>of</strong>. By c g -universality, the expected value for the sum ∑ ( |B(g,i)|) ∑i 2 ={u,v}∈( S 2),g(u)=g(v) 1is bounded by ( )n cg2 b< cgn22b, so applying Markov’s inequality, the sum has value less than n/4rwith probability > 1 − 2cgrnb. Since |B(g, i)| 2 ≤ 4 ( )|B(g,i)|∑2 for |B(g, i)| > 1, we then have thati,|B(g,i)|>1 |B(g, i)|2 ≤ n/r. Given a function g such that these inequalities hold, we would liketo bound the probability that x ↦→ (f(x), g(x)) is 1-1 on S. By reasoning similar to before, weget that for randomly chosen f the expected number <strong>of</strong> collisions among elements <strong>of</strong> B(g, i) atmost ( )|B(g,i)|2 cf /a. Summing over i we get the expected total number <strong>of</strong> collisions to be less thanc f n/4ra. Hence there is no collision with probability more than 1 − c f n4ra. By the law <strong>of</strong> conditionalprobabilities, the probability <strong>of</strong> fulfilling both r-goodness conditions can be found by multiplyingthe probabilities found. ✷We now show that r-goodness is sufficient for displacement values to exist, by means <strong>of</strong> aconstructive argument in the style <strong>of</strong> [13, Theorem 1]:Theorem 6 Let (f, g) ∈ H a × H b be r-good for S ∈ ( Un), r ≥ 1, and let |D| ≥ n. Then there existdisplacement values d 0 , . . . , d b−1 ∈ D, such that x ↦→ f(x) ⊞ d g(x) is 1-1 on S.Pro<strong>of</strong>. Note that displacement d i is used on the elements {f(x) | x ∈ B(g, i)}, and that anyh ∈ H(⊞, D, a, b) is 1-1 on each B(g, i). We will assign the displacement values one by one, innon-increasing order <strong>of</strong> |B(g, i)|. At the kth step, we will have displaced the k − 1 largest sets,{B(g, i) | i ∈ I}, |I| = k − 1, and want to displace the set B(g, j) such that no collision withpreviously displaced elements occurs. If |B(g, j)| ≤ 1 this is trivial since |D| ≥ n, so we assume|B(g, j)| > 1. The following claim finishes the pro<strong>of</strong>:Claim 7 If |B(g, j)| > 1, then with positive probability, namely more than 1− 1 r, a randomly chosend ∈ D displaces B(g, j) with no collision.Pro<strong>of</strong>.The displacement values which are not available are those in the set{f(x) −1 ⊞ f(y) ⊞ d i | x ∈ B(g, j), y ∈ B(g, i), i ∈ I} .It has size ≤ |B(g, j)| ∑ i∈I |B(g, i)| < ∑ i,|B(g,i)|>1 |B(g, i)|2 ≤ n/r, using first non-increasing order,|B(g, j)| > 1, and then r-goodness. Hence there must be more than (1 − 1 r)|D| good displacementvalues in D. ✷Lemma 5 implies that when a ≥ ( c f4r + ɛ)n and b ≥ (2c gr + ɛ)n, an r-good pair <strong>of</strong> hashfunctions can be found (and verified to be r-good) in expected time O(n). The pro<strong>of</strong> <strong>of</strong> Theorem6, and in particular Claim 7, shows that for a 1 + ɛ-good pair (f, g), the strategy <strong>of</strong> trying randomdisplacement values in D successfully displaces all blocks with more than one element in expected1/ɛ attempts per block. When O(n) random elements <strong>of</strong> D can be picked in O(n) time, this runsin expected time O(n/ɛ) = O(n). Finally, if displacing all blocks with only one element is easy (asis the case in the examples we look at in Sect. 2.2), the whole construction runs in expected timeO(n).For a 1-good pair (f, g), Theorem 6 just gives the existence <strong>of</strong> proper displacement values.5

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!