
Foundations of Data Science


than log n vertices. The number of components containing a cycle is a constant independent of n. Thus, the graph consists of a forest of trees plus a few components that have a single cycle, with no Ω(log n) size components.

At p equal to 1/n, a phase transition occurs in which a giant component emerges. The transition consists of a double jump. At p = 1/n, components of n^{2/3} vertices emerge, which are almost surely trees. Then at p = d/n, d > 1, a true giant component emerges that has a number of vertices proportional to n. This is a seminal result in random graph theory and the main subject of this section. Giant components also arise in many real-world graphs; the reader may want to look at large real-world graphs, such as portions of the web, and find the size of the largest connected component.
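The transition described above can be observed empirically. The following is a minimal sketch, not part of the original text, that samples G(n, p) with p = d/n and reports the size of the largest connected component using a union-find structure. The function name `largest_component_size`, the choice n = 1000, the sampled values of d, and the random seed are all illustrative assumptions.

```python
import random


def largest_component_size(n, p, rng):
    """Sample G(n, p) and return the size of its largest connected component."""
    parent = list(range(n))   # union-find parent pointers
    size = [1] * n            # component size, valid at each root

    def find(v):
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Include each of the n(n-1)/2 possible edges independently with probability p.
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    # Union by size: attach the smaller tree to the larger one.
                    if size[ru] < size[rv]:
                        ru, rv = rv, ru
                    parent[rv] = ru
                    size[ru] += size[rv]
    return max(size[find(v)] for v in range(n))


rng = random.Random(0)  # fixed seed so that runs are repeatable
n = 1000                # illustrative graph size
for d in (0.5, 1.0, 2.0):
    print(d, largest_component_size(n, d / n, rng))
```

For d below 1 the largest component should stay small relative to n, while for d above 1 it should contain a constant fraction of the vertices. A single sample at n = 1000 only suggests the asymptotic behavior; repeating the experiment for larger n makes the contrast sharper.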

When one looks at the connected components of large graphs that appear in various contexts, one observes that often there is one very large component. One example is a graph formed from a database of protein interactions [13], where vertices correspond to proteins and edges correspond to pairs of proteins that interact. By an interaction, one means two amino acid chains that bind to each other for a function. The graph has 2735 vertices and 3602 edges. At the time we looked at the database, the associated graph had the number of components of various sizes shown in the table below. There are a number of small components, but only one component of size greater than 16, and that is a giant component of size 1851. As more proteins are added to the database, the giant component will grow even larger and eventually swallow up all the smaller components.

Size of component      1    2   3   4   5  6  7  8  9  10  11  12  ···  15  16  ···  1851
Number of components  48  179  50  25  14  6  4  6  1   1   1   0    0   0   1    0     1

Table 2: Size of components in the graph implicit in the database of interacting proteins.
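As a quick consistency check on the table, one can pair each component size with the count listed beneath it and confirm that the sizes account for all 2735 vertices. The short sketch below does this arithmetic; the dictionary `component_counts` is a transcription of the table with zero-count sizes omitted, and the variable names are illustrative.

```python
# Component-size histogram transcribed from the table: size -> number of components.
# Sizes with zero components are omitted.
component_counts = {
    1: 48, 2: 179, 3: 50, 4: 25, 5: 14, 6: 6, 7: 4, 8: 6,
    9: 1, 10: 1, 11: 1, 16: 1, 1851: 1,
}
num_edges = 3602  # edge count stated in the text

total_vertices = sum(size * count for size, count in component_counts.items())
total_components = sum(component_counts.values())
giant_size = max(component_counts)

print(total_vertices)                         # 2735, matching the text
print(total_components)                       # 337
print(round(giant_size / total_vertices, 3))  # 0.677
print(round(num_edges / total_vertices, 3))   # 1.317
```

Under this reading of the table, the giant component holds roughly two thirds of the vertices, and the graph has about 1.3 edges per vertex.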

The existence of a giant component is not unique to the graph produced from the protein data set. Take any data set that one can convert to a graph and it is likely that the graph will have a giant component, provided that the ratio of edges to vertices is a small number greater than one. Table 12.13 gives two other examples. This phenomenon, the existence of a giant component in many real-world graphs, deserves study.
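To try this on a data set of one's own, it is enough to express the data as an edge list and tabulate the component sizes. The sketch below is one way to do so with a breadth-first search; the function name `component_sizes` and the toy edge list are hypothetical and are not drawn from any data set mentioned in the text.

```python
from collections import Counter, deque


def component_sizes(num_vertices, edges):
    """Return the sizes of all connected components, largest first.

    Vertices are assumed to be labeled 0 .. num_vertices - 1.
    """
    adjacency = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    seen = [False] * num_vertices
    sizes = []
    for start in range(num_vertices):
        if seen[start]:
            continue
        # Breadth-first search from an unvisited vertex explores one component.
        seen[start] = True
        queue = deque([start])
        count = 0
        while queue:
            u = queue.popleft()
            count += 1
            for w in adjacency[u]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        sizes.append(count)
    return sorted(sizes, reverse=True)


# Toy, made-up edge list on 8 vertices (not real data).
toy_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (5, 6)]
sizes = component_sizes(8, toy_edges)
print(sizes)                           # [4, 2, 1, 1]
print(sorted(Counter(sizes).items()))  # [(1, 2), (2, 1), (4, 1)]
```

The second line of output is a histogram in the same form as the protein table: each pair gives a component size and the number of components of that size.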

Returning to G(n, p), as p increases beyond d/n, all non-isolated vertices are absorbed into the giant component, and at p = (1/2)(ln n)/n, the graph consists only of isolated vertices plus a giant component. At p = (ln n)/n, the graph becomes connected. By p = 1/2, the graph is not only connected, but is sufficiently dense that it has a clique of

[13] Science, 30 July 1999, Vol. 285, No. 5428, pp. 751-753.

