31.12.2014 Views

Decentralized Search in Scale-Free P2P Networks


2010 16th International Conference on Parallel and Distributed Systems

Decentralized Search in Scale-Free P2P Networks

Praphul Chandra 1 and Dushyant Arora 2

HP Labs, Bangalore, India

praphul.chandra@hp.com, dushyantarora13@gmail.com

Abstract—Search in peer-to-peer networks is a challenging problem due to the absence of any centralized control and the limited information available at each node. When information is available about the overall structure of the network, use of this information can significantly improve the efficiency of decentralized search algorithms. Many peer-to-peer networks have been shown to exhibit power-law degree distribution. We propose two new decentralized search algorithms that can be used for efficient search in networks exhibiting scale-free design. Unlike previous work, our algorithms perform efficient search for a large range of power-law coefficients. Our algorithms are also unique in that they complete decentralized searches efficiently even when the network has disconnected components. As a corollary of this, our algorithms are also more resilient to network failure.

Keywords-Peer-to-peer networks; decentralized search; power-law distribution; scale-free networks


I. INTRODUCTION<br />

Many peer-to-peer (P2P) networks have been shown to exhibit a power-law degree distribution. This is often true for large-scale peer-to-peer networks which emerge for content sharing, e.g. Gnutella and Freenet [5,6]. A graph G is said to exhibit a power-law degree distribution if the number of nodes with k links is proportional to $k^{-\tau}$, where $\tau$ is the power-law exponent of the graph and has a value greater than 1. When the value of the power-law exponent lies between 2 and 3, the second moment, i.e. the variance of k, tends to infinity and such networks are referred to as scale-free networks. In a network exhibiting power-law degree distribution, most nodes possess a small degree and a handful of nodes act as hubs with very high degree [1].
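As a rough numerical illustration of this definition, the sketch below builds a discrete distribution with p(k) proportional to k^(-tau), truncated at a cutoff k_max, and computes its moments. The function names, the variable name tau for the exponent and the cutoff values are assumptions made for this example only. For an exponent between 2 and 3 the mean settles quickly, while the second moment keeps growing as the cutoff is raised.

```python
def power_law_pmf(tau, k_max):
    """Return {k: p(k)} with p(k) proportional to k**(-tau) for k = 1..k_max."""
    weights = {k: k ** (-tau) for k in range(1, k_max + 1)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def moment(pmf, order):
    """Return the raw moment E[k**order] of a degree distribution."""
    return sum((k ** order) * p for k, p in pmf.items())


if __name__ == "__main__":
    # Print the first and second moments for growing cutoffs (tau = 2.5).
    for k_max in (10**2, 10**3, 10**4):
        pmf = power_law_pmf(2.5, k_max)
        print(k_max, round(moment(pmf, 1), 3), round(moment(pmf, 2), 3))
```

Raising the cutoff stands in for growing the network: the largest degree present grows with the number of nodes, which is why the second moment diverges in the large-network limit.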

In large-scale P2P networks, the power-law degree distribution is an emergent feature which offers strong network resilience [3]. Since the degree distribution is heterogeneous, a random failure will very likely strike one of the nodes with low degree. The majority of the nodes in the network have a small degree, and these nodes are most often not crucial for the connectivity of the network. Hence, in such networks, the giant connected component (GCC), which is likely to exist, persists even at high node failure rates. This resilience to random failure is what makes the power-law degree distribution attractive to P2P networks, which are often characterized by frequent churn of nodes joining and leaving the system [4].

Various models have been proposed to explain the emergence of power-law degree distribution in networks without a central controller/coordinator. Among the most prominent is the preferential attachment model of Barabási and Albert [2], which builds on the notion that new nodes tend to form links with existing nodes in proportion to the degree of the existing nodes. The preferential attachment model has been able to explain the existence of power-law degree distribution in many self-organizing networks such as the WWW, citation networks and the collaboration graph of movie actors [2]. Clauset and Moore [14] proposed a network evolution model which tries to explain how the World Wide Web develops a power-law degree distribution via rewiring of edges.

1 Contact Author

2 While intern at HP Labs, India

Many P2P content sharing networks, in fact, explicitly incorporate the preferential attachment model of growth by design. For example, in the GNUTELLA network every new node joining the system first connects to a handful of known servers. A GNUTELLA client wishing to join the network must find the IP address of an initial node to connect to by looking up ad hoc lists of popular GNUTELLA servers. This ad hoc method of growth prompts new nodes to connect preferentially to nodes that are already well connected.
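The growth rule described here can be sketched in a few lines. This is a minimal illustration in the spirit of the Barabási and Albert model, not the generator used in the paper; the function name `grow_preferential`, the fully connected starting core and the parameter `links_per_node` are assumptions made for the example.

```python
import random


def grow_preferential(n_nodes, links_per_node, seed=0):
    """Grow an undirected graph; each newcomer links to existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    m = links_per_node
    # Assumption: start from a small fully connected core of m + 1 nodes.
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    # Each node appears in this list once per incident edge, so a uniform
    # draw from it is a degree-proportional draw over nodes.
    endpoints = [i for i in adj for _ in adj[i]]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend((new, t))
    return adj
```

Keeping one list entry per edge endpoint avoids recomputing degree weights at every step, at the cost of memory proportional to the number of edges.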

II. RELATED WORK

In this paper, we focus our attention on decentralized search in P2P networks which exhibit power-law degree distribution. In general, decentralized search in P2P networks is a difficult problem due to the absence of a centralized controller and the limited information available at each node. Thus, the central challenge is to enable a node to find another node by efficiently routing a message/query without global information about each node’s location being made available at a centralized location. In decentralized search, when a particular node ‘i’ wants to route a message to another node ‘j’, it can use traditional approaches like Breadth First Search, Depth First Search and other broadcast policies [11]. However, such approaches are very expensive in time and/or bandwidth consumed.
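To make the bandwidth cost concrete, the following sketch counts how many transmissions a breadth-first flood needs before the target is reached. It is an illustrative model only: the function name `flood_search` and the convention of counting one message per link traversal are assumptions of this example.

```python
from collections import deque


def flood_search(adj, source, target):
    """Breadth-first flooding: every reached node forwards the query to all
    of its neighbors. Returns (hops, messages); hops is None when the target
    is unreachable."""
    dist = {source: 0}
    queue = deque([source])
    messages = 0
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node], messages
        for nb in adj[node]:
            messages += 1  # one transmission per link traversal
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return None, messages
```

On a hub with five leaves and a target hanging off one leaf, a query started at another leaf finds the target three hops away but sends eleven messages to do so.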

As an alternative, search algorithms that exploit information about the overall structure of the network have been proposed in earlier work. Adamic et al. [1] proposed a decentralized algorithm for search in P2P networks which takes advantage of the power-law degree distribution of these networks and performs better than flooding-based (e.g. breadth-first) search, which is bandwidth intensive and hence not scalable. Using the generating function approach introduced by Newman [9], Adamic et al. showed that the distribution of the number of links emanating from the highest degree neighbor of a node, among r neighbors, is given by

$$p(x, r) = \frac{r\,(1 + x)^{1-\tau}\,(\tau - 2)\,\left[1 - (x + 1)^{2-\tau}\right]^{r-1}}{\left(1 - n^{2/\tau - 1}\right)^{r}}$$

where n is the number of nodes and $\tau$ is the power-law exponent of the graph. Using this distribution they calculated

1521-9097/10 $26.00 © 2010 IEEE<br />

DOI 10.1109/ICPADS.2010.73<br />



the expected degree of the highest degree neighbor and plotted the ratio between the expected degree of the highest degree neighbor and the degree of a node. They showed that the probability of finding an even higher degree node in the neighborhood of a high degree node depends strongly on the power-law exponent of the graph. For $2.0 < \tau < 2.3$ and small graphs, Adamic et al. [1] showed that forwarding the message to the neighbor with the highest degree is a good approach for performing search. But for values of $\tau$ above 2.3 this approach performs poorly. Another limitation of their approach is that it is not capable of handling networks with multiple components. In their simulations, Adamic et al. [1] extracted the giant connected component from the network graph and analyzed the performance of their algorithm on this component only.
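The distribution of the highest neighbor degree quoted from Adamic et al. can be checked numerically. The sketch below evaluates p(x, r) = r (1 + x)^(1 - tau) (tau - 2) [1 - (x + 1)^(2 - tau)]^(r - 1) / (1 - n^(2/tau - 1))^r and integrates it with a simple midpoint rule. The variable name tau, the helper names, and the upper limit x_max = n^(1/tau) - 1 (the point at which the density integrates to one) are assumptions of this example rather than details stated in the text.

```python
def p_max_degree(x, r, n, tau):
    """Density of the degree x of the highest-degree neighbor among r
    neighbors, in a power-law graph with n nodes and exponent tau > 2."""
    norm = 1.0 - n ** (2.0 / tau - 1.0)
    single = (tau - 2.0) * (1.0 + x) ** (1.0 - tau)  # one neighbor, unnormalized
    below = 1.0 - (x + 1.0) ** (2.0 - tau)           # unnormalized CDF at x
    return r * single * below ** (r - 1) / norm ** r


def integrate(f, a, b, steps=20000):
    """Midpoint-rule quadrature of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def expected_max_degree(r, n, tau):
    """Expected degree of the best-connected neighbor among r neighbors."""
    x_max = n ** (1.0 / tau) - 1.0  # assumed cutoff where the CDF reaches 1
    return integrate(lambda x: x * p_max_degree(x, r, n, tau), 0.0, x_max)
```

Comparing `expected_max_degree(r, n, tau)` with r itself for several exponents gives a feel for when a node can expect to find a neighbor richer than itself, which is the ratio discussed above.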

Kleinberg [12] has proposed decentralized search algorithms in networks which exhibit the small world property. A graph G is said to exhibit the small world property if its average path length is poly-logarithmic in n, the number of nodes in the network. This approach is basically a greedy algorithm which assumes that the network is embedded in a structured space (like a lattice). Given the lattice location of the target, each node forwards the message to the neighbor which is closest to the target with respect to the $L_1$ distance on the lattice. The author shows that this approach is efficient for certain values of the dimension of the underlying lattice.
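The greedy rule can be illustrated on a small two-dimensional grid. This is a simplified sketch, not Kleinberg's full model: the grid size, the optional `shortcuts` dictionary of long-range contacts and all function names are assumptions of the example.

```python
def l1(a, b):
    """L1 (Manhattan) distance between two lattice points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def lattice_neighbors(node, size):
    """Four nearest neighbors of a point on a size x size grid."""
    x, y = node
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(i, j) for i, j in candidates if 0 <= i < size and 0 <= j < size]


def greedy_route(source, target, size, shortcuts=None):
    """Forward to whichever neighbor is closest to the target in L1 distance.
    `shortcuts` optionally maps a node to extra long-range contacts."""
    shortcuts = shortcuts or {}
    path = [source]
    current = source
    while current != target:
        options = lattice_neighbors(current, size) + list(shortcuts.get(current, []))
        current = min(options, key=lambda nb: l1(nb, target))
        path.append(current)
    return path
```

Without shortcuts the route length equals the L1 distance; a single well-placed long-range contact shortens it, which is the effect the small-world analysis quantifies.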

An alternative approach to the problem has been to set up an overlay network which is defined specifically for making search easier. Distributed Hash Tables are an example of this approach; Lua et al. [10] present a review of related work.

III. OUR APPROACH<br />

In this paper, we propose two new decentralized search algorithms that can be efficiently used to search through power-law networks with disconnected components and a wide range of power-law exponents. The algorithms use local information such as the identities and connectedness of a node’s neighbors, and its neighbors’ neighbors, but not the target’s global position. We do not make any assumptions about the network being embedded in a structured space [12]. We demonstrate that our search algorithms scale sub-linearly with the number of nodes in the network. Because our algorithms use rewiring, they guarantee completion of the search query, even though the search time can be O(n) in the worst case for a network with n nodes. Thus, our algorithms increase the reliability and availability of peer-to-peer networks.

In our approach, when a node ‘i’ wants to route a message to another node ‘j’, it forwards the message to the neighbor which has the highest degree. This is similar to the approach proposed by Adamic et al. [1]. The key difference in our approach is how we handle the scenario when a search is unsuccessfully terminated using the basic approach of ‘forward to highest degree neighbor’. This can happen for two reasons: (a) the message reaches a node which cannot forward the message since all its neighbors have already been visited by the message or (b) the message is destined for a node which lies in another component in the graph. Fig. 1 shows the failure rate graph for the ‘forward to highest degree neighbor’ strategy of Adamic et al. We can see that search terminates unsuccessfully quite frequently.
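Both failure modes show up in a direct implementation of the ‘forward to highest degree neighbor’ rule. The sketch below is a simplified reading of that strategy: the name `high_degree_search` is hypothetical, and the search here succeeds only when the target itself is reached or is a direct neighbor, which is an assumption made to keep the example short.

```python
def high_degree_search(adj, source, target):
    """Forward the query to the unvisited neighbor of highest degree.

    Returns (found, path). The walk stops unsuccessfully at a terminal node
    whose neighbors have all been visited, which also covers the case where
    the target lies in a different component."""
    visited = {source}
    path = [source]
    current = source
    while current != target:
        if target in adj[current]:
            current = target          # the target is a direct neighbor
        else:
            fresh = [nb for nb in adj[current] if nb not in visited]
            if not fresh:
                return False, path    # terminal node: the search fails here
            current = max(fresh, key=lambda nb: len(adj[nb]))
        visited.add(current)
        path.append(current)
    return True, path
```

In the small graph used for checking this sketch, a query for a node in the same component succeeds in three hops, while a query for a node in a separate two-node component ends at a leaf with nowhere left to go.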

Figure 1. Failure rate data for random power-law graphs of varying size and power-law exponent 3, for 1000 runs averaged over 50 iterations. In each run we choose a random source and target for performing search, and we build a new random power-law graph in each iteration. We extract the GCC before performing search.

Earlier works have handled these scenarios by terminating the search unsuccessfully and working only on the giant connected component in the graph. Our approach, on the other hand, proposes a rewiring algorithm to handle such scenarios. We show that our algorithm performs better for a wider range of power-law exponents and can handle networks which have multiple components. This is important since real world P2P networks may exhibit disconnected components due to a variety of reasons, e.g. frequent entry and egress of nodes, node failure, and existence of underlying network limitations (firewalls etc.).

As mentioned, our key contribution is handling scenarios where the basic approach of ‘forward to highest degree neighbor’ would have unsuccessfully terminated. Our approach is that the node at which the search has terminated unsuccessfully (called the terminal node) breaks one of its outgoing links and forms a new outgoing link. The link-to-break is chosen randomly from one of the outgoing links of the terminal node and the link-to-make is chosen according to the network evolution rule. We use the preferential attachment model in our algorithms to form links during rewiring.

In our simulations, we start by generating a network with power-law degree distribution and we use this network to perform decentralized search simulations. We select a source and target node at random and then apply our search algorithm for routing the message. Note that the underlying network does contain disconnected components.

During the simulations, each node forwards the message to the node with the highest degree. Search terminates when the message reaches any one of the second neighbors of the target node. If and when the search reaches a node at which it can no longer proceed (because of the reasons mentioned earlier), we allow this node to rewire one of its edges. The rewiring strategy is described below. We note that the degree density of the network is never changed since no new edges are added in the graph and we rely strictly on rewiring edges.



The link-to-break is chosen randomly from one of the (outgoing) links of the terminal node, except that this cannot be the link through which the message was received. The link-to-make is chosen according to the network evolution rule; in our case it follows the preferential attachment model, i.e. the probability, p(k), that this new link would terminate at a node with degree k is $\sim k^{-\tau}$. The creation of the new edge according to the preferential attachment rule ensures that the underlying structure of the network is not violated. Empirical support in favor of the preferential attachment rule as the reason for the emergence of power-law degree distributions is also well established [2,8] and has guided our choice. We also ensure that the target node for rewiring is chosen such that it has not already been visited by the message; in real implementations, this can be achieved if each message carries with it a list of all nodes it has visited in the network. We call this the Terminal (T) node rewiring strategy.
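A single rewiring step along these lines might look as follows. This is a sketch under stated assumptions, not the authors' implementation: the function name `rewire_terminal` is hypothetical, the new endpoint is drawn with probability proportional to its current degree (the usual Barabási and Albert reading of preferential attachment), candidates fall back to a uniform draw if they all have degree zero, and the function returns None when the incoming link is the terminal node's only link, a case the text does not cover.

```python
import random


def rewire_terminal(adj, terminal, came_from, visited, rng):
    """One T node rewiring step on an undirected graph stored as {node: set}.

    Breaks a random link of `terminal` (never the one the message arrived on)
    and creates a link to an unvisited node drawn with probability
    proportional to its degree. Returns the new neighbor, or None if no
    rewiring is possible."""
    breakable = sorted(nb for nb in adj[terminal] if nb != came_from)
    candidates = sorted(
        v for v in adj
        if v != terminal and v not in visited and v not in adj[terminal]
    )
    if not breakable or not candidates:
        return None
    weights = [len(adj[v]) for v in candidates]
    if sum(weights) == 0:
        weights = [1] * len(candidates)  # assumption: uniform fallback
    new_nb = rng.choices(candidates, weights=weights, k=1)[0]
    old_nb = rng.choice(breakable)
    # Break the old link and make the new one; the edge count is unchanged.
    adj[terminal].discard(old_nb)
    adj[old_nb].discard(terminal)
    adj[terminal].add(new_nb)
    adj[new_nb].add(terminal)
    return new_nb
```

Because one edge is removed for every edge added, the total number of edges stays constant, and a new link that lands in another component joins the two components.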

We show that this dynamic model improves upon the original algorithm in two important ways: (a) the routing times are lower than those of the algorithm proposed by Adamic et al. [1] for random power-law graphs of higher power-law exponent; the algorithm always finds the target if it exists, and the search cost (number of steps until approximately the whole graph is revealed) scales sub-linearly with the size of the graph; and (b) if the network has disconnected components, the search can still complete and the algorithm, in fact, connects the disconnected components to form a single giant component.

In a small extension of the Terminal (T) node rewiring approach, we handle potentially unsuccessful terminations by using two rewirings in the graph instead of one. As before, the terminal node rewires as described above. In addition, the original source node of the message rewires to the terminal node. We do not expect this rewiring to affect the results of the search algorithm, but this latter rewiring may be seen as a reward (or an incentive) for the effort undertaken by the terminal node to rewire one of its outgoing edges to the next hop node. Thus, the terminal node’s degree remains constant. We call this the Source (S)-T + T node rewiring strategy.

The empirical motivation for the source rewiring to the terminal node in this model is the notion of incentivizing nodes to rewire their edges to enable path completion for other nodes. This notion of incentives has been proposed in earlier work on query incentive networks [13], which notes that successful decentralized routing in social networks is critically dependent on each individual’s decision to participate in the routing.

IV. SIMULATION & ANALYSIS<br />

For carrying out simulations we generate random power-law graphs with desired exponents, simulate our algorithms on them and compare them with algorithms proposed in earlier work. Fig. 2 and 3 show the degree distribution before and after 1,000 iterations of our search algorithms on random power-law graphs with 10,000 nodes. We do not extract the GCC but instead work on the entire graph. We see that the rewiring process does not significantly affect the power-law exponent of the graph.

Figure 2. Degree distribution curves for random power-law graphs with 10,000 nodes (a) before and (b) after running the T node rewiring algorithm.

Figure 3. Degree distribution curves for random power-law graphs with 10,000 nodes (a) before and (b) after running the S-T + T node rewiring algorithm.

The search cost for these two rewiring models is almost the same and scales sub-linearly with the size of the graph, similar to the high degree node seeking strategy of Adamic et al. [1]. Fig. 4 shows this graph.

Figure 4. This graph shows how search time scales sub-linearly with the graph size for our two new search algorithms. Simulations were carried out on random power-law graphs of varying size and power-law exponent of 3. Here we are not working on the GCC but on the entire graph, which has disconnected components. For calculating search cost we perform 100 iterations for each graph size.



Fig. 5, 6 and 7 compare the cumulative number of nodes found by our proposed algorithms with the High Degree Node Seeking Strategy (HDNSS) of Adamic et al. [1]. We perform our simulations on random power-law graphs with exponents 2, 2.5 and 3 respectively. We can see that both our rewiring strategies perform on par with HDNSS for power-law graphs with exponent 2, a little better than HDNSS for exponent 2.5 and vastly better than HDNSS for exponent 3.

Figure 7. Cumulative nodes found data for random power-law graph with 100,000 nodes having 21,841 connected components, power-law exponent of 3 and giant connected component with 47,085 nodes. Our algorithms perform much better than HDNSS.

Figure 5. Cumulative nodes found data for random power-law graph with 10,000 nodes having 164 connected components, power-law exponent of 2 and giant connected component with 9398 nodes. Our algorithms perform on par with the HDNSS algorithm.

In order to test the performance of our algorithms in the presence of disconnected components, we perform another simulation on random power-law graphs. If the source and target are in separate components then the routing time will be infinite for the HDNSS algorithm. Therefore we use a new metric (1/Routing Time) to compare the performance of our algorithms with HDNSS. A better routing algorithm will have lower routing times and thus a higher value of this metric. Fig. 8 shows that our algorithms perform better than HDNSS consistently for various graph sizes. The difference in the values of the metric will be higher as we increase the power-law exponent of the graph or the graph size, as this leads to an increase in the number of connected components in the graph.
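The 1/Routing Time metric is easy to state in code. In this minimal sketch a failed search is represented by an infinite routing time, so it contributes zero to the average; the function name and the use of `math.inf` as the failure marker are assumptions of the example.

```python
import math


def mean_inverse_routing_time(routing_times):
    """Average of 1/t over all searches; a failed search has t = infinity
    and therefore contributes 0."""
    if not routing_times:
        raise ValueError("need at least one routing time")
    inverses = [0.0 if math.isinf(t) else 1.0 / t for t in routing_times]
    return sum(inverses) / len(inverses)
```

For example, two successful searches taking 2 and 4 steps plus one failed search give (0.5 + 0.25 + 0) / 3 = 0.25, whereas averaging the raw routing times would be undefined.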

Figure 6. Cumulative nodes found data for random power-law graph with 10,000 nodes having 852 connected components, power-law exponent of 2.5 and giant connected component with 8185 nodes. We can see that HDNSS fails to discover new nodes after some steps. Our algorithms allow the message to propagate further by allowing rewiring. Eventually, our algorithms will traverse all nodes in the graph.

Figure 8. Routing time graph for random power-law graphs of varying size and power-law exponent of 2. We calculate the average value of 1/Routing Time for 1,000 iterations of each algorithm using random source and target nodes. This process is repeated 10 times for each graph size to determine the final value of the metric for that graph size. Table I shows the average number of connected components for each graph size.



TABLE I. THIS TABLE SHOWS THE NUMBER OF CONNECTED COMPONENTS FOR EACH GRAPH SIZE AVERAGED OVER 10 ITERATIONS

| Size | Average number of connected components |
|---|---|
| 500 | 11 |
| 1000 | 20 |
| 2000 | 43 |
| 4000 | 75 |
| 6000 | 104 |
| 8000 | 131 |
| 10000 | 158 |

a. The average number of connected components increases as the graph size increases.

Next, we simulate how our algorithms cope with network failure. In Fig. 9 we plot the metric 1/Routing Time against the network failure percentage. We see that as the network failure percentage increases, the number of connected components in the graph also increases and the performance of our algorithms improves relative to HDNSS. Table II also shows that our algorithms reduce the number of connected components of the graph because of their inherent rewiring mechanisms. Thus, our algorithms increase the overall reliability and availability of the network and help in recovery of the network after node failure. For this simulation we first generate a random power-law graph with 10,000 nodes and power-law exponent 2 and extract the giant connected component from this graph. Next, we simulate network failure by randomly removing nodes from the network. This leads to creation of disconnected components. We then perform 1,000 search iterations of the HDNSS, T node rewiring and S-T + T node rewiring algorithms using random source and target nodes.

Figure 9. Routing time vs. failure rate graph for random power-law network with 10,000 nodes and power-law exponent of 2. We can see that our rewiring algorithms are more resilient to failure while the performance of HDNSS keeps deteriorating as the failure rate increases. Also, our rewiring algorithms reduce the number of disconnected components in the graph. See Table II.

TABLE II. THIS TABLE SHOWS HOW OUR SEARCH ALGORITHMS HELP IN RECOVERY OF THE NETWORK AFTER NODE FAILURE

| Failure Rate | Number of connected components after network failure | Number of connected components after T node rewiring | Number of connected components after S-T + T node rewiring |
|---|---|---|---|
| 1% | 48 | 43 | 37 |
| 2.5% | 115 | 105 | 106 |
| 5% | 187 | 161 | 161 |
| 10% | 392 | 328 | 320 |
| 20% | 804 | 660 | 666 |
| 30% | 1195 | 1003 | 986 |
| 40% | 1387 | 1141 | 1124 |
| 50% | 1646 | 1285 | 1298 |

a. The number of connected components reduces after performing search using our algorithms.
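The failure experiment can be approximated with two small helpers: one that removes a random fraction of nodes and one that counts the connected components that remain. This is only a sketch of the measurement step; the function names and the choice to round the number of failed nodes are assumptions, and the graph generator and search algorithms are not included.

```python
import random


def count_components(adj):
    """Number of connected components of an undirected graph {node: set}."""
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return components


def fail_nodes(adj, failure_rate, rng):
    """Return a copy of the graph with a random fraction of nodes removed."""
    doomed = set(rng.sample(sorted(adj), round(failure_rate * len(adj))))
    return {v: {nb for nb in nbs if nb not in doomed}
            for v, nbs in adj.items() if v not in doomed}
```

Calling `count_components` before and after a batch of searches on the damaged graph would give the kind of before and after comparison reported in Table II.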

V. CONCLUSION<br />

In flooding-based breadth-first algorithms there is a lot of wasted network bandwidth, as each node forwards the message to all its neighbors instead of capitalizing on the power-law structure of the network. Adamic et al. [1] proposed a new decentralized forwarding strategy which scales sub-linearly with the network size. Each node now passes the message to its neighbor of highest degree instead of flooding it to all its neighbors. But this algorithm performs poorly in the case of graphs with disconnected components and graphs with power-law exponent higher than 2.3. We propose two new rewiring algorithms that help alleviate these problems. Our algorithms also scale sub-linearly with the size of the graph and also work if the source and target are in different connected components. Our algorithms guarantee search completion if the target exists in the network. Also, if there is a partial failure in the network, our algorithms cope much better than HDNSS and also connect the disconnected components along with search completion.

Our rewiring strategy has applications in a number of power-law networks. Of particular interest are the peer-to-peer content sharing networks, since rewiring in such networks has very low cost and the primary interest of participating nodes is query completion. There is also the concept of inherent cooperation in these networks, and the network evolution model is preferential attachment [2], i.e. nodes will preferentially attach to nodes that are hubs of the network. These networks are dynamic in nature, i.e. nodes join and leave the network frequently, which can lead to creation of disconnected components in the network. The S-T + T node rewiring, which has the inherent feature of a query incentives and cooperation model, is well suited to peer-to-peer overlay networks.

The implementation of our algorithm requires the participating nodes of the GNUTELLA network to keep lists of the files stored by their first and second neighbors. This information must be updated periodically depending on the typical lifetime of nodes in the network. Also, since we avoid revisiting a node while performing the search algorithm, we need to append the IP addresses of the nodes already queried to the search message.
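One way to carry the list of queried nodes is to make it a field of the query itself. The class below is a hypothetical message layout for illustration only; it is not part of the GNUTELLA protocol, and the field and method names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class QueryMessage:
    """A search query that records every node it has passed through."""
    target_file: str
    visited: list = field(default_factory=list)  # addresses in visiting order

    def record(self, address):
        """Append a node address unless it is already on the list."""
        if address not in self.visited:
            self.visited.append(address)

    def unvisited(self, neighbors):
        """Filter a neighbor list down to nodes the query has not seen."""
        return [nb for nb in neighbors if nb not in self.visited]
```

A forwarding node would call `record` with its own address and then choose the next hop from `unvisited(...)`. The list grows by one address per hop, so the message size is linear in the length of the search path.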



REFERENCES

[1] Adamic, L. A., Lukose, R. M., Puniyani, A. R., Huberman, B. A., “Search in Power-Law Networks”, Phys. Rev. E 64, 2001, 46135–46143.

[2] Barabási, A.-L. and Albert, R., “Emergence of scaling in random networks”, Science 286, 1999, 509–512.

[3] Albert R., Jeong H., and Barabási A.-L., “Error and Attack Tolerance of Complex Networks”, Nature, 2000, 406:378–381.

[4] Samant K. and Bhattacharyya S., “Topology, Search, and Fault Tolerance in Unstructured P2P Networks”, In Proceedings of the 37th Hawaii International Conference on System Sciences (HICSS’04), IEEE Computer Society, 2004.

[5] “The Gnutella Protocol Specification v.0.4.” http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf, 2001.

[6] Clarke I., Sandberg O., Wiley B., and Hong T., “Freenet: A Distributed Anonymous Information Storage and Retrieval System”, Lecture Notes in Computer Science, 2001, 2009:46-66.

[7] Ripeanu, M., Foster, I., and Iamnitchi, A., “Mapping the Gnutella network: Properties of large-scale peer-to-peer systems and implications for system design”, IEEE Internet Computing 6, 2002, 50–57.

[8] Newman, M. E. J., “The structure and function of complex networks”, SIAM Review 45, 2003, 167–256.

[9] Newman, M. E. J., Strogatz, S. H., and Watts, D. J., “Random Graphs with arbitrary degree distributions and their applications”, Phys. Rev. E 64, 2001, 026118.

[10] Lua, E.-K., Crowcroft, J., Pias, M., Sharma, R., and Lim, S., “A Survey and Comparison of Peer-to-Peer Overlay Network Schemes”, IEEE Communications Surveys and Tutorials 7, 2005, 72–93.

[11] Yang, B. and Garcia-Molina, H., “Improving search in peer-to-peer networks”, In Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS’02), Vienna, Austria.

[12] Kleinberg J., “The Small-World phenomenon and decentralized search”, SIAM News 37(3), 2004.

[13] Kleinberg J. and Raghavan P., “Query Incentive Networks”, In Proceedings of the 46th IEEE Symposium on Foundations of Computer Science, IEEE Comput. Soc. Press, Los Alamitos.

[14] Clauset A. and Moore C., “How do networks become navigable”, Preprint, 2003; arxiv.org, cond-mat/0309415.
