Decentralized Search in Scale-Free P2P Networks
Decentralized Search in Scale-Free P2P Networks
Decentralized Search in Scale-Free P2P Networks
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
2010 16th International Conference on Parallel and Distributed Systems<br />
<strong>Decentralized</strong> <strong>Search</strong> <strong>in</strong> <strong>Scale</strong>-<strong>Free</strong> <strong>P2P</strong> <strong>Networks</strong><br />
Praphul Chandra 1 and Dushyant Arora 2<br />
HP Labs, Bangalore, India<br />
praphul.chandra@hp.com, dushyantarora13@gmail.com<br />
Abstract—<strong>Search</strong> <strong>in</strong> peer-to-peer networks is a challeng<strong>in</strong>g<br />
problem due to the absence of any centralized control & the<br />
limited <strong>in</strong>formation available at each node. When <strong>in</strong>formation is<br />
available about the overall structure of the network, use of this<br />
<strong>in</strong>formation can significantly improve the efficiency of<br />
decentralized search algorithms. Many peer-to-peer networks<br />
have been shown to exhibit power-law degree distribution. We<br />
propose two new decentralized search algorithms that can be<br />
used for efficient search <strong>in</strong> networks exhibit<strong>in</strong>g scale-free design.<br />
Unlike previous work, our algorithms perform efficient search<br />
for a large range of power-law coefficients. Our algorithms are<br />
also unique <strong>in</strong> that they complete decentralized searches<br />
efficiently even when the network has disconnected components.<br />
As a corollary of this, our algorithms are also more resilient to<br />
network failure.<br />
Keywords-Peer-to-peer networks; decentralized search; powerlaw<br />
distribution; scale-free networks<br />
II.<br />
I. INTRODUCTION<br />
Many peer-to-peer (<strong>P2P</strong>) networks have been shown to<br />
exhibit a power-law degree distribution. This is often true for<br />
large scale peer-to-peer networks which emerge for content<br />
shar<strong>in</strong>g e.g. Gnutella & <strong>Free</strong>net [5,6]. A graph G, is said to<br />
exhibit a power-law degree distribution if the number of nodes<br />
with k l<strong>in</strong>ks is proportional to k - law<br />
exponent of the graph and has a value greater than 1.<br />
When the value of the power-law exponent lies between 2 & 3,<br />
the second moment i.e. the variance of k, tends to <strong>in</strong>f<strong>in</strong>ity<br />
and such networks are referred to as scale-free networks. In a<br />
network exhibit<strong>in</strong>g power-law degree distribution, most nodes<br />
possess a small degree and a few handful of nodes act as hubs<br />
with very high degree [1].<br />
In large scale <strong>P2P</strong> networks, the power-law degree<br />
distribution is an emergent feature which offers strong network<br />
resilience [3]. S<strong>in</strong>ce the degree distribution is heterogeneous, a<br />
random failure will very likely strike one of the nodes with low<br />
degree. The majority of the nodes <strong>in</strong> the network have a small<br />
degree - these nodes are most often not crucial for the<br />
connectivity of the network. Hence, <strong>in</strong> such networks, the giant<br />
connected component (GCC), which is likely to exist, persists<br />
for high values of node failure rates <strong>in</strong> the network. This<br />
resilience to random failure is what makes the power-law<br />
degree distribution attractive to <strong>P2P</strong> networks which are often<br />
characterized by frequent churn of nodes jo<strong>in</strong><strong>in</strong>g & leav<strong>in</strong>g the<br />
system [4].<br />
1<br />
p(<br />
x,<br />
r)<br />
r(1<br />
x)<br />
( <br />
Various models have been proposed to expla<strong>in</strong> the <br />
emergence of power-law degree distribution <strong>in</strong> networks<br />
2 / 1<br />
r<br />
without a central controller/coord<strong>in</strong>ator. Among the most (1 n )<br />
prom<strong>in</strong>ent is the preferential attachment model of Barabási and<br />
Albert [2] which builds on the notion that new nodes tend to<br />
form l<strong>in</strong>ks with exist<strong>in</strong>g nodes <strong>in</strong> proportion to the degree of<br />
1 Contact Author<br />
2 While <strong>in</strong>tern at HP Labs, India<br />
the exist<strong>in</strong>g nodes. The preferential attachment model has been<br />
able to expla<strong>in</strong> the existence of power-law degree distribution<br />
<strong>in</strong> many self-organiz<strong>in</strong>g networks like the WWW, citation<br />
networks, collaboration graph of movie actors etc. [2]. Clauset<br />
and Moore [14] proposed a network evolution model which<br />
tries to expla<strong>in</strong> how the World Wide Web develops a powerlaw<br />
degree distribution via rewir<strong>in</strong>g of edges.<br />
Many <strong>P2P</strong> content shar<strong>in</strong>g networks, <strong>in</strong> fact, explicitly<br />
<strong>in</strong>corporate the preferential attachment model of growth by<br />
design for e.g. <strong>in</strong> the GNUTELLA network every new node<br />
jo<strong>in</strong><strong>in</strong>g the system first connects to a handful of known servers.<br />
A GNUTELLA client wish<strong>in</strong>g to jo<strong>in</strong> the network must f<strong>in</strong>d<br />
the IP address of an <strong>in</strong>itial node to connect to by look<strong>in</strong>g up ad<br />
hoc lists of popular GNUTELLA servers. This ad hoc method<br />
of growth prompts new nodes to connect preferentially to<br />
nodes that are already well connected.<br />
RELATED WORK<br />
In this paper, we focus our attention on decentralized search<br />
<strong>in</strong> <strong>P2P</strong> networks which exhibit power-law degree distribution.<br />
In general, decentralized search <strong>in</strong> <strong>P2P</strong> networks is a difficult<br />
problem due to the absence of a centralized controller and due<br />
to the limited <strong>in</strong>formation available at each node. Thus, the<br />
central challenge is to enable a node to f<strong>in</strong>d another node by<br />
efficiently rout<strong>in</strong>g a message/query without global <strong>in</strong>formation<br />
about each node’s location be<strong>in</strong>g made available at a<br />
centralized location. In decentralized search, when a particular<br />
node ‘i’ wants to route a message to another node ‘j’, it can use<br />
traditional approaches like Breadth First <strong>Search</strong>, Depth First<br />
<strong>Search</strong> and other broadcast policies [11]. However, such<br />
approaches are very expensive <strong>in</strong> time and/or bandwidth<br />
consumed.<br />
As an alternate, search algorithms that exploit <strong>in</strong>formation<br />
about the overall structure of the network have been proposed<br />
<strong>in</strong> earlier work. Adamic et al [1] proposed a decentralized<br />
algorithm for search <strong>in</strong> <strong>P2P</strong> networks which takes advantage of<br />
the power-law degree distribution of these networks and<br />
performs better than the flood<strong>in</strong>g-based (e.g. breadth-first)<br />
search algorithm which is bandwidth <strong>in</strong>tensive and hence not<br />
scalable. Us<strong>in</strong>g the generat<strong>in</strong>g function approach <strong>in</strong>troduced by<br />
Newman [9], Adamic et al. showed that the distribution of the<br />
number of l<strong>in</strong>ks emanat<strong>in</strong>g from the highest degree neighbor of<br />
a node, among r neighbors is given by<br />
2<br />
r 1<br />
2)[1 ( x 1)<br />
]<br />
<br />
<br />
where n is the number of nodes and is the power-law<br />
exponent of the graph. Us<strong>in</strong>g this distribution they calculated<br />
1521-9097/10 $26.00 © 2010 IEEE<br />
DOI 10.1109/ICPADS.2010.73<br />
776
the expected degree of the highest degree neighbor and plot<br />
the ratio between the expected degree of the highest degree<br />
neighbor and the degree of a node. They showed that the<br />
probability of f<strong>in</strong>d<strong>in</strong>g an even higher degree node <strong>in</strong> the<br />
neighborhood of a high degree node depends strongly on the<br />
power-law exponent of the graph. For 2.0 < < 2.3 and small<br />
graphs, Adamic et al [1] showed that forward<strong>in</strong>g the message<br />
to the neighbor with the highest degree is a good approach for<br />
perform<strong>in</strong>g search. But for values<br />
this approach performs poorly. Another limitation of their<br />
approach is that it is not capable of handl<strong>in</strong>g networks with<br />
multiple components. In their simulations, Adamic et al [1]<br />
extracted the giant connected component from the network<br />
graph and analyzed the performance of their algorithm on this<br />
component only.<br />
Kle<strong>in</strong>berg [12] has proposed decentralized search<br />
algorithms <strong>in</strong> networks which exhibit the small world property.<br />
A graph G, is said to exhibit the small world property if its<br />
average path length is poly-logarithmic <strong>in</strong> n, the number of<br />
nodes <strong>in</strong> the network. This approach is basically a greedy<br />
algorithm which assumes that the network is embedded <strong>in</strong> a<br />
structured space (like a lattice). Given the lattice location of the<br />
target, each node forwards the message to the neighbor which<br />
is closest to the target with respect to the L 1 distance on the<br />
lattice. The author shows that this approach is efficient for<br />
certa<strong>in</strong> values of the dimension of the underly<strong>in</strong>g lattice.<br />
An alternate approach to the problem has been to set-up on<br />
an overlay network which is def<strong>in</strong>ed specifically for mak<strong>in</strong>g<br />
search easier. Distributed Hash Tables are an example of this<br />
approach - Lua et al. [10] present a review of related work.<br />
III. OUR APPROACH<br />
In this paper, we propose two new decentralized search<br />
algorithms that can be efficiently used to search through<br />
power-law networks with disconnected components and wide<br />
range of power-law exponents. The algorithms use local<br />
<strong>in</strong>formation such as the identities and connectedness of a<br />
node’s neighbors, and its neighbors’ neighbors, but not the<br />
target’s global position. We do not make any assumptions<br />
about the network be<strong>in</strong>g embedded <strong>in</strong> a structured space [12].<br />
We demonstrate that our search algorithms scale sub-l<strong>in</strong>early<br />
with the number of nodes <strong>in</strong> the network and guarantee<br />
completion of search query. Thus, our algorithms <strong>in</strong>crease<br />
reliability and availability of peer-to-peer networks. As our<br />
algorithms use rewir<strong>in</strong>g for search completion they guarantee<br />
completion of search query, even though the search time can be<br />
O(n) <strong>in</strong> the worst-case for a network with n nodes.<br />
In our approach, when a node ‘i’ wants to route a message<br />
to another node ‘j’, it forwards the message to the neighbor<br />
which has the highest degree. This is similar to the approach<br />
proposed by Adamic et al [1]. The key difference <strong>in</strong> our<br />
approach is how we handle the scenario when a search is<br />
unsuccessfully term<strong>in</strong>ated us<strong>in</strong>g the basic approach of ‘forward<br />
to highest degree neighbor’. This can happen for two reasons:<br />
(a) the message reaches a node which cannot forward the<br />
message s<strong>in</strong>ce all its neighbors have already been visited by<br />
the message or (b) the message is dest<strong>in</strong>ed for a node which<br />
lies <strong>in</strong> another component <strong>in</strong> the graph. Fig. 1 shows the<br />
failure rate graph for the ‘forward to highest degree neighbor’<br />
strategy of Adamic et al. We can see that search term<strong>in</strong>ates<br />
unsuccessfully quite frequently.<br />
yq q y<br />
Figure 1. Failure rate data for random power-law graphs of vary<strong>in</strong>g size and<br />
power-law exponent 3, for 1000 runs averaged 50 iterations. In each run we<br />
choose a random source and target for perform<strong>in</strong>g search and build a new<br />
random power-law graph <strong>in</strong> each iteration. We extract out the GCC before<br />
perform<strong>in</strong>g search.<br />
Earlier works have handled these scenarios by term<strong>in</strong>at<strong>in</strong>g<br />
the search unsuccessfully and work<strong>in</strong>g only on the giant<br />
connected component <strong>in</strong> the graph. Our approach, on the other<br />
hand, proposes a rewir<strong>in</strong>g algorithm to handle such scenarios.<br />
We show that our algorithm performs better for a wider range<br />
of power-law exponents and can handle networks which have<br />
multiple components. This is important s<strong>in</strong>ce real world <strong>P2P</strong><br />
networks may exhibit disconnected components due to a<br />
variety of reasons e.g. frequent entry and egress of nodes, node<br />
failure, existence of underly<strong>in</strong>g network limitations (firewalls<br />
etc.).<br />
As mentioned, our key contribution is handl<strong>in</strong>g scenarios<br />
where the basic approach of ‘forward to highest degree<br />
neighbor’ would have unsuccessfully term<strong>in</strong>ated. Our approach<br />
is that the node, at which the search has term<strong>in</strong>ated<br />
unsuccessfully (called the term<strong>in</strong>al node), breaks one of its<br />
outgo<strong>in</strong>g l<strong>in</strong>ks and forms a new outgo<strong>in</strong>g l<strong>in</strong>k. The l<strong>in</strong>k-tobreak<br />
is chosen randomly from one of the outgo<strong>in</strong>g l<strong>in</strong>ks of the<br />
term<strong>in</strong>al node and the l<strong>in</strong>k-to-make is chosen accord<strong>in</strong>g to the<br />
network evolution rule. We use the preferential attachment<br />
model <strong>in</strong> our algorithms to form l<strong>in</strong>ks dur<strong>in</strong>g re-wir<strong>in</strong>g.<br />
In our simulations, we start by generat<strong>in</strong>g a network with<br />
power-law degree distribution and we use this network to<br />
perform decentralized search simulations. We select a source<br />
& target node at random and then apply our search algorithm<br />
for rout<strong>in</strong>g the message. Note that the underly<strong>in</strong>g network<br />
does conta<strong>in</strong> disconnected components.<br />
Dur<strong>in</strong>g the simulations, each node forwards the message to<br />
the node with the highest degree. <strong>Search</strong> term<strong>in</strong>ates when the<br />
message reaches any one of the second neighbors of the target<br />
node. If and when the search reaches a node at which it can no<br />
longer proceed (because of earlier mentioned reasons), we<br />
allow this node to rewire one of its edges. The rewir<strong>in</strong>g<br />
strategy is described below. We note that the degree density of<br />
the network is never changed s<strong>in</strong>ce no new edges are added <strong>in</strong><br />
the graph and we rely strictly on rewir<strong>in</strong>g edges.<br />
777
The l<strong>in</strong>k-to-break is chosen randomly from one of the<br />
(outgo<strong>in</strong>g) l<strong>in</strong>ks of the term<strong>in</strong>al node; except that this cannot<br />
be the l<strong>in</strong>k through which the message was received. The l<strong>in</strong>kto-make<br />
is chosen accord<strong>in</strong>g to the network evolution rule, <strong>in</strong><br />
our case it follows the preferential attachment model i.e. the<br />
probability, p(k), that this new l<strong>in</strong>k would term<strong>in</strong>ate at a node<br />
with degree k is ~ k - . The creation of the new edge accord<strong>in</strong>g<br />
to preferential attachment rule ensures that the underly<strong>in</strong>g<br />
structure of the network is not violated. Empirical support <strong>in</strong><br />
favor of the preferential attachment rule as the reason for the<br />
emergence of power law degree distributions is also well<br />
established [2,8] and has guided our choice. We also ensure<br />
that the target node for rewir<strong>in</strong>g is chosen such that it has not<br />
already been visited by the message – <strong>in</strong> real implementations,<br />
this can be achieved if each message carries with it a list of all<br />
nodes it has visited <strong>in</strong> the network. We call this strategy as<br />
Term<strong>in</strong>al (T) node rewir<strong>in</strong>g strategy.<br />
We show that this dynamic model improves upon the<br />
orig<strong>in</strong>al algorithm <strong>in</strong> two important ways: (a) the rout<strong>in</strong>g times<br />
are lower as compared to the algorithm proposed by Adamic et<br />
al [1] for random power-law graphs of higher power-law<br />
exponent. The algorithm always f<strong>in</strong>ds the target if it exists,<br />
also search cost (number of steps until approximately the<br />
whole graph is revealed) scales scale sub-l<strong>in</strong>early with the size<br />
of the graph and (b) if the network has disconnected<br />
components, the search can still complete and the algorithm <strong>in</strong><br />
fact, connects the disconnected components to form a s<strong>in</strong>gle<br />
giant component.<br />
In a small extension of the Term<strong>in</strong>al (T) node rewir<strong>in</strong>g<br />
approach, we handle potentially unsuccessful term<strong>in</strong>ations by<br />
us<strong>in</strong>g two re-wir<strong>in</strong>gs <strong>in</strong> the graph <strong>in</strong>stead of one. As before,<br />
the term<strong>in</strong>al node rewires as described above. In addition, the<br />
orig<strong>in</strong>al source node of the message rewires to the term<strong>in</strong>al<br />
node. We do not expect this rewir<strong>in</strong>g to affect the results of<br />
search algorithm but this latter rewir<strong>in</strong>g may be seen as a<br />
reward (or an <strong>in</strong>centive) for the effort undertaken by term<strong>in</strong>al<br />
node to rewire one of their outgo<strong>in</strong>g edges to the next hop<br />
node. Thus, the term<strong>in</strong>al node’s degree rema<strong>in</strong>s constant. We<br />
call this strategy as Source(S)-T + T node rewir<strong>in</strong>g strategy.<br />
The empirical motivation for the source rewir<strong>in</strong>g to<br />
term<strong>in</strong>al node of this model is the notion of <strong>in</strong>centiviz<strong>in</strong>g<br />
nodes to rewire their edges to enable path completion for other<br />
nodes. This notion of <strong>in</strong>centivizes has been proposed <strong>in</strong> earlier<br />
work on query <strong>in</strong>centive networks [13] which notes that<br />
successful decentralized rout<strong>in</strong>g <strong>in</strong> social networks is critically<br />
dependent on <strong>in</strong>dividual’s decision to participate <strong>in</strong> the<br />
rout<strong>in</strong>g.<br />
IV. SIMULATION & ANALYSIS<br />
For carry<strong>in</strong>g out simulations we generate random powerlaw<br />
graphs with desired exponents and then simulate our<br />
algorithms on them and compare them with algorithms<br />
proposed <strong>in</strong> earlier work. Fig. 2 & 3 show the degree<br />
distribution before & after 1,000 iterations of our search<br />
algorithms on random power-law graphs with 10,000 nodes.<br />
We do not extract out the GCC but <strong>in</strong>stead work on the entire<br />
graph. We see that the rewir<strong>in</strong>g process does not significantly<br />
affect the power-law exponent of the graph.<br />
Figure 2. Degree distribution curves for random power-law graphs with<br />
10,000 nodes (a) before (b) <br />
algorithm.<br />
Figure 3. Degree distribution curves for random power-law graphs with<br />
10,000 nodes -T + T node rewir<strong>in</strong>g<br />
algorithm.<br />
The search cost for these two rewir<strong>in</strong>g models is almost the<br />
same and scales sub-l<strong>in</strong>early with the size of the graph: similar<br />
to the high degree node seek<strong>in</strong>g strategy of Adamic et al. [1].<br />
Fig. 4 shows this graph.<br />
Figure 4. This graph shows how search time scales sub-l<strong>in</strong>early with the<br />
graph size for our two new search algorithms. Simulations were carried on<br />
random power-law graphs of vary<strong>in</strong>g size and power-law exponent of 3. We<br />
are here not work<strong>in</strong>g on the GCC but the entire graph which has disconnected<br />
components. For calculat<strong>in</strong>g search cost we perform 100 iterations for each<br />
graph size.<br />
778
Fig. 5, 6 and 7 compare the cumulative number of nodes<br />
found by our proposed algorithms with the High Degree Node<br />
Seek<strong>in</strong>g Strategy (HDNSS) of Adamic et al [1]. We perform<br />
our simulations on random power-law graphs with exponents<br />
2, 2.5 and 3 respectively. We can see that both our rewir<strong>in</strong>g<br />
strategies perform at par with HDNSS for power-law graphs<br />
, a little better than HDNSS for power-law<br />
and vastly better than HDNSS for power-law<br />
<br />
Figure 7. Cumulative nodes found data for random power-law graph with<br />
100,000 nodes hav<strong>in</strong>g 21,841 connected components and power-law exponent<br />
of 3 and giant connected component with 47,085 nodes. Our algorithms<br />
perform much better as compared with HDNSS.<br />
Figure 5. Cumulative nodes found data for random power-law graph with<br />
10,000 nodes hav<strong>in</strong>g 164 connected components and power-law exponent of 2<br />
and giant connected component with 9398 nodes. Our algorithms perform at<br />
par with HDNSS algorithm.<br />
In order to test the performance of our algorithms <strong>in</strong> the<br />
presence of disconnected components, we perform another<br />
simulation on random power-law graphs. If the source and<br />
target are <strong>in</strong> separate components then the rout<strong>in</strong>g time will be<br />
<strong>in</strong>f<strong>in</strong>ite for HDNSS algorithm. Therefore we use a new metric<br />
(1/Rout<strong>in</strong>g Time) to compare the performance of our<br />
algorithms with HDNSS. A better rout<strong>in</strong>g algorithm will have<br />
lower Rout<strong>in</strong>g times and thus higher value of this metric<br />
(1/Rout<strong>in</strong>g Time). Fig. 8 shows that our algorithms perform<br />
better than HDNSS consistently for various graph sizes. The<br />
difference <strong>in</strong> the values of the metric will be higher as we<br />
<strong>in</strong>crease the power-law exponent of the graph or the graph size<br />
as this leads to an <strong>in</strong>crease <strong>in</strong> the number of connected<br />
components <strong>in</strong> the graph.<br />
Figure 6. Cumulative nodes found data for random power-law graph with<br />
10,000 nodes hav<strong>in</strong>g 852 connected components and power-law exponent of<br />
2.5 and giant connected component with 8185 nodes. We can see that HDNSS<br />
fails to discover new nodes after some steps. Our algorithms allow the<br />
message to propagate further by allow<strong>in</strong>g rewir<strong>in</strong>g. Eventually, our algorithms<br />
will traverse all nodes <strong>in</strong> the graph.<br />
Figure 8. Rout<strong>in</strong>g time graph for random power-law graphs of vary<strong>in</strong>g size<br />
and power-law exponent of 2. We calculate the average value of 1/Rout<strong>in</strong>g<br />
Time for 1,000 iterations of each algorithm us<strong>in</strong>g random source and target<br />
nodes. This process is repeated 10 times for each graph size to determ<strong>in</strong>e the<br />
f<strong>in</strong>al value of the metric for that graph size. Table I shows the average number<br />
of connected components for each graph size.<br />
779
TABLE I. THIS TABLE SHOWS THE NUMBER OF CONNECTED<br />
COMPONENTS FOR EACH GRAPH SIZE AVERAGED OVER 10 ITERATIONS<br />
TABLE II.<br />
THIS TABLE SHOWS HOW OUR SEARCH ALGORITHMS HELP IN<br />
RECOVERY OF THE NETWORK AFTER NODE FAILURE<br />
Size<br />
Average number of connected<br />
components<br />
500 11<br />
1000 20<br />
2000 43<br />
4000 75<br />
6000 104<br />
8000 131<br />
10000 158<br />
a. The average number of connected components <strong>in</strong>creases as the graph size <strong>in</strong>creases.<br />
Next, we simulate how our algorithms cope up with<br />
network failure. In Fig. 9 we plot a graph between the metric<br />
1/Rout<strong>in</strong>g Time and the network failure percentage. We see<br />
that as the network failure percentage <strong>in</strong>creases, the number of<br />
connected components <strong>in</strong> the graph also <strong>in</strong>creases and the<br />
performance of our algorithms improves as compared to<br />
HDNSS. Also <strong>in</strong> Table II, we show that our algorithms reduce<br />
the number of connected components of the graph because of<br />
their <strong>in</strong>herent rewir<strong>in</strong>g mechanisms. Thus, our algorithms<br />
<strong>in</strong>crease the overall reliability and availability of the network.<br />
Also, they help <strong>in</strong> recovery of the network after node failure.<br />
For this simulation we first plot a random power-law graph of<br />
<br />
connected component from this graph. Next, we simulate<br />
network failure by randomly remov<strong>in</strong>g nodes from the<br />
network. This leads to creation of disconnected components.<br />
Now we perform 1,000 search iterations of HDNSS, T node<br />
rewir<strong>in</strong>g and S-T + T node rewir<strong>in</strong>g algorithms us<strong>in</strong>g random<br />
source and target nodes.<br />
Figure 9. Rout<strong>in</strong>g time Vs Failure rate graph for random power-law network<br />
with 10,000 nodes and power-law exponent of 2. We can see that our rewir<strong>in</strong>g<br />
algorithms are more resilient to failure while the performance of HDNSS<br />
keeps deteriorat<strong>in</strong>g as the failure rate <strong>in</strong>creases. Also, our rewir<strong>in</strong>g algorithms<br />
reduce the number of disconnected components <strong>in</strong> the graph. See Table II.<br />
Failure Rate<br />
Number of<br />
connected<br />
components<br />
after network<br />
failure<br />
Number of<br />
connected<br />
components<br />
after T node<br />
rewir<strong>in</strong>g<br />
Number of<br />
connected<br />
components<br />
after S-T + T<br />
node rewir<strong>in</strong>g<br />
1% 48 43 37<br />
2.5% 115 105 106<br />
5% 187 161 161<br />
10% 392 328 320<br />
20% 804 660 666<br />
30% 1195 1003 986<br />
40% 1387 1141 1124<br />
50% 1646 1285 1298<br />
a. The number of connected components reduces after perform<strong>in</strong>g search us<strong>in</strong>g our algorithms.<br />
V. CONCLUSION<br />
In flood<strong>in</strong>g-based breadth-first algorithms there is a lot of<br />
wastage of network bandwidth as each nodes forwards the<br />
message to all its neighbors <strong>in</strong>stead of capitaliz<strong>in</strong>g on the<br />
power-law structure of the network. Adamic et al. [1]<br />
proposed a new decentralized forward<strong>in</strong>g which scales subl<strong>in</strong>early<br />
with the network size. Each node now passes the<br />
message to its neighbor of highest degree <strong>in</strong>stead of flood<strong>in</strong>g it<br />
to all its neighbors. But this algorithm performs poorly <strong>in</strong> case<br />
of graphs with disconnected components and graphs with<br />
power-law exponent higher than 2.3. We propose two new<br />
rewir<strong>in</strong>g algorithms that help alleviate these problems. Our<br />
algorithms also scale sub-l<strong>in</strong>early with the size of the graph<br />
and also work if the source and target are <strong>in</strong> different<br />
connected components. Our algorithms guarantee search<br />
completion if the target exists <strong>in</strong> the network. Also, if there is<br />
a partial failure <strong>in</strong> the network our algorithms cope up much<br />
better as compared to HDNSS and also connect the<br />
disconnected components along with search completion.<br />
Our rewir<strong>in</strong>g strategy has applications <strong>in</strong> a number of<br />
power-law networks. Of particular <strong>in</strong>terest are the peer-to-peer<br />
content shar<strong>in</strong>g networks s<strong>in</strong>ce rewir<strong>in</strong>g <strong>in</strong> such networks has<br />
very low cost and the primary <strong>in</strong>terest of participat<strong>in</strong>g nodes is<br />
query completion. There is also the concept of <strong>in</strong>herent<br />
cooperation <strong>in</strong> these networks and the network evolution<br />
model is Preferential Attachment [1] i.e. nodes will<br />
preferentially attach to nodes that are hubs of the network.<br />
These networks are dynamic <strong>in</strong> nature i.e. nodes jo<strong>in</strong> and leave<br />
the network frequently which can lead to creation of<br />
disconnected components <strong>in</strong> the network. The S-T + T node<br />
rewir<strong>in</strong>g which has <strong>in</strong>herent feature of a query <strong>in</strong>centives and<br />
cooperation model is very aptly applicable to the peer-to-peer<br />
overlay networks.<br />
The implementation of our algorithm requires the<br />
participat<strong>in</strong>g nodes of the GNUTELLA network to keep the<br />
lists of the files stored by their first and second neighbors. This<br />
<strong>in</strong>formation must be updated periodically depend<strong>in</strong>g on the<br />
typical lifetime of nodes <strong>in</strong> the network. Also, s<strong>in</strong>ce we avoid<br />
revisit<strong>in</strong>g a node while perform<strong>in</strong>g the search algorithm, we<br />
need to append the IP addresses of the nodes already queried to<br />
the search message.<br />
780
REFERENCES<br />
[1] Adamic, L. A., Lukose, R. M., Puniyani, A. R., Huberman, B. A.,<br />
“<strong>Search</strong> <strong>in</strong> Power-Law <strong>Networks</strong>”, Phys. Rev. E 64, 2001, 46135–<br />
46143.<br />
[2] Barabási, A.-L. and Albert, R., “Emergence of scal<strong>in</strong>g <strong>in</strong> random<br />
networks”, Science 286, 1999, 509512.<br />
[3] Albert R., Jeong H., and Barabási A.-L., “Error and Attack Tolerance of<br />
Complex <strong>Networks</strong>”, Nature, 2000, 406:378–381.<br />
[4] Samant K. and Bhattacharyya S., “Topology, <strong>Search</strong>, and Fault<br />
Tolerance <strong>in</strong> Unstructured <strong>P2P</strong> <strong>Networks</strong>”, In Proceed<strong>in</strong>gs of the 37th<br />
Hawaii International Conference on System Sience (HICCS’04), IEEE<br />
Computer Society, 2004.<br />
[5] “The Gnutella Protocol Specification v.0.4.”<br />
http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf, 2001.<br />
[6] Clarke I., Sandberg O., Wiley B., and Hong T., “<strong>Free</strong>net: A Distributed<br />
Anonymous Information Storage and Retrieval System”, Lecture Notes<br />
<strong>in</strong> Computer Science, 2001, 2009:46-66.<br />
[7] Ripeanu, M., Foster, I., and Iamnitchi, A., “Mapp<strong>in</strong>g the Gnutella<br />
network: Properties of large-scale peer-to-peer systems and implications<br />
for system design”, IEEE Internet Comput<strong>in</strong>g 6, 2002, 5057.<br />
[8] Newman, M. E. J., “The structure and function of complex networks”,<br />
SIAM Review 45, 2003, 167–256.<br />
[9] Newman, M.E.J., Strogatz, S. H., and Watts, D. J., “Random Graphs<br />
with arbitrary degree distributions and their applications”, Phys. Rev. E<br />
64, 2001, 026118.<br />
[10] Lua, E.-K., Crowcroft, J., Pias, M., Sharma, R., and Lim, S., “A Survey<br />
and Comparison of Peer-to-Peer Overlay Network Schemes”, IEEE<br />
Communications Surveys and Tutorials 7, 2005, 72–93.<br />
[11] Yang, B. And Garcia-Mol<strong>in</strong>a, H., “Improv<strong>in</strong>g search <strong>in</strong> peer-to-peer<br />
networks”, In Proceed<strong>in</strong>gs of the 22 nd International Conference on<br />
Distributed Comput<strong>in</strong>g Systems (ICDCS’ 02), Vienna, Autria.<br />
[12] Kle<strong>in</strong>berg J., “The Small-World phenomenon and decentralized search”,<br />
SIAM News 37(3), 2004.<br />
[13] Kle<strong>in</strong>berg J. and Raghavan P., “Query Incentive <strong>Networks</strong>”, In<br />
Proceed<strong>in</strong>gs of the 46th IEEE Symposium Foundations of Computer<br />
Science, IEEE Comput. Soc. Press, LosAlamitos.<br />
[14] Clauset A. And Moore C., “How do networks become navigable”,<br />
Prepr<strong>in</strong>t, 2003;arxiv.org,cond-mat/0309415.<br />
781