Proceedings [PDF] - Measurement and Analysis of P2P Activity ...
International Conference Advances in the Analysis of Online Paedophile Activity, Paris, France, 2-3 June 2009
storage². The resulting data set occupies 500 gigabytes in total. It contains 10 million queries, 90 million IP addresses and 280 million distinct files.
This measurement has the benefit of capturing all of the data exchanged by the server; however, it fails to collect client-to-client communications. Hence, we are not able to study the file exchanges that occur between clients.
3.2 Client measurements<br />
There are two different client-based measurements. The first one, the honeypot, aims at collecting information about file exchanges between clients, whereas the second one performs queries using specific keywords to monitor the files available in the eDonkey network. The two are used jointly to circumvent the limitations of the server measurements.
3.2.1 Honeypot<br />
We developed an eDonkey honeypot: a client that (1) informs the server that it is willing to exchange a predefined list of files, and (2) accepts connections from other clients but sends either random content or no content at all. Moreover, the honeypot retrieves the list of files shared by the clients contacting it. Consequently, it becomes possible to measure paedophile activity in client-to-client communications, and to precisely identify users' interest in specific files.
Using this honeypot, we conducted an active measurement [4] over 32 days, during which 24 distributed honeypots advertised 4 files. The resulting data set contains 110 049 IP addresses and 28 007 distinct files.
This measurement highlighted two important facts: (1) long, distributed measurements are relevant and make it possible to discover more peers and files; (2) with the random-content strategy, more peers contact the honeypots than with the no-content strategy.
3.2.2 Client sending queries<br />
Here, we use an eDonkey client that acts exactly as a regular client would. It periodically queries eDonkey servers using a list of predefined paedophile and non-paedophile keywords. The goal of this measurement is therefore to enumerate the files that are globally available in the eDonkey network, and to detect when new files appear.
This measurement was conducted over 140 days, from October 2008 to February 2009. We observed 2 978 764 distinct IP addresses and 2 784 583 distinct files.
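Counting distinct IP addresses and files, and checking whether new files keep appearing as the measurement goes on, is essentially set bookkeeping over the collected records. The Python sketch below illustrates this under the assumption that query results have been reduced to hypothetical `(day, item_id)` pairs, where `item_id` could be a file hash or an IP address; the function name is likewise hypothetical, and this is not the tool used for the measurement.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def cumulative_discoveries(
    observations: Iterable[Tuple[int, Hashable]],
) -> List[Tuple[int, int, int]]:
    """Return (day, new_items, total_distinct) for each observed day, in order.

    `observations` is an assumed list of (day, item_id) pairs, e.g. file
    hashes or IP addresses seen in query answers on a given day.
    """
    # Group the distinct items seen on each day.
    by_day: Dict[int, Set[Hashable]] = {}
    for day, item in observations:
        by_day.setdefault(day, set()).add(item)

    seen: Set[Hashable] = set()
    curve: List[Tuple[int, int, int]] = []
    for day in sorted(by_day):
        fresh = by_day[day] - seen  # items never observed on earlier days
        seen |= fresh
        curve.append((day, len(fresh), len(seen)))
    return curve


if __name__ == "__main__":
    obs = [(1, "a"), (1, "b"), (2, "b"), (2, "c"), (3, "a"), (3, "d"), (3, "e")]
    print(cumulative_discoveries(obs))  # [(1, 2, 2), (2, 1, 3), (3, 2, 5)]
```

The last element of the final tuple is the number of distinct items over the whole measurement; a `new_items` column that stays high late in the measurement is what signals that discovery is far from saturated.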
While simple, this experiment shows that it is very difficult to discover all of the files available in eDonkey: long measurements keep discovering new files at a high growth rate.
² The data set is available at http://content.lip6.fr/latapy/edonkey/weeks/
[Figure 1 plot: x-axis "Age in filenames" (1 to 20); y-axis "Percentage of filenames" (0 to 14); two series: client measurements and server measurements.]
Figure 1: Distribution of ages seen in filenames. The y-axis indicates the percentage of filenames (among those containing ages) that include the corresponding age on the x-axis.
Example: Ages in filenames<br />
A single, common format was used to store the data from the three different measurements, so it is straightforward to analyse the resulting data sets with the same techniques. Figure 1 presents, for each age, the proportion of filenames containing age information³ that include that age.
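The y-axis of Figure 1 is a per-age share of the filenames that contain age information. A minimal Python sketch of that computation follows, assuming the ages have already been extracted from each filename into a set (the extraction step and the name `age_percentages` are hypothetical). Under this reading, a filename mentioning several ages counts once for each of them, so the percentages can sum to more than 100.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def age_percentages(ages_per_filename: Iterable[Set[int]]) -> Dict[int, float]:
    """Percentage of age-bearing filenames that include each age.

    Each element is the (assumed, pre-extracted) set of ages found in one
    filename. Empty sets, i.e. filenames without any age, are skipped so
    that the denominator only covers filenames that contain ages.
    """
    counts: Counter = Counter()
    total = 0
    for ages in ages_per_filename:
        if not ages:
            continue  # not part of the "filenames containing ages" population
        total += 1
        counts.update(ages)  # a set, so each age counts once per filename
    if total == 0:
        return {}
    # Sorted by age, matching the x-axis of the figure.
    return {age: 100.0 * n / total for age, n in sorted(counts.items())}


if __name__ == "__main__":
    sample = [{18}, {12, 13}, {18}, {9}, set()]
    print(age_percentages(sample))  # {9: 25.0, 12: 25.0, 13: 25.0, 18: 50.0}
```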
Although we consider two different data sets, a striking result emerges visually concerning ages: there is a clear interest in 18-year-olds, and another one around 12 to 13 years old. We are currently investigating the spike at 9 years old in the client measurement in order to check whether it is a measurement artifact: the server measurement contains many more filenames than the client one, and was performed two years ago.
4. REFERENCES<br />
[1] Y. Kulbak and D. Bickson, "The eMule Protocol Specification," 2005, http://citeseer.ist.psu.edu/kulbak05emule.html.
[2] United States General Accounting Office, "File sharing programs: Child Pornography Is Readily Accessible over Peer-to-Peer Networks," 2003.
[3] F. Aidouni, M. Latapy, and C. Magnien, "Ten weeks in the life of an eDonkey server," in Sixth International Workshop on Hot Topics in Peer-to-Peer Systems (Hot-P2P 2009), Rome, Italy, May 2009.
[4] O. Allali, M. Latapy, and C. Magnien, "Measurement of eDonkey Activity with Distributed Honeypots," in Sixth International Workshop on Hot Topics in Peer-to-Peer Systems (Hot-P2P 2009), Rome, Italy, May 2009.
³ For instance, "12 years old" is encoded as "12yo" or "12yr".