01.01.2015 Views

Proceedings [PDF] - Measurement and Analysis of P2P Activity ...


International Conference Advances in the Analysis of Online Paedophile Activity, Paris, France; 2-3 June, 2009

storage (footnote 2). The resulting data set occupies 500 gigabytes in total. It contains 10 million queries, 90 million IP addresses and 280 million distinct files.

This measurement has the benefit of capturing all of the data exchanged by the server; however, it fails to collect client-to-client communications. Hence, we are not able to study the file exchanges that occur between clients.

3.2 Client measurements

There are two different client-based measurements. The first one, Honeypot, aims at collecting information about file exchanges between clients, whereas the second one performs queries using specific keywords to monitor files available in the eDonkey network. They are used jointly to circumvent the limitations of the server measurements.

3.2.1 Honeypot

We developed an eDonkey honeypot: a client that (1) informs the server that it is willing to exchange a predefined list of files, and (2) allows connections from other clients, but will either send random content or no content at all. Moreover, the honeypot retrieves the list of files shared by clients contacting it. Consequently, it is now possible to measure the paedophile activity concerning client-to-client communication, and to precisely identify the interest of users in specific files.

Using this honeypot, we conducted an active measurement [4] over 32 days; 24 distributed honeypots were advertising 4 files. The resulting data set contains 110 049 IP addresses and 28 007 distinct files.

Two important facts were highlighted by this measurement: (1) long and distributed measurements are relevant and allow more peers and files to be discovered; (2) with the random content strategy, more peers contact the honeypots than with the no content one.

3.2.2 Client sending queries

Here, we use an eDonkey client that acts exactly as a regular client would. It periodically queries eDonkey servers using a list of predefined paedophile and non-paedophile keywords. The goal of this measurement is therefore to enumerate the files that are globally available in the eDonkey network, and to be able to detect when files appear.

This measurement was conducted over 140 days, from October 2008 to February 2009. We observed 2 978 764 distinct IP addresses and 2 784 583 distinct files.
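As an illustration of what "detecting when files appear" can mean in practice, the following is a minimal sketch and not the tooling used for the measurement. It assumes that each periodic query round is summarised as a pair `(day, file_ids)`; this snapshot layout and the function names `first_seen` and `cumulative_discovery` are hypothetical.

```python
from datetime import date


def first_seen(snapshots):
    """Map each file identifier to the earliest day on which it was observed.

    `snapshots` is an iterable of (day, file_ids) pairs, in any order
    (assumed layout, not the format used in the paper).
    """
    seen = {}
    for day, file_ids in sorted(snapshots, key=lambda s: s[0]):
        for fid in file_ids:
            # setdefault keeps the first (earliest) day recorded for fid.
            seen.setdefault(fid, day)
    return seen


def cumulative_discovery(snapshots):
    """Return [(day, number of distinct files seen up to and including day)]."""
    known = set()
    curve = []
    for day, file_ids in sorted(snapshots, key=lambda s: s[0]):
        known.update(file_ids)
        curve.append((day, len(known)))
    return curve


# Synthetic snapshots, for illustration only.
snapshots = [
    (date(2008, 10, 2), {"b", "c"}),
    (date(2008, 10, 1), {"a", "b"}),
    (date(2008, 10, 3), {"c"}),
]
appearances = first_seen(snapshots)
curve = cumulative_discovery(snapshots)
```

A cumulative curve that keeps rising over a long measurement indicates that new files are still being discovered.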

While simple, this experiment shows that it is really difficult to completely discover all of the files available in eDonkey: long measurements keep discovering new files at a high rate.

[Figure 1: Distribution of ages seen in filenames. The y-axis indicates the percentage of filenames (containing ages) including the corresponding age on the x-axis. x-axis: age in filenames (1 to 20); y-axis: percentage of filenames (0 to 14); series: client measurements, server measurements.]

2 The data set is available at http://content.lip6.fr/latapy/edonkey/weeks/

Example: Ages in filenames

A single format was used to store the data from the three different measurements. It is therefore straightforward to analyse the resulting data sets using the same techniques. Figure 1 presents, for each age, the percentage of filenames containing age information (footnote 3) that mention that age.
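The percentage plotted in Figure 1 can be computed along the following lines. This is a hedged sketch rather than the authors' analysis code: it assumes ages appear as one or two digits followed by `yo` or `yr` (the encoding mentioned in footnote 3), matched case-insensitively, and the names `AGE_TOKEN` and `age_distribution` are hypothetical.

```python
import re
from collections import Counter

# Assumed token shape: 1-2 digits followed by "yo" or "yr", not embedded in a
# longer number or a longer word. The real analysis may use other patterns.
AGE_TOKEN = re.compile(r"(?<!\d)(\d{1,2})(?:yo|yr)(?![a-z])", re.IGNORECASE)


def age_distribution(filenames):
    """Return {age: percentage of age-bearing filenames that mention that age}."""
    per_age = Counter()
    with_age = 0
    for name in filenames:
        # A set, so that a filename counts at most once per age.
        ages = {int(a) for a in AGE_TOKEN.findall(name)}
        if not ages:
            continue  # filenames without any age are outside the denominator
        with_age += 1
        per_age.update(ages)
    if with_age == 0:
        return {}
    return {age: 100.0 * n / with_age for age, n in sorted(per_age.items())}


# Synthetic placeholder names, for illustration only.
dist = age_distribution(["a_12yo.dat", "b_12YR.dat", "c_18yo_12yo.dat", "d.dat"])
```

Because a filename may mention several ages, the percentages need not sum to 100. Applying the same function to the client and server data sets is what allows the two curves to be compared on one plot.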

Although we consider two different data sets, a striking result emerges concerning ages: there is a clear interest for 18 years old, and another one around 12 to 13 years old. We are currently investigating the spike at 9 years old in the client measurement in order to check whether this is a measurement artifact: the server measurement contains many more filenames than the client one, and was performed two years ago.

4. REFERENCES

[1] Y. Kulbak and D. Bickson, “The eMule Protocol Specification,” 2005, http://citeseer.ist.psu.edu/kulbak05emule.html.

[2] United States General Accounting Office, “File sharing programs: Child Pornography Is Readily Accessible over Peer-to-Peer Networks,” 2003.

[3] F. Aidouni, M. Latapy, and C. Magnien, “Ten weeks in the life of an eDonkey server,” in Sixth International Workshop on Hot Topics in Peer-to-Peer Systems (Hot-P2P 2009), Rome, Italy, May 2009.

[4] O. Allali, M. Latapy, and C. Magnien, “Measurement of eDonkey Activity with Distributed Honeypots,” in Sixth International Workshop on Hot Topics in Peer-to-Peer Systems (Hot-P2P 2009), Rome, Italy, May 2009.

3 For example, 12 years old is encoded as 12yo or 12yr.
