
Proceedings [PDF] - Measurement and Analysis of P2P Activity ...


International Conference Advances in the Analysis of Online Paedophile Activity, Paris, France; 2-3 June, 2009

Tracing paedophile eDonkey users through keyword-based queries

Raphaël Fournier, Guillaume Valadon, Clémence Magnien, Matthieu Latapy

LIP6 – CNRS and University Pierre & Marie Curie

104, avenue du Président Kennedy, 75016 Paris, France

1. MOTIVATION

Recent studies showed that Internet bandwidth is now massively dedicated to peer-to-peer (P2P) exchanges. Thus, it is important to examine precisely this new way of transmitting data and the kind of content that is exchanged. In particular, authorities want to obtain reliable information on paedophile activity on these networks in order to fight cybercrime.¹

One of the most prominent P2P networks is eDonkey, a semi-centralized system. Users connect to servers and submit content queries, servers return lists of files, then users exchange files directly. A study [1] was designed to record all the exchanges between an eDonkey server and the connected peers. The exchanges collected include:

• keyword-based search queries submitted to the server and the server's answers, consisting of lists of files;

• specific file requests and the server's answers (lists of users sharing the file).

The experiment lasted ten weeks without interruption. During that time, 127 million queries were submitted to the server by 28 million IPs. As a first and rather rough definition, we consider that a user uses only one IP and that an IP is not shared by several users. The measurements of paedophile activity were based on a set of 21 keywords. Their "paedophile" nature was previously assessed by co-occurrence studies [2]. We consider that a query is paedophile if it contains at least one of these words; we call this model "PQ". An IP is considered paedophile as soon as it submits such a paedophile query.
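The "PQ" rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the 21 keywords are not listed in this text, so `KEYWORDS` holds neutral placeholder tokens, and the tokenization (lowercasing, splitting on non-word characters) and the `(ip, query)` record format are hypothetical choices.

```python
import re

# Placeholder tokens only: the actual 21-keyword set is not given in this text.
KEYWORDS = frozenset({"kw_a", "kw_b", "kw_c"})


def is_flagged_query(query, keywords=KEYWORDS):
    """Model "PQ": a query is flagged if it contains at least one keyword."""
    # Assumption: case-insensitive match on whole word-character tokens.
    tokens = re.findall(r"\w+", query.lower())
    return any(token in keywords for token in tokens)


def flagged_ips(records, keywords=KEYWORDS):
    """Return the set of IPs that submitted at least one flagged query.

    `records` is an iterable of (ip, query) pairs; under the rough
    one-user-per-IP assumption, each IP stands for one user.
    """
    return {ip for ip, query in records if is_flagged_query(query, keywords)}
```

Matching whole tokens rather than substrings is one possible design choice; it avoids flagging a query merely because a keyword appears inside a longer, unrelated word.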

2. GOALS

This study aims to establish accurate facts about paedophile users on the eDonkey server. Our first priority is to count paedophile users. The study therefore first requires a clear definition of what a paedophile user is and of what makes a query paedophile.

¹ This work is supported by the European MAPAP (SIP-2006-PP-221003) and the French ANR/MAPE projects.

3. RESULTS

Number of queries by IP (Fig. 1)

[Figure 1 plot. x-axis: number of paedophile queries (ticks from 1 to 1000); y-axis: number of IPs (ticks from 1 to 100000).]

Figure 1: Distribution of the number of paedophile queries by paedophile IP, i.e. for each encountered number of paedophile queries, the number of IPs which submitted this number of queries.

The distribution is highly heterogeneous: more than 66% of all paedophile IPs submit only one query, and 94% submit fewer than 5, while some IPs submit up to 456 queries within the ten weeks. This figure raises the question of how to characterize a paedophile IP: is a single query within ten weeks enough to consider the IP as paedophile? At the other extreme, 456 queries over the whole experiment means more than 6 paedophile queries a day: is this the behavior of a very active human user or of a robot? Our underlying assumptions about the way eDonkey clients work are also called into question here: are queries sometimes automatically re-submitted to the search engine?

Understanding this distribution is crucial for building a model that counts paedophile users with a low false-positive rate.
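The quantities discussed in this section (the number of IPs per query count, the share of IPs below a given count, and the average daily rate of the most active IP) can be recomputed from a list holding one IP per flagged query. The sketch below is a minimal illustration and not the authors' tooling; the function names and the input format are assumptions.

```python
from collections import Counter


def queries_per_ip_distribution(flagged_query_ips):
    """Map each query count k to the number of IPs with exactly k flagged queries."""
    per_ip = Counter(flagged_query_ips)   # IP -> number of flagged queries
    return Counter(per_ip.values())       # k  -> number of IPs with k queries


def share_with_fewer_than(distribution, threshold):
    """Fraction of IPs that submitted fewer than `threshold` flagged queries."""
    total = sum(distribution.values())
    if total == 0:
        return 0.0
    below = sum(n for k, n in distribution.items() if k < threshold)
    return below / total


def queries_per_day(total_queries, weeks):
    """Average number of queries per day over a period given in weeks."""
    return total_queries / (weeks * 7)


# 456 queries over ten weeks (70 days) is about 6.5 queries a day.
rate = queries_per_day(456, 10)
```

With such a distribution in hand, a counting model could, for example, compare the number of users obtained under different minimum-query thresholds; which threshold keeps the false-positive rate low is exactly the open question raised above.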
