01.01.2015 Views

Proceedings [PDF] - Measurement and Analysis of P2P Activity ...

Proceedings [PDF] - Measurement and Analysis of P2P Activity ...

Proceedings [PDF] - Measurement and Analysis of P2P Activity ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

International Conference Advances in the <strong>Analysis</strong> <strong>of</strong> Online Paedophile <strong>Activity</strong> Paris, France; 2-3 June, 2009<br />

100000<br />

0.0018<br />

0.0016<br />

Number <strong>of</strong> queries<br />

10000<br />

1000<br />

100<br />

Paedophile IPs / normal IPs ratio<br />

0.0014<br />

0.0012<br />

0.001<br />

0.0008<br />

0.0006<br />

10<br />

0.0004<br />

1<br />

1 2 3 4 5 6 7 8 9<br />

Number <strong>of</strong> paedophile keywords<br />

0.0002<br />

0 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 7e+06<br />

Elapsed time (seconds)<br />

Figure 2: Distribution <strong>of</strong> the number <strong>of</strong> paedophile<br />

keywords in queries, i.e. for each number <strong>of</strong><br />

paedophile keywords from our set contained in queries,<br />

the number <strong>of</strong> queries (vertical axis), with logarithmic<br />

scale.<br />

Number <strong>of</strong> keywords in queries (Fig. 2)<br />

The distribution shows that, among queries containing<br />

at least one paedophile keyword, most <strong>of</strong> them contains<br />

a single keyword. These results raise the question <strong>of</strong><br />

the correct definition for a paedophile query. Our first<br />

model (PQ) might be improved to eliminate queries<br />

with more than x paedophile keywords.<br />

The table below shows the number <strong>of</strong> paedophile queries,<br />

for two definitions <strong>of</strong> a paedophile query. In row A,<br />

the number <strong>of</strong> queries containing exactly x paedophile<br />

keywords. In row B, the number <strong>of</strong> queries containing<br />

at least x paedophile keywords.<br />

1 2 3 4 5 6 7 8 9<br />

A 110614 3166 1105 380 78 100 26 13 1<br />

B 115483 4869 1703 598 218 140 40 14 1<br />

Paedophile IPs / normal IPs ratio (Fig. 3)<br />

If we now consider NAT or dynamic IP allocation, an IP<br />

may remain paedophile until the end <strong>of</strong> the ten weeks,<br />

even if it is not the same user anymore. This kind <strong>of</strong><br />

contamination may lead to eventually consider every<br />

IP as paedophile. The growing ratio (see Fig. 3) shows<br />

that, even by the end <strong>of</strong> the experiment, we still discover<br />

many new paedophile IPs in queries.<br />

Total number <strong>of</strong> queries by keyword (Fig. 4)<br />

One may notice several groups <strong>of</strong> keywords, sorted by<br />

frequency <strong>of</strong> appearance. But this plot also shows that a<br />

specific keyword (the right most one) is used in slightly<br />

less than 60% <strong>of</strong> all the paedophile queries. The gap<br />

between this paedophile word <strong>and</strong> others suggests that<br />

it may be a not-so-specific keyword. Further investigation,<br />

such as combination <strong>of</strong> this keyword with others,<br />

Figure 3: Time-evolution <strong>of</strong> the ratio <strong>of</strong> paedophile<br />

IPs over normal IPs.<br />

Percentage <strong>of</strong> the pedophile queries<br />

100<br />

10<br />

1<br />

0.1<br />

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21<br />

Number <strong>of</strong> the keyword in the set<br />

Figure 4: Number <strong>of</strong> queries by paedophile keyword.<br />

must be carried out to decide whether all queries containing<br />

this keyword should be considered paedophile.<br />

4. FUTURE WORK<br />

Most <strong>of</strong> the future work will consist in studying combination<br />

<strong>of</strong> keywords, so as to corroborate our hypothesis<br />

on the model (specific paedophile keywords). A<br />

deeper work on ages will also be carried out. We will<br />

progressively refine our definition <strong>of</strong> paedophile queries<br />

<strong>and</strong> IP, before establishing accurate <strong>and</strong> reliable results.<br />

5. REFERENCES<br />

[1] F. Aidouni, M. Latapy, <strong>and</strong> C. Magnien, “Ten<br />

weeks in the life <strong>of</strong> an edonkey server,” <strong>Proceedings</strong><br />

<strong>of</strong> Hot<strong>P2P</strong>’09, 2009.<br />

[2] M. Latapy, C. Magnien, <strong>and</strong> G. Valadon,<br />

“First report on database specification <strong>and</strong> access<br />

including content rating <strong>and</strong> fake detection<br />

system,” http://antipedo.lip6.fr, 2008.<br />

2<br />

98

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!