02.06.2013 Views

P2P Filesharing Population Tracking Based on ... - ETH Zürich

P2P Filesharing Population Tracking Based on ... - ETH Zürich

P2P Filesharing Population Tracking Based on ... - ETH Zürich

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<str<strong>on</strong>g>P2P</str<strong>on</strong>g> <str<strong>on</strong>g>Filesharing</str<strong>on</strong>g> <str<strong>on</strong>g>Populati<strong>on</strong></str<strong>on</strong>g><br />

<str<strong>on</strong>g>Tracking</str<strong>on</strong>g><br />

<str<strong>on</strong>g>Based</str<strong>on</strong>g> <strong>on</strong> Network Flow Data<br />

Arno Wagner, Thomas Dübendorfer, Lukas Hämmerle, Bernhard Plattner<br />

C<strong>on</strong>tact: arno@wagner.name<br />

Communicati<strong>on</strong> Systems Laboratory<br />

Swiss Federal Institute of Technology Zurich (<strong>ETH</strong> Zurich)


Outline<br />

1. Motivati<strong>on</strong><br />

2. Setting<br />

3. The PeerTracker<br />

4. Validati<strong>on</strong> by Polling<br />

5. Comments <strong>on</strong> Legal Aspects<br />

6. C<strong>on</strong>clusi<strong>on</strong><br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.1


C<strong>on</strong>tributors<br />

Philipp Jardas, ”<str<strong>on</strong>g>P2P</str<strong>on</strong>g> <str<strong>on</strong>g>Filesharing</str<strong>on</strong>g> Systems: Real<br />

World NetFlow Traffic Characterizati<strong>on</strong>”, Barchelor<br />

Thesis, <strong>ETH</strong> <strong>Zürich</strong>, 2004<br />

Lukas Hämmerle, ”<str<strong>on</strong>g>P2P</str<strong>on</strong>g> <str<strong>on</strong>g>Populati<strong>on</strong></str<strong>on</strong>g> <str<strong>on</strong>g>Tracking</str<strong>on</strong>g> and<br />

Traffic Characterizati<strong>on</strong> of Current <str<strong>on</strong>g>P2P</str<strong>on</strong>g> file-sharing<br />

Systems”, Master Thesis, <strong>ETH</strong> <strong>Zürich</strong>, 2004<br />

Roger Kaspar, ”<str<strong>on</strong>g>P2P</str<strong>on</strong>g> File-sharing Traffic Identificati<strong>on</strong><br />

Method Validati<strong>on</strong> and Verificati<strong>on</strong>”, Semester<br />

Thesis, <strong>ETH</strong> <strong>Zürich</strong>, 2005<br />

PDFs available from<br />

http://www.tik.ee.ethz.ch/~ddosvax/sada/<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.2


Motivati<strong>on</strong><br />

<str<strong>on</strong>g>P2P</str<strong>on</strong>g> traffic forms a large and dynamic part of the<br />

overall network traffic<br />

Identificati<strong>on</strong> of <str<strong>on</strong>g>P2P</str<strong>on</strong>g> traffic allows <str<strong>on</strong>g>P2P</str<strong>on</strong>g> anomaly<br />

detecti<strong>on</strong><br />

Identificati<strong>on</strong> of <str<strong>on</strong>g>P2P</str<strong>on</strong>g> traffic allows better analysis of<br />

other traffic<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.3


The DDoSVax Project<br />

http://www.tik.ee.ethz.ch/~ddosvax/<br />

Collaborati<strong>on</strong> between SWITCH (www.switch.ch,<br />

AS559) and <strong>ETH</strong> Zurich (www.ethz.ch)<br />

Aim (l<strong>on</strong>g-term): Near real-time analysis and<br />

countermeasures for DDoS-Attacks and Internet<br />

Worms<br />

Start: Begin of 2003<br />

Funded by SWITCH and the Swiss Nati<strong>on</strong>al Science<br />

Foundati<strong>on</strong><br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.4


DDoSVax Data Source: SWITCH<br />

The Swiss Academic And Research Network<br />

.ch Registrar<br />

Links most Swiss Universities and CERN<br />

Carried around 5% of all Swiss Internet traffic in 2003<br />

Around 60.000.000 flows/hour<br />

Around 300GB traffic/hour<br />

Flow archive (unsampled) since May 2003<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.5


What is a ”Flow”?<br />

In the DDoSVax-Project: CISCO NetFlow v5.<br />

Basically unidirecti<strong>on</strong>al aggregated header informati<strong>on</strong>:<br />

Source and destinati<strong>on</strong> IP<br />

Protocol<br />

Source and destinati<strong>on</strong> port (TCP and UDP)<br />

Packet and byte count<br />

Start and end time stamp, 1ms accuracy<br />

Some routing informati<strong>on</strong><br />

No payload informati<strong>on</strong> whatsoever<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.6


<str<strong>on</strong>g>P2P</str<strong>on</strong>g> Networks C<strong>on</strong>sidered<br />

Signalling<br />

c<strong>on</strong>necti<strong>on</strong><br />

File−transfer<br />

c<strong>on</strong>necti<strong>on</strong><br />

Client<br />

Server/<br />

Supernode/<br />

Tracker<br />

FastTrack/iMesh<br />

KaZaA<br />

iMesh<br />

Grokster<br />

...<br />

BitTorrent<br />

Azureus<br />

...<br />

BitTorrent<br />

Gnutella<br />

mlD<strong>on</strong>key<br />

Kademlia<br />

LimeWire<br />

Bearshare<br />

Shareaza<br />

... Overnet<br />

Overnet<br />

eMule<br />

eD<strong>on</strong>key<br />

eD<strong>on</strong>key2000<br />

...<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.7


PeerTracker Algorithm<br />

Idea: ”Traverse” <str<strong>on</strong>g>P2P</str<strong>on</strong>g> network from seeds<br />

Seeds:<br />

Seeds are peers using default ports (TCP and UDP)<br />

Keep a pool of peers for each network<br />

Add hosts that communicate with the pool<br />

Remove hosts that are idle<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.8


Default Port Usage<br />

<str<strong>on</strong>g>P2P</str<strong>on</strong>g> System Default port usage (TCP)<br />

BitTorrent 70.0 %<br />

FastTrack 8.3 %<br />

Gnutella 58.6 %<br />

eD<strong>on</strong>key 55.6 %<br />

Overnet 83.9 %<br />

Kademlia 66.6 %<br />

From 8 day PeerTracker measurement end of 2004<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.9


PeerTracker: Hosts State Diagram<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.10


Some PeerTracker Details<br />

Peers in the SWITCH network form ”core”<br />

External hosts <strong>on</strong>ly identified if they c<strong>on</strong>tact core<br />

Different ageing for core and external hosts<br />

Dead core peers are still c<strong>on</strong>tacted from external<br />

peers<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.11


Validati<strong>on</strong><br />

The PeerTracker does not look at Payloads<br />

⇒ Large error possible<br />

Validati<strong>on</strong> Approach:<br />

Run PeerTracker<br />

Poll found peers immediately<br />

Compare polling and tracker results<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.12


Peer Polling Methods<br />

<str<strong>on</strong>g>P2P</str<strong>on</strong>g> System Polling method<br />

FastTrack Request: GET /.files HTTP/1.0<br />

Resp<strong>on</strong>se: HTTP 1.0 403 Forbidden <br />

or HTTP/1.0 404 Not Found/nX-Kazaa-<br />

Gnutella Request: GNUTELLA CONNECT/<br />

Resp<strong>on</strong>se: Gnutella <br />

eD<strong>on</strong>key, Request: Binary: 0xE3 0x01 0x10 <br />

Kademlia, Resp<strong>on</strong>se: Binary: 0xE3 . . .<br />

Overnet<br />

eMule Same as eD<strong>on</strong>key, but replace initial byte with 0xC5.<br />

BitTorrent Unsolved. Seems to need knowledge of a shared file <strong>on</strong> the target peer.<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.13


Polling Results<br />

<str<strong>on</strong>g>P2P</str<strong>on</strong>g> System TCP C<strong>on</strong>nect <str<strong>on</strong>g>P2P</str<strong>on</strong>g>-client found<br />

eD<strong>on</strong>key,<br />

Kademlia, 50% 41%<br />

Overnet<br />

Gnutella 53% 30%<br />

FastTrack 51% 41%<br />

Total 51% 38%<br />

Table 1: Positive polling answers<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.14


Polling Remarks<br />

Delay to polling < 10 Minutes<br />

About 50% unreachable via TCP and port seen by<br />

PeerTracker⇒likely behind NAT<br />

Other errors: Peer variati<strong>on</strong> (esp. Gnutella),<br />

classificati<strong>on</strong> into wr<strong>on</strong>g network, tracker error<br />

BitTorrent not really pollable<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.15


Legal Aspects<br />

(Warning: I am no legal expert)<br />

Flow data likely not subject to privacy laws<br />

(unless attempts to identify people are made)<br />

PeerTracker does not identify c<strong>on</strong>tent shared<br />

⇒ output unproblematic, no acti<strong>on</strong> needs to be taken<br />

Identificati<strong>on</strong> of heavy hitters unproblematic, since<br />

payloads (shared c<strong>on</strong>tents) not identified<br />

Polling is unproblematic, since similar to running a<br />

peer (some users get nervous though...)<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.16


PeerTracker for Law Enforcement<br />

Situati<strong>on</strong> for CH!<br />

Massive private file-sharing is d<strong>on</strong>e<br />

Law enforcement is not really interested in copyright<br />

infringement<br />

Illegal c<strong>on</strong>tents (child pornography and the like) can<br />

be d<strong>on</strong>e far better with modified <str<strong>on</strong>g>P2P</str<strong>on</strong>g> clients<br />

⇒ Not really suitable<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.17


C<strong>on</strong>clusi<strong>on</strong>, Remarks<br />

Peer identificati<strong>on</strong> feasible with flow data<br />

Nodes with little traffic problematic<br />

BitTorrent problematic<br />

<str<strong>on</strong>g>P2P</str<strong>on</strong>g> filesharing still evolves<br />

PeerTracker code is available under GPL<br />

In productive use at <strong>ETH</strong> Zurich since 2005<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.18


Thank You!<br />

Arno Wagner, <strong>ETH</strong> Zurich, ICISP 2006 – p.19

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!