Characteristics of a P2P Network - About

w5.cs.uni.sb.de


A Survey of Peer-2-Peer Applications and Their Relation to Semantic Web Technologies

Lecture: Semantic Web Ontologies for Mobile Internet Agents

03.02.2005

© C. Endres / W. Wahlster 2005


What is Peer-2-Peer? (1)

“A peer-to-peer (or P2P) computer network is any network that does not rely on dedicated servers for communication but instead mostly uses direct connections between clients (peers). A pure peer-to-peer network does not have the notion of clients or servers, but only equal peer nodes that simultaneously function as both "clients" and "servers" to the other nodes on the network.”

(Wikipedia.org)



What is Peer-2-Peer? (2)

“The term “peer-to-peer” (P2P) refers to a class of systems and applications that employ distributed resources to perform a critical function in a decentralized manner.”

(HP Laboratories)



Characteristics of a P2P Network

• Decentralization

• Scalability

• Anonymity

• Self-Organization

• Shared Cost of Ownership

• Ad-Hoc Connectivity

• Performance

• Security

• Fault Resilience



Characteristics of a P2P Network: Decentralization

• Emphasis on the user’s ownership and control of data and resources

• But: no global view of all the peers in the network or the resources they provide

• In fully decentralized systems, even finding the network (i.e., a first peer to connect to) is a challenge.

• Most P2P systems are not fully decentralized, but hybrid.



Characteristics of a P2P Network: Decentralization

Degrees of Decentralization

• Pure (all nodes are the same)

Examples: Gnutella, Freenet

• Super-Peers (masters)

Examples: KaZaA, JXTA

• Dedicated servers (hybrid)

Examples: Napster, SETI@home



Characteristics of a P2P Network: Scalability

• In the hybrid approach, scalability is limited by the amount of centralized operations. Example: Napster scaled up to 6 million users.

• In a fully decentralized approach, it is limited by the communication overhead and the communication model. Example: SETI@home currently has 5.3 million users.



Characteristics of a P2P Network: Anonymity

• Allowing users to use the network without personal consequences

• Guaranteeing that digital content cannot be censored. Examples: Free Haven, Publius

• Three kinds of anonymity: sender anonymity, receiver anonymity, and mutual anonymity [Pfitzmann 1987]

• Different techniques are used to ensure anonymity: multicasting, covert paths, identity spoofing, etc. (=> cryptography lecture)



Characteristics of a P2P Network: Self-Organization

• “a process where the organization (constraint, redundancy) of a system spontaneously increases, i.e., without this increase being controlled by the environment or an encompassing or otherwise external system” [Heylighen 1997].

• Self-organization is needed because P2P systems can scale unpredictably in terms of the number of systems, number of users, and load.



Characteristics of a P2P Network: Cost of Ownership

P2P reduces the cost of owning the system and the content as well as the cost of maintaining them.

• “SETI@home is faster than the fastest supercomputer in the world, yet at only a fraction of its cost – 1%” [Anderson 2000].



Characteristics of a P2P Network: Ad-Hoc Connectivity

• In a P2P network, some peers are available all of the time, some are available most of the time, and some are available only occasionally.

• An application in a P2P network needs to be aware of the ad-hoc nature of the network and be able to handle joining and leaving peers.

• Leaving peers are not necessarily able to sign off from the network.

• This is the exception in traditional distributed systems but the norm in P2P.
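Handling peers that leave without signing off usually comes down to timeouts. The sketch below assumes a simple heartbeat scheme: any message from a peer refreshes its entry, and peers that stay silent longer than a timeout are dropped. The class `PeerTable` and the 30-second default are hypothetical illustrations, not taken from any particular P2P system.

```python
import time
from typing import Dict, List, Optional


class PeerTable:
    """Liveness table: joining is a first heartbeat; a peer that falls silent
    for longer than `timeout` seconds is dropped, whether or not it signed off."""

    def __init__(self, timeout: float = 30.0) -> None:
        self.timeout = timeout  # assumed value, tune per network
        self.last_seen: Dict[str, float] = {}

    def _now(self, now: Optional[float]) -> float:
        return time.monotonic() if now is None else now

    def heartbeat(self, peer_id: str, now: Optional[float] = None) -> None:
        """Record any message from a peer (the first one counts as joining)."""
        self.last_seen[peer_id] = self._now(now)

    def sign_off(self, peer_id: str) -> None:
        """Polite departure; a crashed or disconnected peer never calls this."""
        self.last_seen.pop(peer_id, None)

    def expire(self, now: Optional[float] = None) -> List[str]:
        """Drop peers that have been silent too long and return their ids."""
        t = self._now(now)
        stale = sorted(p for p, seen in self.last_seen.items()
                       if t - seen > self.timeout)
        for p in stale:
            del self.last_seen[p]
        return stale

    def alive(self) -> List[str]:
        return sorted(self.last_seen)
```

For example, if peers A, B, and C join at time 0, B signs off, and only A sends another heartbeat at time 25, then `expire(now=40)` removes C (silent for 40 s) and keeps A. Passing `now` explicitly is only for reproducible tests; in practice the monotonic clock is used.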



Characteristics of a P2P Network: Performance

P2P networks improve performance by:

• aggregating distributed storage capacity (Napster, Gnutella)

• aggregating computing cycles (e.g., SETI@home’s current computing power is approx. 920 CPU years per day).

• Performance problems such as communication overhead (e.g., Gnutella) can be addressed by replication, caching, and intelligent routing (e.g., DNS).



Characteristics of a P2P Network: Security

P2P networks use the same well-understood techniques as other distributed applications:

• Multi-key encryption

• Sandboxing

• Digital Rights Management

• Reputation and Accountability

• Firewalls



Characteristics of a P2P Network: Fault Resilience

• Problem: How to perform collaborative calculations with peers that may disconnect at any time?

• Addressed by Genome@home (2001)

• Distributed calculation to understand the human genome, initiated by Stanford University



History of P2P Networks

• Originally, most distributed applications were P2P

• Client/server computing became more popular in the late 1980s

• Early P2P applications: SMTP, Usenet News

• Popular developments before file sharing: IRC and instant messaging



History of P2P Networks



History of P2P Networks: Napster

• Launched in fall 1999 by Shawn Fanning

• Intended as a search engine for finding MP3 music files on the Internet

• Hybrid system using dedicated index servers

• Attempted legal action against Napster in 1999 increased its popularity

• Big media coverage of a leaked (pre-release) Madonna song in 2000

• The only(!) promotional help in making a Radiohead album No. 1 in the U.S.

• Peak in February 2001 with 13.6 million users



History of P2P Networks: Napster Lawsuit

• July 2001: Napster’s servers are shut down

• September 2001: Lawsuit partially settled ($26 million settlement)

• Spring 2002: Napster 3.0 alpha is ready to deploy. Despite the use of fingerprinting technology, there are licensing problems.

• May 2002: Bertelsmann AG attempts to buy Napster for $8 million.

• September 2002: Deal blocked because Napster had filed for bankruptcy.



History of P2P Networks: Napster

• Follow-up services like KaZaA don’t rely on dedicated servers and are therefore a harder legal target

• The music industry (RIAA) consequently sues individual users of those services, starting in September 2003.

• “Napster 2.0” is launched in October 2003 as a legal service for purchasing music over P2P.

• Other legal music purchase services are emerging, but most of them are currently not P2P-based.



A closer look at some P2P systems / platforms

• Napster

• Gnutella

• FastTrack / KaZaA

• eDonkey (eMule, etc.)

• BitTorrent

• JXTA

• Edutella

• Publius



P2P systems: Napster

• First, very simple attempt at a file sharing application

• Limited to one document type

• Central server for peer discovery and lookup

• Direct download connection between two users

• No multi-source functionality

• A server crash can bring down the whole network.



P2P systems: Gnutella

• Pure P2P model

• Complicated communication model causes a lot of overhead.

• No guarantee that a searcher finds an existing file.

• No guarantee that a provider receives all requests.

• File transfers use the HTTP protocol.

• UDP introduced in Gnutella2



P2P systems: FastTrack network

• Usually referred to by the name of its first and most popular client, KaZaA

• Gained a bad reputation for installing spyware on the user’s computer

• The protocol was not documented but has been reverse-engineered.

• Other clients: Grokster, iMesh, Morpheus

• Similar to Gnutella, but introduces the concept of “super nodes”

• Uses the fast but unreliable UUHash algorithm for file identification
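UUHash owes its speed to hashing only parts of a file, and that is also what makes it unreliable. The hypothetical `prefix_id` below is a deliberately simplified stand-in, not the real UUHash: it hashes only the first `prefix` bytes (the 300 KB default is an assumption) plus the file size, so two files of the same size that differ only further in get the same identifier.

```python
import hashlib


def prefix_id(data: bytes, prefix: int = 300 * 1024) -> str:
    """Toy 'fast' file identifier: hashes only the first `prefix` bytes plus
    the file size. NOT the real UUHash; it only illustrates why hashing part
    of a file is fast but unreliable."""
    h = hashlib.sha1(data[:prefix])
    h.update(len(data).to_bytes(8, "big"))  # include the file size
    return h.hexdigest()


def full_id(data: bytes) -> str:
    """Reference identifier computed over the complete content."""
    return hashlib.sha1(data).hexdigest()


original = b"A" * 1000
tampered = b"A" * 500 + b"B" + b"A" * 499  # same size, one byte changed

print(prefix_id(original, prefix=100) == prefix_id(tampered, prefix=100))  # True
print(full_id(original) == full_id(tampered))  # False
```

The collision is exactly what a poisoning attacker exploits: a damaged or fake file that shares the hashed part passes as the genuine one.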



P2P systems: eD2K network (eDonkey2000)

• Uses MD4 hashing for file identification

• Introduces the feature of sharing file segments before the download of the whole file is complete

• Based on a decentralized server network

• Popular clients: eMule, MLDonkey, Shareaza, MediaVAMP



P2P systems: BitTorrent

• Implemented in 2001 by Bram Cohen

• Intended to simplify the distribution of large files, especially Linux distributions

• Downloaders concentrate on one file at a time and form “swarms”

• A “torrent” (small description file) is distributed via web pages or by email.

• It contains the file name, hash code, file size, and tracker location

• The tracker has full control over transactions
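A torrent file can be sketched in a few lines. The field names (`announce`, `info`, `name`, `length`, `piece length`, `pieces`) follow the metainfo format of BitTorrent's BEP 3 specification, in which the "hash code" is actually one SHA-1 digest per piece. The helper names `bencode` and `make_torrent`, the tracker URL, and the 256 KB default piece size are illustrative assumptions; this is a minimal sketch, not a complete client library.

```python
import hashlib


def bencode(value) -> bytes:
    """Serialize ints, strings/bytes, lists, and dicts with bencoding."""
    if isinstance(value, bool):
        raise TypeError("booleans are not bencodable")
    if isinstance(value, int):
        return b"i%de" % value                    # e.g. i42e
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)     # e.g. 4:spam
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_torrent(name: str, data: bytes, tracker_url: str,
                 piece_length: int = 256 * 1024) -> bytes:
    """Build a single-file metainfo dictionary and bencode it."""
    # One 20-byte SHA-1 digest per piece, concatenated.
    pieces = b"".join(hashlib.sha1(data[i:i + piece_length]).digest()
                      for i in range(0, len(data), piece_length))
    return bencode({
        "announce": tracker_url,       # tracker location
        "info": {
            "name": name,              # file name
            "length": len(data),       # file size in bytes
            "piece length": piece_length,
            "pieces": pieces,          # per-piece hash codes
        },
    })
```

Per-piece hashes let a downloader verify each piece as soon as it arrives, instead of only after the whole file is complete.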



P2P systems: BitTorrent



P2P systems: BitTorrent

• Files are distributed in small chunks (16 KB)

• Distribution is very fast

• “Leeching” is hardly possible: peers preferentially upload to those peers that upload to them (tit-for-tat)

• Communication with the tracker is very limited; the limits of scalability have never been reached

• The known location of the tracker makes trading of illegal content less likely (i.e., more difficult)

• Newer developments based on BitTorrent use decentralization, web seeding, and “broadcatching”



P2P systems: JXTA

• Released by Sun Microsystems in 2001

• “General purpose” infrastructure for network programming.

• Goals: interoperability, platform independence, ubiquity.

• Introduces the concepts of peer groups (partitioning of peers), peer pipes (asynchronous unidirectional communication channels), and peer monitors.

• All data interchange is in XML format.

• Open source



P2P systems: Edutella

• Based on JXTA

• RDF-based metadata structure for P2P applications

• Vision: to enable interoperability between heterogeneous JXTA applications

• Services provided: query, replication, mapping, and annotation.

• Application: exchange of educational resources between universities



P2P systems: Publius

• Censorship-resistant publishing system

• Developed at NYU

• Publisher stays anonymous

• Published documents are encrypted and distributed in shares => no tampering or censorship possible.

• An individual server has no idea about the content of the documents it stores.

• Open source.
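Publius is commonly described as encrypting each document, storing it on several servers, and splitting the encryption key into shares with Shamir's secret sharing, so that any k of n shares recover the key while fewer reveal nothing. The sketch below shows only that k-of-n key splitting for an integer key; the field prime, k = 3, n = 5, and the function names are assumptions for illustration, and the code has not been vetted for production cryptographic use. It needs Python 3.8+ for `pow(x, -1, p)`.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # Mersenne prime; the field must be larger than the secret


def split_secret(secret: int, k: int, n: int) -> List[Tuple[int, int]]:
    """Split `secret` into n shares; any k of them reconstruct it."""
    if not 0 <= secret < PRIME or not 1 <= k <= n:
        raise ValueError("invalid parameters")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Giving each server one share means a single server can neither decrypt what it stores nor, on its own, prevent the document from being reconstructed from the remaining servers.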



Design issues for building P2P networks

• Peer discovery

• Querying peers

• File identification

• Ensuring file integrity

• Multi-source download

• Firewalls

• User interface design



Peer discovery

• Essential feature in P2P applications

• In pure P2P networks: hard-coded addresses or a broadcast to the local network, then retrieval of other peers by querying known peers. Completely independent of any server, but connecting to the network might be difficult or impossible

• In hybrid networks: query to an index server. Very fast way of connecting to the network, but introduces a single point of failure.
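The pure-P2P bootstrap can be sketched as a breadth-first crawl. Here the network is simulated by a dict, and `get_neighbors` stands in for a real "which peers do you know?" request; the function name, the addresses, and the `max_peers` limit are hypothetical. A peer that does not answer is skipped, and if none of the hard-coded addresses answers, the result is empty, which is exactly the "connecting might be impossible" case.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Set


def discover_peers(bootstrap: Iterable[str],
                   get_neighbors: Callable[[str], Optional[List[str]]],
                   max_peers: int = 50) -> List[str]:
    """Start from hard-coded addresses and learn more peers by asking each
    reachable peer for the peers it knows (breadth-first)."""
    known: Set[str] = set()
    reachable: List[str] = []
    queue = deque(bootstrap)
    while queue and len(reachable) < max_peers:
        addr = queue.popleft()
        if addr in known:
            continue
        known.add(addr)
        neighbors = get_neighbors(addr)  # None = peer offline / unreachable
        if neighbors is None:
            continue
        reachable.append(addr)
        queue.extend(n for n in neighbors if n not in known)
    return reachable


# Simulated network; "10.0.0.9" is a hard-coded address that is offline.
NETWORK = {
    "10.0.0.1": ["10.0.0.2", "10.0.0.3"],
    "10.0.0.2": ["10.0.0.4"],
    "10.0.0.3": [],
    "10.0.0.4": ["10.0.0.1"],
}
print(discover_peers(["10.0.0.9", "10.0.0.1"], NETWORK.get))
```

In a hybrid network, the whole loop collapses into a single request to the index server, which is faster but fails completely if that server is down.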



Querying peers

• Pure P2P: queries are passed on from peer to peer, causing communication overhead.

• Hybrid system with discovery server: peers are obtained from the discovery server and queried directly.

• Hybrid system with discovery and lookup server: the lookup server is queried for peers hosting the requested file.
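In the pure P2P case, passing queries on is typically done by flooding with a time-to-live (TTL), as in Gnutella: each peer forwards the query to all neighbors except the sender, and duplicates are dropped. The four-peer graph and the function name `flood_query` below are hypothetical. With TTL 1 the existing file is not found at all (the "no guarantee" problem), and every forwarded copy, including the dropped duplicate, counts toward the overhead.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def flood_query(graph: Dict[str, List[str]], files: Dict[str, Set[str]],
                origin: str, wanted: str, ttl: int) -> Tuple[List[str], int]:
    """Flood a query with a TTL.
    Returns (peers that have the file, number of messages sent)."""
    seen = {origin}
    hits: List[str] = []
    messages = 0
    frontier = deque([(origin, ttl, None)])  # (peer, remaining TTL, sender)
    while frontier:
        peer, remaining, sender = frontier.popleft()
        if peer != origin and wanted in files.get(peer, set()):
            hits.append(peer)
        if remaining == 0:
            continue  # TTL exhausted: the query is not forwarded further
        for neighbor in graph.get(peer, []):
            if neighbor == sender:
                continue  # do not send the query back where it came from
            messages += 1  # every copy costs bandwidth, duplicates included
            if neighbor not in seen:  # duplicates are dropped on arrival
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1, peer))
    return hits, messages


GRAPH = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
FILES = {"D": {"song.mp3"}}
print(flood_query(GRAPH, FILES, "A", "song.mp3", ttl=1))  # ([], 2)
print(flood_query(GRAPH, FILES, "A", "song.mp3", ttl=2))  # (['D'], 4)
```

Raising the TTL finds more files but multiplies the number of messages, which is the trade-off the hybrid variants avoid by asking a server instead.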



File identification

• Problem: identical files may be named differently

• Problem: identically named files may not be binary identical

• A fast hashing algorithm is needed to identify files

• UUHash, MD4, and MD5 have already been successfully exploited

• SHA-1 seems to be promising, but is a bit slower
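Both problems disappear if a file is identified by a hash of its content instead of its name. The sketch below uses Python's standard `hashlib` and reads the data in chunks so that large files need not fit in memory; the function name `file_id` and the 1 MB chunk size are assumptions. SHA-1 is used only because the slide names it: practical SHA-1 collisions have since been demonstrated, so a current design would rather choose SHA-256.

```python
import hashlib
import io
from typing import BinaryIO


def file_id(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Identify a file by the SHA-1 of its content, ignoring its name."""
    h = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


# Same bytes under two different names -> same identifier.
print(file_id(io.BytesIO(b"same bytes")) == file_id(io.BytesIO(b"same bytes")))  # True
# Same name but different bytes would produce different identifiers.
print(file_id(io.BytesIO(b"abc")))  # a9993e364706816aba3e25717850c26c9cd0d89d
```

For a real file, `open(path, "rb")` can be passed instead of `io.BytesIO`.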



Ensuring file integrity

• Files might get corrupted during transfer

• Corrupt files need to be identified early, preventing them from spreading over the network

• It is usually possible to fix the corrupt portion of a file without replacing the whole file

• Peers need to communicate hash codes for file segments for integrity checks

• Corrupt segments can be marked as not shareable and replaced.
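Segment-level integrity checks can be sketched as follows: the providing peer publishes one hash per segment, and the receiver compares them with hashes of what it actually got. Only mismatching or missing segments have to be marked as not shareable and fetched again. The 4-byte segment size is purely for the demo (real clients use far larger segments), and the function names are hypothetical.

```python
import hashlib
from typing import List

SEGMENT_SIZE = 4  # bytes; tiny for the demo only


def segment_hashes(data: bytes, size: int = SEGMENT_SIZE) -> List[str]:
    """Hash codes a peer would publish for each segment of a file."""
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def corrupt_segments(received: bytes, expected: List[str],
                     size: int = SEGMENT_SIZE) -> List[int]:
    """Indices of damaged or missing segments; only these need to be marked
    as not shareable and re-downloaded, the rest of the file is kept."""
    actual = segment_hashes(received, size)
    return [i for i, h in enumerate(expected)
            if i >= len(actual) or actual[i] != h]


original = b"ABCDEFGHIJ"
published = segment_hashes(original)
print(corrupt_segments(b"ABCDEFxHIJ", published))  # [1]
```

Here one flipped byte invalidates only the second segment, so four bytes instead of the whole file have to be transferred again.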



Multi-source download

• Downloading from one single source is problematic when the uploading peer disconnects (Napster 1.0)

• With asymmetric Internet connections (e.g., DSL), upload bandwidth is much lower than download bandwidth, so a single source can hardly fill the downloader’s bandwidth
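A minimal multi-source scheduler is sketched below under simple assumptions: all sources are equally fast, chunks are assigned round-robin, and the unfinished chunks of a source that disconnects are handed to the remaining ones. The bandwidth figures (1024 kbit/s down, 128 kbit/s up) and the function names are illustrative, not measurements of any particular connection.

```python
from typing import Dict, List, Set


def sources_needed(download_kbps: int, upload_kbps: int) -> int:
    """How many equally fast sources it takes to fill the download link."""
    return -(-download_kbps // upload_kbps)  # ceiling division


def schedule(num_chunks: int, sources: List[str]) -> Dict[str, List[int]]:
    """Spread chunks round-robin over all sources."""
    plan: Dict[str, List[int]] = {s: [] for s in sources}
    for chunk in range(num_chunks):
        plan[sources[chunk % len(sources)]].append(chunk)
    return plan


def reassign(plan: Dict[str, List[int]], gone: str,
             done: Set[int]) -> Dict[str, List[int]]:
    """A source disconnected: give its unfinished chunks to the others."""
    orphaned = [c for c in plan.pop(gone, []) if c not in done]
    remaining = sorted(plan)
    if orphaned and not remaining:
        raise RuntimeError("no sources left; wait for a new source")
    for i, chunk in enumerate(orphaned):
        plan[remaining[i % len(remaining)]].append(chunk)
    return plan


print(sources_needed(1024, 128))  # 8
plan = schedule(6, ["A", "B", "C"])  # {'A': [0, 3], 'B': [1, 4], 'C': [2, 5]}
plan = reassign(plan, "B", done={1})  # B leaves after finishing chunk 1
print(plan)  # {'A': [0, 3, 4], 'C': [2, 5]}
```

Real clients schedule more adaptively (faster sources get more chunks), but the core idea is the same: no single disconnect stalls the download.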



Firewalls

• Problem: a firewalled computer can initiate a connection but cannot accept connection attempts from outside

• For a firewalled computer, downloads are not a problem, but uploads have to be requested via another peer, so that the firewalled computer can open the connection itself.

• For two firewalled computers, it is impossible to establish a direct connection.
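The three cases above amount to a small decision rule. In the sketch, `PUSH` means the request travels through other peers and the firewalled uploader then opens the connection itself (Gnutella calls this a push request). Relaying the data through a third peer in the last case is an assumption added for illustration; the slide only states that no direct connection is possible. `Route` and `choose_route` are hypothetical names.

```python
from enum import Enum


class Route(Enum):
    DIRECT = "downloader connects to the uploader"
    PUSH = "request forwarded via another peer; the uploader connects out"
    NO_DIRECT = "no direct connection possible; only a third peer could relay"


def choose_route(downloader_firewalled: bool, uploader_firewalled: bool) -> Route:
    """Pick how a transfer can be set up, given who can accept connections."""
    if not uploader_firewalled:
        return Route.DIRECT     # the uploader accepts incoming connections
    if not downloader_firewalled:
        return Route.PUSH       # a firewalled peer can still initiate connections
    return Route.NO_DIRECT      # neither side can accept a connection


print(choose_route(downloader_firewalled=False, uploader_firewalled=True))  # Route.PUSH
```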



Design of a user interface

• File sharing applications tend to be very complex

• A lot of technical background is expected from the user to get all the settings right

• A lay user might not be able to understand all the important details.

• => User interface design is crucial

• Example: eMule 0.42f



eMule 0.42f User Interface (screenshots): Search, Transfer, Shared Files, Statistics, Servers, Kademlia (serverless search), Chat, IRC


Attacks on P2P networks

Many P2P networks are under constant attack:

• Poisoning attacks

• DoS attacks

• Defection attacks

• Insertion of viruses into content

• Malware / Spyware in the client itself

• Filtering

• Identity attacks

• Spamming (not necessarily intended as DoS)



P2P and semantic web???

• Files swapped in P2P applications are poorly annotated, usually by filename only.

• Sophisticated search is usually not possible.

• Even web-based search engines for P2P content rarely provide more information than filename and file size.

• Complex dependencies (as, for instance, needed/used in Edutella) cannot be modeled appropriately with ordinary file sharing applications

• Rating and feedback are usually handled externally
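The gap between filename search and metadata search can be illustrated with subject-predicate-object triples, the data model RDF (and thus Edutella) builds on. The sketch uses plain Python tuples rather than an RDF library; the files, the metadata, and the Dublin Core style predicate names (`dc:subject`, `dc:language`) are hypothetical examples.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), as in RDF

FILENAMES = ["lecture03.pdf", "p2p_survey.pdf"]
TRIPLES: Set[Triple] = {
    ("lecture03.pdf", "dc:subject", "peer-to-peer"),
    ("lecture03.pdf", "dc:language", "de"),
    ("p2p_survey.pdf", "dc:subject", "peer-to-peer"),
    ("p2p_survey.pdf", "dc:language", "en"),
}


def filename_search(filenames: List[str], keyword: str) -> List[str]:
    """What typical file sharing clients offer: substring match on the name."""
    return sorted(f for f in filenames if keyword.lower() in f.lower())


def files_about(triples: Set[Triple], topic: str, language: str = "") -> List[str]:
    """Metadata search: files with the given subject, optionally in a language."""
    hits = {s for s, p, o in triples if p == "dc:subject" and o == topic}
    if language:
        hits &= {s for s, p, o in triples if p == "dc:language" and o == language}
    return sorted(hits)


print(filename_search(FILENAMES, "peer-to-peer"))   # []
print(files_about(TRIPLES, "peer-to-peer"))         # ['lecture03.pdf', 'p2p_survey.pdf']
print(files_about(TRIPLES, "peer-to-peer", "en"))   # ['p2p_survey.pdf']
```

The filename search misses both documents because neither name contains the search term, while the metadata query finds both and can also combine conditions.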



P2P and semantic web!

• Schlosser, Sintek, Decker, Nejdl: “A Scalable Ontology-Based P2P Infrastructure for Semantic Web Services”

• Combination of Semantic Web and Web Services with semantic mark-up

• Proposes a stable graph topology (hypercube) without servers or super nodes that is well suited for broadcast and search.

• Very technical / mathematical paper, but worth taking a look at.
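The reason a hypercube suits broadcast can be sketched as follows, based on my reading of the scheme described in the paper. Nodes are numbered 0 to 2^d - 1, and neighbors differ in exactly one bit. A node that received the message over dimension i forwards it only over dimensions greater than i, and the origin uses all dimensions. Every node then receives the message exactly once, with 2^d - 1 messages in d rounds. The sketch assumes a complete hypercube; the paper also deals with incomplete hypercubes and joining or leaving peers, which is not modeled here.

```python
from typing import Dict, List, Tuple


def hypercube_broadcast(dimensions: int, origin: int = 0) -> Tuple[Dict[int, int], int, int]:
    """Broadcast in a complete hypercube of 2**dimensions nodes.
    Returns (times each node received the message, total messages, rounds)."""
    received = {node: 0 for node in range(2 ** dimensions)}
    messages = 0
    rounds = 0
    frontier: List[Tuple[int, int]] = [(origin, -1)]  # (node, dimension reached over)
    while frontier:
        next_frontier = []
        for node, dim in frontier:
            for d in range(dim + 1, dimensions):  # only higher dimensions
                neighbor = node ^ (1 << d)        # flip bit d
                received[neighbor] += 1
                messages += 1
                next_frontier.append((neighbor, d))
        if next_frontier:
            rounds += 1
        frontier = next_frontier
    return received, messages, rounds


counts, msgs, rounds = hypercube_broadcast(3)
print(msgs, rounds)  # 7 3
```

Compared with the flooding used in Gnutella, no duplicate messages are sent at all, and the number of rounds grows only logarithmically with the number of peers.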



Requirements for commercial filesharing

• Secure / reliable transaction monitoring

• Privacy

• Efficient (i.e., fast) transfer protocols

• High availability

• Simple and intuitive user interface

• Broad selection

• Preview / prelistening option

• Plain media files (i.e., not bothering the user with DRM etc. after a purchase)

• Acceptable pricing



Bringing the P2P concept into the “real world”

• Tendency to adapt established software concepts beyond their original scope.

• Example: the open source concept, intended for software development, is now successfully used for Wikipedia.

• Matchmaking services have been established on the Internet for a while now, e.g., eBay.

• “Real” adaptation of the peer-to-peer concept since September 2004: www.buchticket.de



Bringing the P2P concept into the “real world”



Conclusion

• The trend of the late 1980s to move from P2P networks to client/server networks has been partially reversed.

• It was mainly illegal file sharing applications that proved that P2P is a powerful and highly scalable technology

• Especially when dealing with increasingly large media or program files, P2P has a big advantage over client/server architectures in terms of performance and stability



Outlook (Speculation)

• With increasing computing and communication capabilities, applications similar to the ones that ran on PCs in the late 1990s might soon run on cell phones and mobile devices

• Inclusion of semantic web technologies will facilitate the handling of large P2P-based storage systems

• Digital distribution of large files (e.g., DVD-size movies) will have to take place in P2P networks to avoid server bottlenecks.



Questions / Discussion

