Communication Paradigms - Connect
Advanced Distributed Systems<br />
<strong>Communication</strong> <strong>Paradigms</strong><br />
MSc in Advanced Computer Science<br />
Gordon Blair (gordon@comp.lancs.ac.uk)<br />
[http://research.lumeta.com/ches/map/]
Overview of the Session<br />
A problem led approach<br />
The role of communication in distributed systems<br />
Different styles of communication paradigm<br />
Interprocess communication<br />
Remote invocation<br />
Indirect communication<br />
Comparing and contrasting the approaches<br />
Associated Reading: CDK, chpts 4 and 5, pp. 609-615,<br />
677-683, chpt 18, TvS, pp. 145-157 (overall, rather<br />
fragmented coverage!)<br />
Adv. Dist. Systems G. Blair/ F. Taiani 2
A Framework for Understanding<br />
Distributed Systems<br />
What are the entities that are communicating in the<br />
distributed system?<br />
Nodes? Processes? Objects? Components? Web services?<br />
How do they communicate or, more specifically, what<br />
communication paradigm is used?<br />
What (potentially changing) roles and responsibilities do they<br />
have in the overall architecture?<br />
Client or server? Peers?<br />
How are they mapped on to the physical distributed<br />
infrastructure (what is their placement)?<br />
Partitioning? Replication? Caching? Code mobility?<br />
Context: Middleware<br />
Structures<br />
The Problem: A Systems Architecture for a Social Network<br />
Design Space<br />
[Figure: only the label "invocation (RMI)" survives]<br />
Interprocess<br />
<strong>Communication</strong>
Sockets Programming<br />
Choice of UDP or TCP communication models<br />
Possible support for (IP) multicast<br />
Other Approaches<br />
Message passing, e.g. MPI<br />
Overlay networks<br />
Remote Invocation
Remote Invocation: Remote<br />
Procedure Calls
What is RPC?<br />
Higher level mechanism supporting the construction of distributed<br />
applications (see also discussion on transparency later)<br />
Supports the calling of a procedure in a separate address space<br />
(process) as if it exists in the local address space; the process may<br />
or may not be on the same machine<br />
But what are the semantics of a local procedure call?<br />
Styles of RPC<br />
First class RPC<br />
Integrated into the language => normal language<br />
mechanisms can be used for exceptions etc<br />
Examples include Java RMI, Ada and Argus<br />
Second Class RPC<br />
A special Interface Definition Language (IDL) is used<br />
to define communications (see later)<br />
Language independence<br />
Examples include Sun RPC, CORBA and DCE<br />
Programming with Interfaces<br />
Separation of interface and implementation<br />
Client software does not need to know the details of<br />
the implementation, cf. abstraction<br />
Important for platform and language independence<br />
Also important to support the evolution of software<br />
someInterface<br />
myObject:myClass<br />
Interface Definition Languages<br />
What is an IDL?<br />
A language independent means of specifying<br />
an interface<br />
Similar to abstract data type specification<br />
Dealing with Parameters<br />
IN parameters = value passed to server<br />
OUT parameters = value returned from server<br />
IN/OUT parameters = combination of above<br />
Implementing RPC<br />
[Figure: Object A, proxy (client stub), communications modules, request and reply messages, dispatcher, server stub, remote Object B]<br />
N.B. Proxies, stubs, dispatchers are generated<br />
automatically by an appropriate IDL compiler<br />
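The roles of proxy, request and dispatcher can be sketched in a single process. This is a minimal illustration, not how any particular RPC system is built: the names `RpcSketch`, `Calculator`, `Request`, `Dispatcher` and `clientProxy` are hypothetical, and marshalling and the network hop are deliberately left out (the request object is handed to the dispatcher directly).

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

final class RpcSketch {
    /** Shared interface; with an IDL this would be generated for both sides. */
    interface Calculator {
        int add(int a, int b);
    }

    /** The remote object (Object B in the figure). */
    static final class CalculatorImpl implements Calculator {
        @Override
        public int add(int a, int b) {
            return a + b;
        }
    }

    /** Request message; a real system would marshal this into bytes. */
    static final class Request {
        final String objectName;
        final String methodName;
        final Class<?>[] parameterTypes;
        final Object[] arguments;

        Request(String objectName, String methodName, Class<?>[] parameterTypes, Object[] arguments) {
            this.objectName = objectName;
            this.methodName = methodName;
            this.parameterTypes = parameterTypes;
            this.arguments = arguments;
        }
    }

    /** Dispatcher and server stub: selects the target object and method, then invokes it. */
    static final class Dispatcher {
        private final Map<String, Class<?>> interfaces = new HashMap<>();
        private final Map<String, Object> targets = new HashMap<>();

        <T> void register(String name, Class<T> iface, T target) {
            interfaces.put(name, iface);
            targets.put(name, target);
        }

        Object dispatch(Request request) throws Exception {
            Class<?> iface = interfaces.get(request.objectName);
            if (iface == null) {
                throw new IllegalArgumentException("unknown object: " + request.objectName);
            }
            Method method = iface.getMethod(request.methodName, request.parameterTypes);
            return method.invoke(targets.get(request.objectName), request.arguments);
        }
    }

    /** Client stub (proxy): turns a local call into a request and returns the reply. */
    static <T> T clientProxy(Class<T> iface, String objectName, Dispatcher dispatcher) {
        InvocationHandler handler = (proxy, method, args) ->
                dispatcher.dispatch(new Request(objectName, method.getName(),
                        method.getParameterTypes(), args));
        Object stub = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(stub);
    }

    public static void main(String[] args) {
        Dispatcher dispatcher = new Dispatcher();
        dispatcher.register("calc", Calculator.class, new CalculatorImpl());
        Calculator calc = clientProxy(Calculator.class, "calc", dispatcher);
        // The call looks local but is routed through the request and the dispatcher.
        System.out.println(calc.add(2, 3));
    }
}
```

The caller only ever sees the `Calculator` interface, which is the separation of interface and implementation that the earlier slides rely on.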
Remote Invocation: Remote<br />
Method Invocation
From RPC to Distributed Objects<br />
(Remote Method Invocation)<br />
[Figure: three objects (obj) inside a single process]<br />
OO programming:<br />
Objects "speak" to one another by invoking each other's methods<br />
But they can only speak to objects located in the same process!<br />
From RPC to Distributed Objects<br />
(Remote Method Invocation)<br />
[Figure: objects (obj) spread across separate processes on separate machines]<br />
Distributed OO programming: breaking the process boundary<br />
Objects can "speak" to objects in remote processes<br />
Is this magic?<br />
No: It is all layering and abstraction
Essential Characteristics of RMI<br />
Full integration with object-oriented programming language<br />
Ability to exploit objects, class and inheritance<br />
Added benefits from exploiting built in (object-oriented)<br />
approaches to, for example, exception handling<br />
From procedure calling to method invocation<br />
Added expressiveness of supporting object references<br />
More sophisticated options for parameter passing<br />
e.g. see lecture on Java RMI<br />
Pass by (object) reference<br />
Pass by value (exploiting serialisation)<br />
Often integrated with code (object) mobility<br />
e.g. use of class loading in RMI<br />
Indirect <strong>Communication</strong>:<br />
General
What is Indirect <strong>Communication</strong>?<br />
<strong>Communication</strong> between entities in a distributed<br />
system through an intermediary with no direct<br />
coupling between the sender and the<br />
receiver(s)<br />
But what is the intermediary?<br />
Groups<br />
Event-based abstractions (e.g. publish-subscribe)<br />
Message queues<br />
Shared memory abstractions (e.g. DSM, tuple spaces)<br />
Note the optional plural in this definition:<br />
Often intrinsically provides multiparty<br />
communication<br />
A Closer Look at Coupling<br />
Remote invocation paradigms all imply a direct coupling<br />
between the client and server<br />
Indirect communication paradigms seek a level of uncoupling:<br />
Space uncoupling in which the sender does not know or need to<br />
know the identity of the receiver(s) and vice versa<br />
Because of this, the system developer has many degrees of freedom in<br />
dealing with change: participants (senders or receivers) can be replaced,<br />
updated, replicated or migrated<br />
Time uncoupling in which the sender and receiver(s) can have<br />
independent lifetimes (in other words, the sender and receiver(s)<br />
do not need to exist at the same time to communicate)<br />
Important benefits particularly in volatile environments where senders and<br />
receivers may come and go.<br />
Many uses in distributed systems including supporting mobility,<br />
dependability and event dissemination<br />
A Few Words on Indirection<br />
“All problems in computer science<br />
can be solved by another level of<br />
indirection”<br />
Roger Needham et al<br />
“There is no performance problem<br />
that cannot be solved by<br />
eliminating a level of indirection”<br />
Jim Gray<br />
Indirect <strong>Communication</strong>:<br />
Group <strong>Communication</strong>
Group <strong>Communication</strong>: An Initial<br />
Example of Indirection<br />
What is group communication?<br />
Based on the concept of a group abstraction, with operations<br />
provided to join and leave the group (cf. group membership)<br />
Messages sent to a group rather than to any individual process<br />
and messages are then delivered to each member of the group<br />
Often enhanced by guarantees in terms of message ordering<br />
and reliability<br />
Uses in distributed systems<br />
Important in supporting fault-tolerance<br />
e.g. supporting replication<br />
Also used heavily in dissemination of events<br />
e.g. in financial systems<br />
Key examples include JGroups and Isis<br />
See lecture on coordination and agreement<br />
A Typical Group Service<br />
interface GroupCommunicationService {<br />
// creates a new group and returns the group's ID<br />
public GroupID groupCreate();<br />
// adds a member to, or removes a member from, a group<br />
public void groupJoin(GroupID group, Participant member);<br />
public void groupLeave(GroupID group, Participant member);<br />
// multicasts a message to the named group with<br />
// the specified delivery semantics, and<br />
// optionally collects a number of replies<br />
public Messages[] multicast(GroupID group, OrderType order,<br />
Messages message, int nbReplies);<br />
}<br />
[Figure: two application processes (appli), each on top of a group communication layer (group comm)]<br />
Example<br />
When replicating servers: collective coordination needed<br />
either for fault-tolerance or scalability<br />
[Figure: user, server 1, server 2 and warehouse, with messages "buy", "OK, coming soon" and "I’ve sold the last hat"]<br />
Problems<br />
Reliability (against loss / crash)<br />
Ordering<br />
Scalability<br />
Reliability Guarantees<br />
Unreliable multicast<br />
Message sent to all members and may or may not arrive<br />
Reliable multicast: Protection against faulty network<br />
Reasonable efforts are made to ensure delivery in spite of<br />
message losses<br />
Can be based on positive or negative acknowledgements<br />
No guarantees if the sender crashes during multicast<br />
Atomic multicast: Protection against faulty participants<br />
All members receive message, or none do<br />
Main issue: tolerate the sender’s crash<br />
Implementing Reliable Multicast<br />
1) Originator sends a message to each member of the group,<br />
and awaits acknowledgements (ACK)<br />
2) If some acknowledgements are not received in a given<br />
period of time, re-send message; repeat this n times if<br />
necessary<br />
3) If all acknowledgements received<br />
then report success to caller<br />
Works fine if:<br />
Network problems are transient (msgs eventually get through)<br />
No crash (and no spurious behaviour)<br />
Not very scalable: ACK explosion!<br />
[Figure: a message (msg) is sent to each group member among A, B, C and D, and each member returns an acknowledgement (ack)]<br />
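Steps 1 to 3 above can be sketched as a retry loop over the set of members that have not yet acknowledged. This is a minimal illustration under stated assumptions: `Transport` and `sendAndAwaitAck` are hypothetical stand-ins for the real network and its timeout, and receivers are assumed to discard duplicates, since a lost ACK causes a re-send of a message that did arrive.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ReliableMulticastSketch {
    /** Hypothetical transport: true means an ACK arrived before the timeout. */
    interface Transport {
        boolean sendAndAwaitAck(String member, String message);
    }

    /** Send to all members, then re-send to silent members up to maxRetries times. */
    static boolean multicast(List<String> members, String message,
                             Transport transport, int maxRetries) {
        Set<String> pending = new LinkedHashSet<>(members);
        for (int attempt = 0; attempt <= maxRetries && !pending.isEmpty(); attempt++) {
            // Members whose ACK arrives are removed; the rest are retried next round.
            pending.removeIf(member -> transport.sendAndAwaitAck(member, message));
        }
        return pending.isEmpty(); // success only if every member acknowledged
    }

    public static void main(String[] args) {
        int[] lossesLeft = {1}; // the simulated network drops the first message to C
        Transport lossy = (member, message) ->
                !(member.equals("C") && lossesLeft[0]-- > 0);
        System.out.println(multicast(Arrays.asList("B", "C", "D"), "hello", lossy, 3));
    }
}
```

Every message costs one ACK per member per attempt, which is the ACK explosion noted above.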
Avoiding ACKs Explosion<br />
Using negative ACKs (abbreviated to NACKs)<br />
If everything is fine the receiver does not say anything<br />
If a message is lost the receiver complains to the sender<br />
Problem: How do we know that a message should be there?<br />
[Figure: B numbers its messages with a counter and each receiver records the last message number seen from B. A receives m1 and m2. C misses m1, so when m2 arrives it sees that 1 message is missing and sends NACK(m1) to B.]<br />
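The gap detection behind NACKs can be sketched with per-sender sequence numbers. This is a minimal illustration: `NackSketch`, `Receiver` and `onReceive` are hypothetical names, and numbering is assumed to start at 1. Note the limitation it inherits from the scheme itself: a receiver only notices a loss when a later message from the same sender arrives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NackSketch {
    static final class Receiver {
        /** Highest sequence number seen so far from each sender. */
        private final Map<String, Integer> lastSeen = new HashMap<>();

        /** Returns the sequence numbers to NACK when message seq arrives from sender. */
        List<Integer> onReceive(String sender, int seq) {
            int last = lastSeen.getOrDefault(sender, 0);
            List<Integer> missing = new ArrayList<>();
            for (int s = last + 1; s < seq; s++) {
                missing.add(s); // everything between last and seq was skipped
            }
            if (seq > last) {
                lastSeen.put(sender, seq);
            }
            return missing;
        }
    }

    public static void main(String[] args) {
        Receiver a = new Receiver();
        Receiver c = new Receiver();
        a.onReceive("B", 1);                         // A receives m1; the copy for C is lost
        System.out.println(a.onReceive("B", 2));     // A has no gap
        System.out.println(c.onReceive("B", 2));     // C detects that m1 is missing
    }
}
```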
Ordering Guarantees<br />
Unordered multicast<br />
No guarantees<br />
FIFO (First In, First Out)<br />
Messages sent from the same process are delivered at all sites in the<br />
order they were sent<br />
Messages sent from different processes may be delivered in different<br />
orders at different sites<br />
Totally ordered multicast<br />
Consider messages m1 and m2 sent to the group by (potentially)<br />
different processes<br />
Either m1 will be delivered before m2 or vice versa for all members of<br />
the group<br />
Causally ordered multicast<br />
As above, except the ordering of m1 and m2 is only important if a<br />
“happened-before” relationship exists between the messages<br />
Total Ordering vs Causal<br />
Ordering<br />
Implementing Total Ordering<br />
The sequencer approach (centralised)<br />
1. All requests sent to a sequencer, where they are given an ID<br />
2. The sequencer assigns consecutive increasing IDs<br />
3. Requests arriving at sites are held back until they are next in<br />
sequence<br />
Problems: sequencer = bottleneck + single point of failure<br />
Other approaches<br />
Distributed agreement to generate ids (as in Isis)<br />
Assign timestamps from a (global) logical or physical clock<br />
These are complex<br />
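The sequencer approach can be sketched with a counter and a hold-back queue at each site. This is a minimal single-process illustration: `SequencerSketch`, `Sequencer` and `Site` are hypothetical names, IDs are assumed to start at 1, and failure of the sequencer is not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SequencerSketch {
    /** Central sequencer: hands out consecutive, increasing IDs. */
    static final class Sequencer {
        private long next = 1;

        synchronized long assign() {
            return next++;
        }
    }

    /** A group member that delivers messages strictly in ID order. */
    static final class Site {
        private long expected = 1;
        private final Map<Long, String> holdBack = new HashMap<>();
        private final List<String> delivered = new ArrayList<>();

        void receive(long id, String message) {
            holdBack.put(id, message);
            // Deliver as many consecutive messages as are now available.
            while (holdBack.containsKey(expected)) {
                delivered.add(holdBack.remove(expected));
                expected++;
            }
        }

        List<String> delivered() {
            return delivered;
        }
    }

    public static void main(String[] args) {
        Sequencer sequencer = new Sequencer();
        long first = sequencer.assign();
        long second = sequencer.assign();
        Site site1 = new Site();
        Site site2 = new Site();
        site1.receive(first, "m1");
        site1.receive(second, "m2");
        site2.receive(second, "m2"); // arrives early and is held back
        site2.receive(first, "m1");
        System.out.println(site1.delivered().equals(site2.delivered()));
    }
}
```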
Indirect <strong>Communication</strong>:<br />
Publish-Subscribe
Publish-Subscribe Systems<br />
What is publish-subscribe?<br />
A key example of a distributed event-based system whereby:<br />
Publishers publish an event e: publish(e)<br />
Subscribers express interest in a set of events specified by a filter<br />
f: subscribe(f)<br />
Events are delivered asynchronously: notify(e)<br />
Publishers optionally advertise what they will produce: advertise(f)<br />
The system acts as a broker to deliver events to the right<br />
subscribers<br />
Uses in distributed systems<br />
As with groups, used in financial information systems and<br />
related news feeds applications<br />
Feature heavily in systems supporting cooperative working<br />
Increasingly used in ubiquitous computing/ monitoring<br />
Examples include JMS, Scribe, Siena, Gryphon and Hermes<br />
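The publish(e), subscribe(f) and notify(e) operations can be sketched with a single in-process broker. This is a minimal illustration under stated assumptions: `PubSubSketch` is a hypothetical name, events are modelled as attribute maps, filters as predicates, and notification is a synchronous callback for brevity, whereas the systems described above deliver events asynchronously.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

final class PubSubSketch {
    /** A subscription pairs a filter f with the callback used for notify(e). */
    private static final class Subscription {
        final Predicate<Map<String, Object>> filter;
        final Consumer<Map<String, Object>> subscriber;

        Subscription(Predicate<Map<String, Object>> filter,
                     Consumer<Map<String, Object>> subscriber) {
            this.filter = filter;
            this.subscriber = subscriber;
        }
    }

    private final List<Subscription> subscriptions = new ArrayList<>();

    /** subscribe(f): register interest in all events matching the filter. */
    void subscribe(Predicate<Map<String, Object>> filter,
                   Consumer<Map<String, Object>> subscriber) {
        subscriptions.add(new Subscription(filter, subscriber));
    }

    /** publish(e): the broker notifies every subscriber whose filter matches e. */
    void publish(Map<String, Object> event) {
        for (Subscription s : subscriptions) {
            if (s.filter.test(event)) {
                s.subscriber.accept(event);
            }
        }
    }

    public static void main(String[] args) {
        PubSubSketch broker = new PubSubSketch();
        broker.subscribe(e -> "IBM".equals(e.get("symbol")),
                e -> System.out.println("notify: " + e.get("price")));
        Map<String, Object> quote = new HashMap<>();
        quote.put("symbol", "IBM");
        quote.put("price", 120);
        broker.publish(quote); // the publisher never learns who the subscribers are
    }
}
```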
Publish-Subscribe Systems<br />
(continued)<br />
A Closer Look at the Subscription Model<br />
Channel-based<br />
Publishers publish events to named channels and subscribers<br />
then subscribe to one of these named channels and therefore<br />
receive all events sent to that channel<br />
Rather primitive scheme and the only one that defines a<br />
physical channel (all other schemes employ some form of<br />
filtering over the content of an event as we will see below)<br />
Topic-based (also referred to as subject-based)<br />
Each notification is expressed in terms of a number of fields<br />
with one field denoting the topic and subscriptions defined in<br />
terms of this topic of interest<br />
Similar to channel-based approaches (implicit vs. explicit)<br />
Can be enhanced by introducing hierarchies of topics<br />
Subscription Models (continued)<br />
Content-based<br />
Generalization of topic-based allowing the expression of<br />
subscriptions over a range of fields in an event notification.<br />
The filter is a query defined in terms of compositions of<br />
constraints over the values of event attributes<br />
Significantly more expressive but with significant new<br />
challenges introduced in terms of implementation<br />
Type-based<br />
Intrinsically linked with (typed) object-based approaches<br />
Subscriptions defined in terms of this type (signature/ methods/<br />
attributes), with matching defined in terms of types or subtypes<br />
of the given filter<br />
Can be integrated elegantly into programming languages and<br />
can also check type correctness of subscriptions.<br />
Implementing Publish-Subscribe<br />
The key decision is whether to go for a centralised,<br />
distributed or fully peer-to-peer architecture<br />
Most systems have distributed architectures consisting of a<br />
network of brokers<br />
Realising Content-based Approaches<br />
Flooding<br />
Send all published events to all possible recipients, or<br />
alternatively all subscriptions back to all possible senders<br />
Can be supported by underlying multicast service as available<br />
Simple but significant message overload<br />
Filtering<br />
Filters propagated back through the broker network with the<br />
intuition that notifications are only forwarded through the broker<br />
network if there is a path to a valid subscriber<br />
Advertisements<br />
In above scheme, filters need to be sent to all possible publishers<br />
Overhead can be reduced by use of advertisements<br />
Realising Content-based<br />
Approaches (continued)<br />
Rendezvous<br />
Consider the set of all possible events as an event space<br />
Partition this event space into pieces and allocate<br />
responsibility for each piece to a given broker (known as<br />
the rendezvous node for that event)<br />
Implementation requires two functions to be defined:<br />
SN(s) which takes a given subscription, s, and returns one or more<br />
rendezvous nodes which take responsibility for that subscription<br />
EN(e) which takes a given event, e, and returns one or more rendezvous<br />
nodes responsible for matching e against subscriptions in the system<br />
Can lead to highly scalable implementations<br />
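The two functions SN(s) and EN(e) can be sketched for the simplest possible partitioning of the event space. This is an illustration only, with loud assumptions: subscriptions and events are reduced to a single topic string, brokers are numbered 0 to n-1, and the partition is a plain hash. Content-based filters over several attributes need a considerably more elaborate mapping. The names `RendezvousSketch`, `sn` and `en` are hypothetical.

```java
import java.util.Collections;
import java.util.List;

final class RendezvousSketch {
    private final int brokerCount;

    RendezvousSketch(int brokerCount) {
        if (brokerCount <= 0) {
            throw new IllegalArgumentException("need at least one broker");
        }
        this.brokerCount = brokerCount;
    }

    /** Partition of the event space: each topic hashes to exactly one broker index. */
    private int rendezvousNode(String topic) {
        return Math.floorMod(topic.hashCode(), brokerCount);
    }

    /** SN(s): rendezvous nodes responsible for a subscription on the given topic. */
    List<Integer> sn(String subscribedTopic) {
        return Collections.singletonList(rendezvousNode(subscribedTopic));
    }

    /** EN(e): rendezvous nodes that must match an event carrying the given topic. */
    List<Integer> en(String eventTopic) {
        return Collections.singletonList(rendezvousNode(eventTopic));
    }

    public static void main(String[] args) {
        RendezvousSketch overlay = new RendezvousSketch(4);
        // An event and a subscription on the same topic meet at the same broker.
        System.out.println(overlay.sn("hats").equals(overlay.en("hats")));
    }
}
```

The property that matters is that EN(e) and SN(s) share at least one node whenever e matches s, so matching can be done locally at that node.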
An Optional Exercise<br />
The following is an actual algorithm to implement filtering<br />
This algorithm assumes that each node maintains:<br />
A neighbours list containing a list of all connected neighbours in<br />
the network of brokers<br />
A subscription list containing a list of all directly connected<br />
subscribers serviced by this node<br />
A routing table – maintaining a list of neighbours and valid<br />
subscriptions for that pathway<br />
An implementation of matching on each node in the network of<br />
brokers, in particular a match function takes a given event<br />
notification and a list of nodes together with associated<br />
subscriptions and returns a set of nodes where the notification<br />
matches the subscription<br />
The Algorithm<br />
upon receive publish(event e) from node x<br />
    matchlist := match(e, subscriptions);<br />
    send notify(e) to matchlist;<br />
    fwdlist := match(e, routing);<br />
    send publish(e) to fwdlist - x;<br />
upon receive subscribe(subscription s) from node x<br />
    if x is client then<br />
        add (x, s) to subscriptions;<br />
    else add (x, s) to routing;<br />
    send subscribe(s) to neighbours - x;<br />
Some Additional Notes<br />
When a broker receives a publish request from a given node, it must pass<br />
this notification to all connected nodes where there is a corresponding<br />
matching subscription and also decide where to propagate this event<br />
through the network of brokers.<br />
Lines 2 and 3 achieve the first goal by matching the event against the<br />
subscription list and then forwarding the event to all the nodes with<br />
matching subscriptions.<br />
Lines 4 and 5 then use the match function again, this time matching the<br />
event against the routing table and forwarding only to the paths that lead<br />
to a subscription.<br />
Brokers must also deal with incoming subscription events.<br />
If the subscription event is from an immediately connected subscriber then<br />
this subscription must be entered in the subscriptions table (lines 7 and 8).<br />
Otherwise, the broker is an intermediary node and now knows that a<br />
pathway exists towards this subscription and hence an appropriate entry is<br />
added to the routing table (line 9). In both cases, this subscription event is<br />
then passed to all neighbours apart from the originating node (line 10).<br />
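The filtering algorithm described above can be sketched as an in-process simulation. This is one possible reading, not a reference implementation, and it rests on several assumptions: events are plain strings, subscriptions are predicates, "send" is a direct method call, and the broker network is acyclic (with cycles the subscription flood would never stop). All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class FilteringBrokerSketch {
    /** A directly connected subscriber; it records every notify(e) it receives. */
    static final class Client {
        final List<String> received = new ArrayList<>();

        void notifyEvent(String event) {
            received.add(event);
        }
    }

    /** A (node, subscription) pair, as stored in both tables. */
    private static final class Entry<N> {
        final N node;
        final Predicate<String> filter;

        Entry(N node, Predicate<String> filter) {
            this.node = node;
            this.filter = filter;
        }
    }

    static final class Broker {
        private final List<Broker> neighbours = new ArrayList<>();
        private final List<Entry<Client>> subscriptions = new ArrayList<>();
        private final List<Entry<Broker>> routing = new ArrayList<>();

        static void connect(Broker a, Broker b) {
            a.neighbours.add(b);
            b.neighbours.add(a);
        }

        /** match: the set of nodes with at least one subscription matching the event. */
        private static <N> Set<N> match(String event, List<Entry<N>> table) {
            Set<N> nodes = new LinkedHashSet<>();
            for (Entry<N> entry : table) {
                if (entry.filter.test(event)) {
                    nodes.add(entry.node);
                }
            }
            return nodes;
        }

        /** upon receive publish(e) from node x (x is null for a local publisher). */
        void publish(String event, Broker from) {
            for (Client client : match(event, subscriptions)) {
                client.notifyEvent(event);
            }
            for (Broker next : match(event, routing)) {
                if (next != from) {
                    next.publish(event, this);
                }
            }
        }

        /** upon receive subscribe(s) from a directly connected client. */
        void subscribe(Predicate<String> filter, Client client) {
            subscriptions.add(new Entry<>(client, filter));
            propagate(filter, null);
        }

        /** upon receive subscribe(s) from a neighbouring broker. */
        private void subscribeFromBroker(Predicate<String> filter, Broker from) {
            routing.add(new Entry<>(from, filter));
            propagate(filter, from);
        }

        private void propagate(Predicate<String> filter, Broker from) {
            for (Broker neighbour : neighbours) {
                if (neighbour != from) {
                    neighbour.subscribeFromBroker(filter, this);
                }
            }
        }
    }

    public static void main(String[] args) {
        Broker b1 = new Broker();
        Broker b2 = new Broker();
        Broker.connect(b1, b2);
        Client subscriber = new Client();
        b2.subscribe(e -> e.startsWith("hat"), subscriber);
        b1.publish("hat:sold", null);   // forwarded to b2, which notifies the subscriber
        b1.publish("shoe:sold", null);  // dropped at b1: no path leads to a match
        System.out.println(subscriber.received);
    }
}
```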
Indirect <strong>Communication</strong>:<br />
Message Queues
Message Queues<br />
What is a message queue?<br />
An alternative paradigm for indirect communication based on<br />
distributed queues<br />
Messages are sent to a queue<br />
Processes can then access messages in the queue either by<br />
receiving a message (blocking), polling for messages (non-blocking)<br />
or being notified when messages arrive<br />
Messages are persistent and message delivery is reliable<br />
Fundamentally a point-to-point service (not multi-party)<br />
Uses in distributed systems<br />
Enterprise Application Integration (EAI), i.e. integration between<br />
applications in a given enterprise utilising the loose coupling<br />
Also used heavily in commercial transaction processing systems<br />
Examples include JMS, IBM’s WebSphere MQ, Microsoft’s MSMQ<br />
and Oracle’s Streams Advanced Queuing (AQ)<br />
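The blocking and non-blocking access styles can be sketched on top of `java.util.concurrent.LinkedBlockingQueue`. This is a minimal in-memory illustration: `MessageQueueSketch` is a hypothetical name, and the persistence, reliable delivery and distribution that real message queue products provide are not modelled. Notification on arrival is also omitted.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class MessageQueueSketch {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    /** The sender only needs the queue, never the identity of the receiver. */
    void send(String message) {
        queue.add(message);
    }

    /** Blocking receive: waits until a message is available. */
    String receive() {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a message", e);
        }
    }

    /** Non-blocking poll: returns null immediately if the queue is empty. */
    String poll() {
        return queue.poll();
    }

    public static void main(String[] args) {
        MessageQueueSketch orders = new MessageQueueSketch();
        orders.send("order-1"); // the sender could terminate here (time uncoupling)
        orders.send("order-2");
        System.out.println(orders.poll());
        System.out.println(orders.receive());
        System.out.println(orders.poll()); // queue is now empty
    }
}
```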
Message Queues (continued)<br />
Indirect <strong>Communication</strong>:<br />
Distributed Shared Memory
Distributed Shared Memory<br />
What is a distributed shared memory?<br />
Provides an abstraction of shared memory in a distributed system<br />
If data is not available locally, it must be fetched (similar to a page fault in<br />
traditional virtual memory systems)<br />
Hides distribution entirely from the programmer<br />
No new programming abstractions to learn<br />
Can be costly to implement though in terms of maintaining consistency of<br />
shared data<br />
Uses in distributed systems<br />
Tends to be more specialist, for example for parallel and<br />
distributed computation in cluster computers (i.e. relatively tightly<br />
coupled distributed architectures)<br />
Distributed Shared Memory<br />
(continued)<br />
Indirect <strong>Communication</strong>:<br />
Tuple Spaces
Tuple Spaces<br />
What is a tuple space?<br />
Another paradigm for indirect communication offering an<br />
abstraction of a semi-structured shared space consisting of a<br />
number of tuples<br />
Processes can write tuples to the tuple space<br />
Processes can then read a tuple from tuple space which also leaves<br />
a copy in the tuple space<br />
Alternatively, they can take the tuple which removes it from the space<br />
Both the read and take operations are based on pattern matching<br />
(associative access)<br />
Uses in distributed systems<br />
Influential in the areas of mobile and ubiquitous computing<br />
Also used for systems integration<br />
Examples include Linda, JavaSpaces, IBM’s TSpaces and<br />
L2imbo<br />
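The write, read and take operations, with associative matching, can be sketched as follows. This is a minimal in-memory illustration with stated assumptions: `TupleSpaceSketch` is a hypothetical name, a `null` field in a template acts as a wildcard, stored tuples contain no `null` fields, and read and take return `null` instead of blocking when nothing matches (Linda-style systems typically block).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class TupleSpaceSketch {
    private final List<Object[]> tuples = new ArrayList<>();

    /** write: add a tuple to the space. */
    synchronized void write(Object... tuple) {
        tuples.add(tuple.clone());
    }

    /** read: return a copy of a matching tuple and leave the original in the space. */
    synchronized Object[] read(Object... template) {
        for (Object[] tuple : tuples) {
            if (matches(template, tuple)) {
                return tuple.clone();
            }
        }
        return null;
    }

    /** take: return a matching tuple and remove it from the space. */
    synchronized Object[] take(Object... template) {
        Iterator<Object[]> it = tuples.iterator();
        while (it.hasNext()) {
            Object[] tuple = it.next();
            if (matches(template, tuple)) {
                it.remove();
                return tuple;
            }
        }
        return null;
    }

    /** Associative access: same length, and every non-null template field is equal. */
    private static boolean matches(Object[] template, Object[] tuple) {
        if (template.length != tuple.length) {
            return false;
        }
        for (int i = 0; i < template.length; i++) {
            if (template[i] != null && !template[i].equals(tuple[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TupleSpaceSketch space = new TupleSpaceSketch();
        space.write("count", 1);
        System.out.println(Arrays.toString(space.read("count", null))); // copy stays in space
        System.out.println(Arrays.toString(space.take("count", null))); // tuple is removed
        System.out.println(space.read("count", null) == null);
    }
}
```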
Tuple Spaces (continued)<br />
Indirect <strong>Communication</strong>: A<br />
Comparison
Comparison of Approaches<br />
Expected Learning Outcomes<br />
At the end of this session:<br />
You should understand the range of low level services offered by<br />
interprocess communication, what they might be useful for and their<br />
intrinsic limitations<br />
You should understand the services offered by remote invocation<br />
paradigms and the essential difference between RPC and RMI<br />
You should understand the essence of indirect communication, why it is<br />
different from remote invocation (for example) and the role of time and<br />
space uncoupling in supporting these characteristics<br />
You should appreciate the range of indirect communication services<br />
available and be able to assess their strengths and weaknesses<br />
You should be able to select appropriate communication paradigms for<br />
a given distributed systems application mapping requirements on to<br />
underlying communication services and being aware of the trade-offs<br />
associated with such decisions<br />