17.01.2015 Views

Erlang and OTP in Action.pdf - Synrc


affecting the rest of your code and start over, logging precisely where things went pear-shaped
and how. This can also take some getting used to, but is a powerful recipe for fault
tolerance and for creating systems that are possible to debug despite their complexity.

1.2.2 – Supervision and the trapping of exit signals

One of the main ways fault tolerance is achieved in OTP is by overriding the default
propagation of exit signals. By setting a process flag called trap_exit, we can make a
process trap incoming exit signals rather than obey them (the one exception is a kill
signal sent with exit(Pid, kill), which cannot be trapped). In this case, when the signal is
received, it is simply dropped in the process's mailbox as a normal message of the form
{'EXIT', Pid, Reason} that describes in which other process the failure originated and
why, allowing the trapping process to check for such messages and take action.
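As a minimal sketch of the mechanism just described, the following module sets trap_exit, spawns a linked process that exits with the reason oops, and then picks the resulting {'EXIT', Pid, Reason} message out of its mailbox. The module name trap_demo and the function run/0 are made up for illustration, and the one-second timeout is an arbitrary safety net rather than something the mechanism requires.

```erlang
-module(trap_demo).
-export([run/0]).

%% Spawn a linked process that crashes, and show that the exit signal
%% arrives as an ordinary message because the caller traps exits.
run() ->
    %% process_flag/2 returns the previous value, so we can restore it.
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(oops) end),
    Result =
        receive
            %% Pid is already bound, so only the signal from our own
            %% linked process matches here.
            {'EXIT', Pid, Reason} -> {trapped, Reason}
        after 1000 ->
            timeout
        end,
    process_flag(trap_exit, Old),
    Result.
```

Without the call to process_flag/2, the exit signal would instead terminate the calling process along with the one it is linked to.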

Such a signal-trapping process is sometimes called a system process, and will typically
be running code that is very different from that run by ordinary worker processes, which do
not usually trap exit signals. Since a system process acts as a bulwark that prevents exit
signals from propagating further, it insulates the processes it is linked to from each other,
and can also be entrusted with reporting failures and even restarting the failed subsystems.
We call such processes supervisors.

Figure illustrating supervisor, workers, and signals

The point of letting an entire subsystem terminate completely and be restarted is that it
brings us back to a state known to function properly. Think of it like rebooting your
computer: a way to clear up a mess and restart from a point that ought to be working. But
the problem with a computer reboot is that it is not granular enough. Ideally, what you
would like to be able to do is reboot just a part of the system, and the smaller, the better.
Erlang process links and supervisors provide a mechanism for such fine-grained “reboots”.
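To make the idea of a fine-grained "reboot" concrete, here is a rough sketch of a hand-rolled supervising process built only from links and trap_exit. Everything in it is an assumption made for illustration: the module name mini_sup, the restart limit, and the deliberately crashing worker in demo/0. It is nowhere near what a production supervisor needs, which is exactly why we would rather not write one ourselves.

```erlang
-module(mini_sup).
-export([start/2, demo/0]).

%% Start a tiny supervising process that runs WorkerFun in a linked
%% process and restarts it at most MaxRestarts times if it crashes.
start(WorkerFun, MaxRestarts) ->
    spawn(fun() -> init(WorkerFun, MaxRestarts) end).

init(WorkerFun, MaxRestarts) ->
    process_flag(trap_exit, true),
    loop(WorkerFun, spawn_link(WorkerFun), MaxRestarts).

loop(WorkerFun, Pid, RestartsLeft) ->
    receive
        {'EXIT', Pid, normal} ->
            ok;  % the worker finished cleanly, so we stop too
        {'EXIT', Pid, _Reason} when RestartsLeft > 0 ->
            %% "Reboot" just this one worker from a known good state.
            loop(WorkerFun, spawn_link(WorkerFun), RestartsLeft - 1);
        {'EXIT', Pid, Reason} ->
            exit({restart_limit_reached, Reason})
    end.

%% Hypothetical usage: a worker that reports each start and then crashes.
%% Returns {NumberOfStarts, ReasonTheSupervisorGaveUp}.
demo() ->
    Parent = self(),
    Worker = fun() -> Parent ! started, exit(crash) end,
    {Sup, Ref} = spawn_monitor(fun() -> init(Worker, 2) end),
    receive
        {'DOWN', Ref, process, Sup, Why} -> {count_started(0), Why}
    end.

count_started(N) ->
    receive started -> count_started(N + 1)
    after 100 -> N
    end.
```

Only the crashed worker is replaced on each restart; the process that called demo/0 is never affected, because the supervising process absorbs the exit signals.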

If that were all, though, we would still be left to implement our supervisors from scratch,
which would require careful thought, lots of experience, and a long time shaking out the
bugs and corner cases. Fortunately for us, the OTP framework already provides just about
everything we need: both a methodology for structuring our applications using supervision,
and stable, battle-hardened libraries to build them on.
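As a taste of what those libraries look like in use, the following is a small sketch of a module implementing the standard supervisor behaviour. The module name demo_sup, the child identifiers worker_a and worker_b, and the trivial start_worker/0 function are all invented for this example, and the restart limit of 3 restarts within 10 seconds is an arbitrary choice. The map-based specification syntax assumes Erlang/OTP 18 or later; older releases use an equivalent tuple format.

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Callback invoked by the supervisor library to learn what to supervise.
init([]) ->
    %% one_for_one: restart only the child that failed.
    %% Give up after more than 3 restarts within 10 seconds.
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 10},
    %% Children are started in the order they appear in this list.
    Children = [child(worker_a), child(worker_b)],
    {ok, {SupFlags, Children}}.

child(Id) ->
    #{id => Id,
      start => {?MODULE, start_worker, []},  % hypothetical worker start
      restart => permanent,                  % always restart on exit
      shutdown => 5000,                      % ms to wait at shutdown
      type => worker}.

%% Stand-in worker: a linked process that idles until told to stop.
start_worker() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```

Calling demo_sup:start_link() starts the supervisor and both children; supervisor:which_children(demo_sup) then lists them along with their current process identifiers.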

OTP allows processes to be started by a supervisor in a prescribed manner and order. A
supervisor can also be told how to restart its processes with respect to one another in the
event of a failure of any single process, how many attempts it should make to restart the

©Manning Publications Co. Please post comments or corrections to the Author Online forum:
http://www.manning-sandbox.com/forum.jspaforumID=454
