Towards a Platform for Widespread Embedded Intelligence - ERCIM

SPECIAL THEME: Embedded Intelligence

From these three inputs, the heuristic distributes the operations of ALG onto the processors of ARC and schedules them statically, together with the communications induced by these scheduling decisions. The output of the heuristic is therefore a static schedule from which embeddable code can be generated.
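As a rough illustration of what such a distribution and scheduling heuristic does, here is a minimal greedy list-scheduling sketch in Python. It is not SynDEx's actual heuristic: the small graph, the WCET table, the fixed communication cost `COMM`, and the function name `schedule` are all assumptions made for this example.

```python
# Hypothetical inputs, not taken from the article.
ALG = {            # operation -> list of predecessor operations
    "I": [],
    "A": ["I"],
    "B": ["I"],
    "O": ["A", "B"],
}
ARC = ["P1", "P2"]                  # processors
WCET = {                            # WCET[op][proc], arbitrary time units
    "I": {"P1": 1, "P2": 1},
    "A": {"P1": 2, "P2": 3},
    "B": {"P1": 3, "P2": 2},
    "O": {"P1": 1, "P2": 1},
}
COMM = 1   # assumed fixed cost of sending one result between two processors


def schedule(alg, arc, wcet, comm):
    """Greedy static schedule: place each operation, in dependency order,
    on the processor where it finishes earliest."""
    placed = {}                         # op -> (proc, start, end)
    free_at = {p: 0 for p in arc}       # time at which each processor is idle
    remaining = dict(alg)
    while remaining:
        # Pick any operation whose predecessors are all placed.
        op = next(o for o, preds in remaining.items()
                  if all(p in placed for p in preds))
        best = None
        for proc in arc:
            # Data from a predecessor on another processor arrives later.
            ready = max(
                (placed[p][2] + (0 if placed[p][0] == proc else comm)
                 for p in alg[op]),
                default=0,
            )
            start = max(ready, free_at[proc])
            end = start + wcet[op][proc]
            if best is None or end < best[2]:
                best = (proc, start, end)
        placed[op] = best
        free_at[best[0]] = best[2]
        del remaining[op]
    return placed


SCHEDULE = schedule(ALG, ARC, WCET, COMM)
MAKESPAN = max(end for _, _, end in SCHEDULE.values())
```

The result maps each operation to a processor and a fixed time window, which is the kind of static schedule from which code could then be generated. The communication cost is folded into the start time of the consumer here; a fuller model would schedule the communications on the links as well.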

Our fault hypothesis is that the hardware components are fail-silent: a component is either healthy and works correctly, or faulty and produces no output at all. Recent studies on modern hardware architectures have shown that fail-silent behaviour can be achieved at a reasonable cost, so this hypothesis is realistic.
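One practical consequence of the fail-silent hypothesis is that a consumer never has to vote between conflicting values: any output it receives is correct, so it is enough for one replica of an operation to survive. The sketch below illustrates this in Python. Modelling "no output" as `None` and the helper name `first_output` are assumptions made for this example.

```python
from typing import Optional, Sequence


def first_output(replica_outputs: Sequence[Optional[int]]) -> Optional[int]:
    """Return the first value produced by a healthy replica.

    Under the fail-silent hypothesis a faulty component is modelled here as
    producing None (no output at all); it never produces a wrong value.
    """
    for value in replica_outputs:
        if value is not None:
            return value
    return None  # every replica failed: more faults than were planned for


# Two replicas of the same operation tolerate one faulty processor.
RESULT_ONE_FAULT = first_output([None, 42])
RESULT_ALL_FAULTY = first_output([None, None])
```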

Our contribution consists of several new scheduling/distribution heuristics that generate static schedules tolerating a fixed number of hardware component faults (processors and/or communication links). They are implemented inside SynDEx as alternatives to its default heuristic, DSH (Distribution Scheduling Heuristic):

• FTBAR (Fault-Tolerant Based Active Replication) generates a static schedule that tolerates Npf processor faults by actively replicating every operation of the algorithm graph ALG exactly Npf+1 times. It works with target architectures having either point-to-point communication links or buses, but assumes that all communication links are reliable. FTBAR tries to minimise the critical path of the resulting schedule with respect to the known worst-case execution times (WCETs) of the operations on the various processors of the architecture.

• RBSA (Reliable Bicriteria Scheduling Algorithm) also generates a reliable static schedule by actively replicating the operations of the algorithm graph. Unlike FTBAR, the number of times an operation is replicated depends on the individual reliability of the processors it is scheduled on and on the overall reliability level required by the user. RBSA tries both to minimise the critical path of the resulting schedule and to maximise its reliability (the two criteria of the heuristic).

ERCIM News No. 67, October 2006

To the left is an example of an algorithm graph: it has nine operations (represented by circles) and 11 data dependences (represented by green arrows). Among the operations, one is a sensor operation (I), one is an actuator operation (O), and the seven others are computations (A to G). Below to the right is an example of an architecture graph: it has three processors (P1, P2, and P3) and three point-to-point communication links (L1.2, L1.3, and L2.3).

• GRT + eDSH (Graph Redundancy Transformation + extended Distribution Scheduling Heuristic) generates a static schedule that tolerates Npf processor faults and Nlf communication link faults. It first transforms the algorithm graph ALG into another dataflow graph ALG* by adding enough redundancy to tolerate the required number of hardware component faults. During this phase, it also generates exclusion relations between subsets of operations that must be scheduled onto distinct processors, and between subsets of data dependences that must be routed through disjoint paths. It then uses an extended version of the DSH heuristic to generate a static schedule of ALG* onto ARC that respects the exclusion relations generated during the first phase.

• FPMH (Fault Patterns Merging Heuristic) is an original approach that generates a static schedule of ALG onto ARC tolerant to a given list of fault patterns. A fault pattern is a subset of the architecture's components that can fail simultaneously. Our method involves two steps. First, for each fault pattern, we generate the corresponding reduced architecture (the architecture from which the pattern's components have been removed) and a static schedule of ALG onto this reduced architecture (using the basic DSH heuristic of SynDEx). From N fault patterns we therefore obtain N basic schedules. The second step merges these N basic schedules into one static schedule that is, by construction, tolerant to all the specified fault patterns.
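The first step of FPMH can be sketched as follows in Python, using the three-processor architecture described in the figure caption. The fault patterns listed here, the dictionary representation, and the function name `reduced_architecture` are assumptions for illustration. The scheduling and merging steps are not shown, since the article does not detail them.

```python
# Architecture graph from the figure caption: three processors, three links.
PROCESSORS = {"P1", "P2", "P3"}
LINKS = {                       # link name -> the two processors it connects
    "L1.2": ("P1", "P2"),
    "L1.3": ("P1", "P3"),
    "L2.3": ("P2", "P3"),
}

# Hypothetical fault patterns: each is a set of components that may fail together.
FAULT_PATTERNS = [{"P1"}, {"L2.3"}, {"P3", "L1.2"}]


def reduced_architecture(processors, links, pattern):
    """Remove the pattern's components from the architecture.

    Assumption of this sketch: a link is also dropped when one of its
    endpoint processors is removed, since it can no longer carry data.
    """
    procs = processors - pattern
    kept_links = {
        name: ends
        for name, ends in links.items()
        if name not in pattern and all(p in procs for p in ends)
    }
    return procs, kept_links


# One reduced architecture per fault pattern (N patterns give N architectures).
REDUCED = [reduced_architecture(PROCESSORS, LINKS, fp) for fp in FAULT_PATTERNS]
```

Each entry of `REDUCED` would then be handed to the basic scheduling heuristic to produce one of the N basic schedules.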

Links:
Fault-tolerance: http://pop-art.inrialpes.fr/~girault/Projets/FT/
SynDEx: http://www.syndex.org

Please contact:
Alain Girault, INRIA Rhône-Alpes
Tel: +33 476 61 53 51
E-mail: Alain.Girault@inrialpes.fr
