Towards a Platform for Widespread Embedded Intelligence - ERCIM
Towards a Platform for Widespread Embedded Intelligence - ERCIM
Towards a Platform for Widespread Embedded Intelligence - ERCIM
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
SPECIAL THEME: <strong>Embedded</strong> <strong>Intelligence</strong><br />
From these three inputs, the heuristic<br />
distributes the operations of ALG onto<br />
the processors of ARC and schedules<br />
them statically, together with the communications<br />
induced by these scheduling<br />
decisions. The output of the heuristic is<br />
there<strong>for</strong>e a static schedule from which<br />
embeddable code can be generated.<br />
Our fault hypothesis is that the hardware<br />
components are fail silent, meaning that<br />
a component is either healthy and works<br />
fine, or is faulty and produces no output<br />
at all. Recent studies on modern hardware<br />
architectures have shown that a<br />
fail-silent behaviour can be achieved at a<br />
reasonable cost, so our fault hypothesis<br />
is reasonable.<br />
Our contribution consists of the definition<br />
of several new scheduling/distribution<br />
heuristics in order to generate static<br />
schedules that are also tolerant of a fixed<br />
number of hardware components (processors<br />
and/or communication links)<br />
faults. They are implemented inside<br />
SynDEx, as an alternative to its own<br />
default heuristics (called DSH:<br />
Distribution Scheduling Heuristic):<br />
• FTBAR (Fault-Tolerant Based Active<br />
Replication) generates a static<br />
schedule that tolerates Npf processor<br />
faults by replicating actively all the<br />
operations of the algorithm graph<br />
ALG exactly Npf+1 times. It works<br />
with target architectures having either<br />
point-to-point communication links or<br />
buses, but assumes that all the communication<br />
links are reliable. FTBAR<br />
tries to minimise the critical path of the<br />
obtained schedule w.r.t. the known<br />
WCETs of the operations onto the various<br />
processors of the architecture.<br />
• RBSA (Reliable Bicriteria Scheduling<br />
Algorithm) also generates a reliable<br />
and static schedule by actively replicating<br />
the operations of the algorithm<br />
graph. The difference with FTBAR is<br />
that the number of times an operation<br />
is replicated depends on the individual<br />
reliability of the processors it is scheduled<br />
on and on the overall reliability<br />
level required by the user. RBSA tries<br />
both to minimise the critical path of<br />
the obtained schedule and to maximise<br />
its reliability (these are the two criteria<br />
of this heuristic).<br />
26 <strong>ERCIM</strong> News No. 67, October 2006<br />
To the left is an example of an algorithm graph: it has nine operations (represented by<br />
circles) and 11 data-dependences (represented by green arrows). Among the<br />
operations, one is a sensor operation (I), one is an actuator operation (O), while the<br />
seven others are computations (A to G). Below to the right is an example of an<br />
architecture graph: it has three processors (P1, P2, and P3) and three point-to-point<br />
communication links (L1.2, L1.3, and L2.3).<br />
• GRT + eDSH (Graph Redundancy<br />
Trans<strong>for</strong>mation + extended Distribution<br />
Scheduling Heuristic) generates a<br />
static schedule that tolerates Npf processor<br />
faults and Nlf communication<br />
link faults. It first trans<strong>for</strong>ms the algorithm<br />
graph ALG into another dataflow<br />
graph ALG* by adding redundancy<br />
into it such that the required<br />
number of hardware component faults<br />
will be tolerated. During this phase, it<br />
also generates exclusion relations<br />
between subsets of operations that<br />
must be scheduled onto distinct processors,<br />
and subsets of data dependences<br />
that must be routed through disjoint<br />
paths. Then it uses an extended version<br />
of the DSH heuristics to generate a<br />
static schedule of ALG* onto ARC,<br />
w.r.t. the exclusion relations generated<br />
during the first phase.<br />
• FPMH (Fault Patterns Merging<br />
Heuristic) is an original approach to<br />
generate a static schedule of ALG onto<br />
ARC, tolerant to a given list of fault<br />
patterns. A fault pattern is a subset of<br />
the architecture's component that can<br />
fail simultaneously. Our method<br />
involves two steps. First, <strong>for</strong> each fault<br />
pattern, we generate the corresponding<br />
reduced architecture (the architecture<br />
from which the pattern's component<br />
has been removed) and we generate a<br />
static schedule of ALG onto this<br />
reduced architecture (we use the basic<br />
DSH heuristic of SYnDEx <strong>for</strong> this).<br />
From N fault patterns we there<strong>for</strong>e<br />
obtain N basic schedules. The second<br />
step consists of the merging of these N<br />
basic schedules into one static<br />
schedule that will be, by construction,<br />
tolerant to all the specified fault patterns.<br />
Links:<br />
Fault-tolerance:<br />
http://pop-art.inrialpes.fr/~girault/Projets/FT/<br />
SynDEx: http://www.syndex.org<br />
Please contact:<br />
Alain Girault, INRIA Rhône-Alpes<br />
Tel: +33 476 61 53 51<br />
E-mail: Alain.Girault@inrialpes.fr