Hardware/Software Introduction Chapter 1 Introduction
Embedded Systems Design: A Unified Hardware/Software ... - EET
Embedded Systems Design: A Unified Hardware/Software ... - EET
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong><br />
<strong>Chapter</strong> 1: <strong>Introduction</strong><br />
1
Outline<br />
• Embedded systems overview<br />
– What are they?<br />
• Design challenge – optimizing design metrics<br />
• Technologies<br />
– Processor technologies<br />
– IC technologies<br />
– Design technologies<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
2
Embedded systems overview<br />
• Embedded computing systems<br />
– Computing systems embedded within<br />
electronic devices<br />
– Hard to define. Nearly any computing<br />
system other than a desktop computer<br />
– Billions of units produced yearly, versus<br />
millions of desktop units<br />
– Perhaps 50 per household and per<br />
automobile<br />
Computers are in here...<br />
and here...<br />
and even here...<br />
Lots more of these,<br />
though they cost a lot<br />
less each.<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
3
A “short list” of embedded systems<br />
Anti-lock brakes<br />
Auto-focus cameras<br />
Automatic teller machines<br />
Automatic toll systems<br />
Automatic transmission<br />
Avionic systems<br />
Battery chargers<br />
Camcorders<br />
Cell phones<br />
Cell-phone base stations<br />
Cordless phones<br />
Cruise control<br />
Curbside check-in systems<br />
Digital cameras<br />
Disk drives<br />
Electronic card readers<br />
Electronic instruments<br />
Electronic toys/games<br />
Factory control<br />
Fax machines<br />
Fingerprint identifiers<br />
Home security systems<br />
Life-support systems<br />
Medical testing systems<br />
Modems<br />
MPEG decoders<br />
Network cards<br />
Network switches/routers<br />
On-board navigation<br />
Pagers<br />
Photocopiers<br />
Point-of-sale systems<br />
Portable video games<br />
Printers<br />
Satellite phones<br />
Scanners<br />
Smart ovens/dishwashers<br />
Speech recognizers<br />
Stereo systems<br />
Teleconferencing systems<br />
Televisions<br />
Temperature controllers<br />
Theft tracking systems<br />
TV set-top boxes<br />
VCR’s, DVD players<br />
Video game consoles<br />
Video phones<br />
Washers and dryers<br />
And the list goes on and on<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
4
Some common characteristics of embedded<br />
systems<br />
• Single-functioned<br />
– Executes a single program, repeatedly<br />
• Tightly-constrained<br />
– Low cost, low power, small, fast, etc.<br />
• Reactive and real-time<br />
– Continually reacts to changes in the system’s environment<br />
– Must compute certain results in real-time without delay<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
5
Embedded system example -- digital camera<br />
CCD<br />
Digital camera chip<br />
A2D<br />
CCD preprocessor<br />
Pixel coprocessor<br />
D2A<br />
lens<br />
JPEG codec<br />
Microcontroller<br />
Multiplier/Accum<br />
DMA controller<br />
Display ctrl<br />
Memory controller ISA bus interface UART LCD ctrl<br />
• Single-functioned -- always a digital camera<br />
• Tightly-constrained -- Low cost, low power, small, fast<br />
• Reactive and real-time -- only to a small extent<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
6
Design challenge – optimizing design metrics<br />
• Obvious design goal:<br />
– Construct an implementation with desired functionality<br />
• Key design challenge:<br />
– Simultaneously optimize numerous design metrics<br />
• Design metric<br />
– A measurable feature of a system’s implementation<br />
– Optimizing design metrics is a key challenge<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
7
Design challenge – optimizing design metrics<br />
• Common metrics<br />
– Unit cost: the monetary cost of manufacturing each copy of the system,<br />
excluding NRE cost<br />
– NRE cost (Non-Recurring Engineering cost): The one-time<br />
monetary cost of designing the system<br />
–Size: the physical space required by the system<br />
– Performance: the execution time or throughput of the system<br />
– Power: the amount of power consumed by the system<br />
– Flexibility: the ability to change the functionality of the system without<br />
incurring heavy NRE cost<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
8
Design challenge – optimizing design metrics<br />
• Common metrics (continued)<br />
– Time-to-prototype: the time needed to build a working version of the<br />
system<br />
– Time-to-market: the time required to develop a system to the point that it<br />
can be released and sold to customers<br />
– Maintainability: the ability to modify the system after its initial release<br />
– Correctness, safety, many more<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
9
Design metric competition -- improving one<br />
may worsen others<br />
lens<br />
CCD<br />
Performance<br />
Digital camera chip<br />
A2D<br />
JPEG codec<br />
DMA controller<br />
CCD preprocessor<br />
Power<br />
NRE cost<br />
Microcontroller<br />
Pixel coprocessor<br />
Size<br />
D2A<br />
Multiplier/Accum<br />
Display ctrl<br />
• Expertise with both software<br />
and hardware is needed to<br />
optimize design metrics<br />
– Not just a hardware or<br />
software expert, as is common<br />
– A designer must be<br />
comfortable with various<br />
technologies in order to choose<br />
the best for a given application<br />
and constraints<br />
<strong>Hardware</strong><br />
Memory controller ISA bus interface UART LCD ctrl<br />
<strong>Software</strong><br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
10
Time-to-market: a demanding design metric<br />
Revenues ($)<br />
Time (months)<br />
• Time required to develop a<br />
product to the point it can be<br />
sold to customers<br />
• Market window<br />
– Period during which the<br />
product would have highest<br />
sales<br />
• Average time-to-market<br />
constraint is about 8 months<br />
• Delays can be costly<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
11
Losses due to delayed market entry<br />
Peak revenue<br />
• Simplified revenue model<br />
– Product life = 2W, peak at W<br />
Revenues ($)<br />
Market rise<br />
On-time<br />
Peak revenue from<br />
delayed entry<br />
Market fall<br />
– Time of market entry defines a<br />
triangle, representing market<br />
penetration<br />
Delayed<br />
– Triangle area equals revenue<br />
• Loss<br />
D<br />
On-time Delayed<br />
entry entry<br />
W<br />
Time<br />
2W<br />
– The difference between the ontime<br />
and delayed triangle areas<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
12
Losses due to delayed market entry (cont.)<br />
• Area = 1/2 * base * height<br />
Peak revenue<br />
– On-time = 1/2 * 2W * W<br />
Revenues ($)<br />
Market rise<br />
On-time<br />
Peak revenue from<br />
delayed entry<br />
Market fall<br />
– Delayed = 1/2 * (W-D+W)*(W-D)<br />
• Percentage revenue loss =<br />
(D(3W-D)/2W 2 )*100%<br />
Delayed<br />
• Try some examples<br />
D<br />
On-time Delayed<br />
entry entry<br />
W<br />
Time<br />
2W<br />
– Lifetime 2W=52 wks, delay D=4 wks<br />
– (4*(3*26 –4)/2*26^2) = 22%<br />
– Lifetime 2W=52 wks, delay D=10 wks<br />
– (10*(3*26 –10)/2*26^2) = 50%<br />
– Delays are costly!<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
13
NRE and unit cost metrics<br />
• Compare technologies by costs -- best depends on quantity<br />
– Technology A: NRE=$2,000, unit=$100<br />
– Technology B: NRE=$30,000, unit=$30<br />
– Technology C: NRE=$100,000, unit=$2<br />
total cost (x1000)<br />
$200,000<br />
A<br />
B<br />
$160,000<br />
C<br />
$120,000<br />
$80,000<br />
$40,000<br />
per product cost<br />
$200<br />
A<br />
B<br />
$160<br />
C<br />
$120<br />
$80<br />
$40<br />
$0<br />
$0<br />
0 800 1600 2400<br />
0 800 1600 2400<br />
Number of units (volume)<br />
• But, must also consider time-to-market<br />
Number of units (volume)<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
14
The performance design metric<br />
• Widely-used measure of system, widely-abused<br />
– Clock frequency, instructions per second – not good measures<br />
– Digital camera example – a user cares about how fast it processes images, not<br />
clock speed or instructions per second<br />
• Latency (response time)<br />
– Time between task start and end<br />
– e.g., Camera’s A and B process images in 0.25 seconds<br />
• Throughput<br />
– Tasks per second, e.g. Camera A processes 4 images per second<br />
– Throughput can be more than latency seems to imply due to concurrency, e.g.<br />
Camera B may process 8 images per second (by capturing a new image while<br />
previous image is being stored).<br />
• Speedup of B over S = B’s performance / A’s performance<br />
– Throughput speedup = 8/4 = 2<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
15
Three key embedded system technologies<br />
• Technology<br />
– A manner of accomplishing a task, especially using technical<br />
processes, methods, or knowledge<br />
• Three key technologies for embedded systems<br />
– Processor technology<br />
– IC technology<br />
– Design technology<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
16
Processor technology<br />
• The architecture of the computation engine used to implement a<br />
system’s desired functionality<br />
• Processor does not have to be programmable<br />
– “Processor” not equal to general-purpose processor<br />
Controller<br />
Datapath<br />
Controller<br />
Datapath<br />
Controller<br />
Datapath<br />
Control<br />
logic and<br />
State register<br />
IR<br />
PC<br />
Register<br />
file<br />
General<br />
ALU<br />
Control logic<br />
and State<br />
register<br />
IR<br />
PC<br />
Registers<br />
Custom<br />
ALU<br />
Control<br />
logic<br />
State<br />
register<br />
index<br />
total<br />
+<br />
Data<br />
memory<br />
Data<br />
memory<br />
Program<br />
memory<br />
Assembly code<br />
for:<br />
Data<br />
memory<br />
Program memory<br />
Assembly code<br />
for:<br />
total = 0<br />
for i =1 to …<br />
General-purpose (“software”)<br />
total = 0<br />
for i =1 to …<br />
Application-specific<br />
Single-purpose (“hardware”)<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
17
Processor technology<br />
• Processors vary in their customization for the problem at hand<br />
Desired<br />
functionality<br />
total = 0<br />
for i = 1 to N loop<br />
total += M[i]<br />
end loop<br />
General-purpose<br />
processor<br />
Application-specific<br />
processor<br />
Single-purpose<br />
processor<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
18
General-purpose processors<br />
• Programmable device used in a variety of<br />
applications<br />
– Also known as “microprocessor”<br />
• Features<br />
– Program memory<br />
– General datapath with large register file and<br />
general ALU<br />
• User benefits<br />
– Low time-to-market and NRE costs<br />
– High flexibility<br />
• “Pentium” the most well-known, but<br />
there are hundreds of others<br />
Controller<br />
Control<br />
logic and<br />
State<br />
register<br />
IR<br />
PC<br />
Program<br />
memory<br />
Assembly code<br />
for:<br />
total = 0<br />
for i =1 to …<br />
Datapath<br />
Register<br />
file<br />
General<br />
ALU<br />
Data<br />
memory<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
19
Single-purpose processors<br />
• Digital circuit designed to execute exactly<br />
one program<br />
– a.k.a. coprocessor, accelerator or peripheral<br />
• Features<br />
– Contains only the components needed to<br />
execute a single program<br />
– No program memory<br />
• Benefits<br />
– Fast<br />
– Low power<br />
– Small size<br />
Controller<br />
Control<br />
logic<br />
State<br />
register<br />
Datapath<br />
index<br />
total<br />
+<br />
Data<br />
memory<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
20
Application-specific processors<br />
• Programmable processor optimized for a<br />
particular class of applications having<br />
common characteristics<br />
– Compromise between general-purpose and<br />
single-purpose processors<br />
• Features<br />
– Program memory<br />
– Optimized datapath<br />
– Special functional units<br />
• Benefits<br />
– Some flexibility, good performance, size and<br />
power<br />
Controller<br />
Control<br />
logic and<br />
State<br />
register<br />
IR<br />
PC<br />
Program<br />
memory<br />
Assembly code<br />
for:<br />
total = 0<br />
for i =1 to …<br />
Datapath<br />
Registers<br />
Custom<br />
ALU<br />
Data<br />
memory<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
21
IC technology<br />
• Three types of IC technologies<br />
– Full-custom/VLSI<br />
– Semi-custom ASIC (gate array and standard cell)<br />
– PLD (Programmable Logic Device)<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
22
Full-custom/VLSI<br />
• All layers are optimized for an embedded system’s<br />
particular digital implementation<br />
– Placing transistors<br />
– Sizing transistors<br />
– Routing wires<br />
• Benefits<br />
– Excellent performance, small size, low power<br />
• Drawbacks<br />
– High NRE cost (e.g., $300k), long time-to-market<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
23
Semi-custom<br />
• Lower layers are fully or partially built<br />
– Designers are left with routing of wires and maybe placing<br />
some blocks<br />
• Benefits<br />
– Good performance, good size, less NRE cost than a fullcustom<br />
implementation (perhaps $10k to $100k)<br />
• Drawbacks<br />
– Still require weeks to months to develop<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
24
PLD (Programmable Logic Device)<br />
• All layers already exist<br />
– Designers can purchase an IC<br />
– Connections on the IC are either created or destroyed to<br />
implement desired functionality<br />
– Field-Programmable Gate Array (FPGA) very popular<br />
• Benefits<br />
– Low NRE costs, almost instant IC availability<br />
• Drawbacks<br />
– Bigger, expensive (perhaps $30 per unit), power hungry,<br />
slower<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
25
Moore’s law<br />
• The most important trend in embedded systems<br />
– Predicted in 1965 by Intel co-founder Gordon Moore<br />
IC transistor capacity has doubled roughly every 18 months<br />
for the past several decades<br />
10,000<br />
1,000<br />
Logic transistors<br />
per chip<br />
(in millions)<br />
Note:<br />
logarithmic scale<br />
100<br />
10<br />
1<br />
0.1<br />
0.01<br />
0.001<br />
1981<br />
1983<br />
1985<br />
1987<br />
1989<br />
1991<br />
1993<br />
1995<br />
1997<br />
1999<br />
2001<br />
2003<br />
2005<br />
2007<br />
2009<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
26
Graphical illustration of Moore’s law<br />
1981 1984 1987 1990 1993 1996 1999 2002<br />
10,000<br />
transistors<br />
150,000,000<br />
transistors<br />
Leading edge<br />
chip in 1981<br />
Leading edge<br />
chip in 2002<br />
• Something that doubles frequently grows more quickly<br />
than most people realize!<br />
– A 2002 chip can hold about 15,000 1981 chips inside itself<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
27
Design Technology<br />
• The manner in which we convert our concept of<br />
desired system functionality into an implementation<br />
Compilation/<br />
Synthesis<br />
Libraries/<br />
IP<br />
Test/<br />
Verification<br />
Compilation/Synthesis:<br />
Automates exploration and<br />
insertion of implementation<br />
details for lower level.<br />
System<br />
specification<br />
System<br />
synthesis<br />
Hw/Sw/<br />
OS<br />
Model simulat./<br />
checkers<br />
Libraries/IP: Incorporates predesigned<br />
implementation from<br />
lower abstraction level into<br />
higher level.<br />
Behavioral<br />
specification<br />
RT<br />
specification<br />
Behavior<br />
synthesis<br />
RT<br />
synthesis<br />
Cores<br />
RT<br />
components<br />
Hw-Sw<br />
cosimulators<br />
HDL simulators<br />
Test/Verification: Ensures correct<br />
functionality at each level, thus<br />
reducing costly iterations<br />
between levels.<br />
Logic<br />
specification<br />
Logic<br />
synthesis<br />
Gates/<br />
Cells<br />
Gate<br />
simulators<br />
To final implementation<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
28
The co-design ladder<br />
• In the past:<br />
– <strong>Hardware</strong> and software<br />
design technologies were<br />
very different<br />
– Recent maturation of<br />
synthesis enables a unified<br />
view of hardware and<br />
software<br />
• <strong>Hardware</strong>/software<br />
“codesign”<br />
Compilers<br />
(1960's,1970's)<br />
Assembly instructions<br />
Assemblers, linkers<br />
(1950's, 1960's)<br />
Machine instructions<br />
Microprocessor plus<br />
program bits: “software”<br />
Sequential program code (e.g., C, VHDL)<br />
Implementation<br />
Behavioral synthesis<br />
(1990's)<br />
Register transfers<br />
RT synthesis<br />
(1980's, 1990's)<br />
Logic equations / FSM's<br />
Logic synthesis<br />
(1970's, 1980's)<br />
Logic gates<br />
VLSI, ASIC, or PLD<br />
implementation: “hardware”<br />
The choice of hardware versus software for a particular function is simply a tradeoff among various<br />
design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no<br />
fundamental difference between what hardware or software can implement.<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
29
Independence of processor and IC<br />
technologies<br />
• Basic tradeoff<br />
– General vs. custom<br />
– With respect to processor technology or IC technology<br />
– The two technologies are independent<br />
General,<br />
providing improved:<br />
Generalpurpose<br />
processor<br />
ASIP<br />
Singlepurpose<br />
processor<br />
Customized,<br />
providing improved:<br />
Flexibility<br />
Maintainability<br />
NRE cost<br />
Time- to-prototype<br />
Time-to-market<br />
Cost (low volume)<br />
Power efficiency<br />
Performance<br />
Size<br />
Cost (high volume)<br />
PLD<br />
Semi-custom<br />
Full-custom<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
30
Design productivity gap<br />
• 1981 leading edge chip required 100 designer months<br />
– 10,000 transistors / 100 transistors/month<br />
• 2002 leading edge chip requires 30,000 designer months<br />
– 150,000,000 / 5000 transistors/month<br />
• Designer cost increase from $1M to $300M<br />
10,000<br />
100,000<br />
1,000<br />
10,000<br />
Logic transistors<br />
per chip<br />
(in millions)<br />
100<br />
10<br />
1<br />
0.1<br />
0.01<br />
IC capacity<br />
productivity<br />
Gap<br />
1000<br />
100<br />
10<br />
1<br />
0.1<br />
Productivity<br />
(K) Trans./Staff-Mo.<br />
0.001<br />
0.01<br />
1981<br />
1983<br />
1985<br />
1987<br />
1989<br />
1991<br />
1993<br />
1995<br />
1997<br />
1999<br />
2001<br />
2003<br />
2005<br />
2007<br />
2009<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
31
The mythical man-month<br />
• The situation is even worse than the productivity gap indicates<br />
• In theory, adding designers to team reduces project completion time<br />
• In reality, productivity per designer decreases due to complexities of team management<br />
and communication<br />
• In the software community, known as “the mythical man-month” (Brooks 1975)<br />
• At some point, can actually lengthen project completion time! (“Too many cooks”)<br />
• 1M transistors, 1<br />
designer=5000 trans/month<br />
• Each additional designer<br />
reduces for 100 trans/month<br />
• So 2 designers produce 4900<br />
trans/month each<br />
60000<br />
50000<br />
40000<br />
30000<br />
20000<br />
10000<br />
43<br />
Team<br />
16<br />
15<br />
16<br />
19<br />
18<br />
24<br />
23<br />
Months until completion<br />
Individual<br />
0<br />
10 20 30 40<br />
Number of designers<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
32
Summary<br />
• Embedded systems are everywhere<br />
• Key challenge: optimization of design metrics<br />
– Design metrics compete with one another<br />
• A unified view of hardware and software is necessary to<br />
improve productivity<br />
• Three key technologies<br />
– Processor: general-purpose, application-specific, single-purpose<br />
– IC: Full-custom, semi-custom, PLD<br />
– Design: Compilation/synthesis, libraries/IP, test/verification<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
33
Ch2. SPP: <strong>Hardware</strong> Summary<br />
• Ch2: Custom Single-Purpose Processors (<strong>Hardware</strong>)<br />
– Straightforward design techniques<br />
– Can be built to execute algorithms<br />
– Typically start with FSMD<br />
– CAD tools can be of great assistance<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
34
Ch3. GPP: <strong>Software</strong> Summary<br />
• General-purpose processors (GPPs)<br />
– Good performance, low NRE, flexible<br />
• Controller, datapath, and memory<br />
• Structured languages prevail<br />
– But some assembly level programming still necessary<br />
• Many tools available<br />
– Including instruction-set simulators, and in-circuit emulators<br />
• ASIPs (Application-Specific Instruction-Set Processors)<br />
– Microcontrollers, DSPs, network processors, more customized ASIPs<br />
• Choosing among processors is an important step<br />
• Designing a general-purpose processor is conceptually the same<br />
as designing a single-purpose processor<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
35
Ch4, 5, 6. Summary<br />
• Ch4. Standard Single Purpose Processors: Peripherals<br />
– Timer, Counter, UART, LCD Controller, Stepper Motor Controller,<br />
ADC, RT Clock, etc.<br />
• Ch5. Memory<br />
– ROM: Mask-Programmable MP-ROM, One-Time Programmable OTP-<br />
ROM, EPROM, EEPROM, etc.<br />
– RAM: SRAM, DRAM, Pseudo-Static PSRAM, Nonvolatile NVRAM,<br />
Fast-Page Mode FPM-DRAM, Extended Data Out EDO-DRAM,<br />
Rambus RDRAM<br />
– Memory Hierarchy and Cache, MMU<br />
• Ch6. Interfacing<br />
– Memory-mapped I/O, Standard I/O, Interrupts, DMA, Bus Arbritation,<br />
I 2 C, FireWire, USB, PCI Bus, ARM Bus, IrDA, Bluetooth, 802.11x.<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
36
Ch7. Digital Camera Example Summary<br />
• Digital camera example<br />
– Specifications in English and executable language<br />
– Design metrics: performance, power and area<br />
• Several implementations<br />
– Microcontroller: too slow<br />
– Microcontroller and coprocessor: better, but still too slow<br />
– Fixed-point arithmetic: almost fast enough<br />
– Additional coprocessor for compression: fast enough, but<br />
expensive and hard to design<br />
– Tradeoffs between hw/sw – the main lesson of this book!<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
37
Ch8. State machine and Concurrent Process<br />
Models Summary<br />
• Computation models are distinct from languages<br />
• Sequential program model is popular<br />
– Most common languages like C support it directly<br />
• State machine models good for control<br />
– Extensions like HCFSM provide additional power<br />
– PSM combines state machines and sequential programs<br />
• Concurrent process model for multi-task systems<br />
– Communication and synchronization methods exist<br />
– Scheduling is critical<br />
• Dataflow model good for signal processing<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
38
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong><br />
(Ch9. Control Systems)<br />
(Ch10. IC Technology)<br />
<strong>Chapter</strong> 11: Design Technology<br />
39
Ch11. Design Technology<br />
• Automation: synthesis<br />
• Verification: hardware/software co-simulation<br />
• Reuse: intellectual property cores<br />
• Design process models<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
40
<strong>Introduction</strong><br />
• Design task<br />
– Define system functionality<br />
– Convert functionality to physical implementation while<br />
• Satisfying constrained metrics<br />
• Optimizing other design metrics<br />
• Designing embedded systems is hard<br />
– Complex functionality<br />
• Millions of possible environment scenarios<br />
• Competing, tightly constrained metrics<br />
– Productivity gap<br />
• As low as 10 lines of code or 100 transistors produced per day<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
41
Improving productivity<br />
• Design technologies developed to improve productivity<br />
• We focus on technologies advancing hardware/software unified<br />
view<br />
– Automation<br />
– Reuse<br />
• Program replaces manual design<br />
• Synthesis<br />
• Predesigned components<br />
• Cores<br />
• General-purpose and single-purpose processors on single IC<br />
– Verification<br />
Verification<br />
Automation<br />
• Ensuring correctness/completeness of each design step<br />
• <strong>Hardware</strong>/software co-simulation<br />
Specification<br />
Implementation<br />
Reuse<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
42
Automation: synthesis<br />
• Early design mostly hardware<br />
• <strong>Software</strong> complexity increased with advent<br />
of general-purpose processor<br />
• Different techniques for software design<br />
and hardware design<br />
– Caused division of the two fields<br />
• Design tools evolve for higher levels of<br />
abstraction<br />
– Different rate in each field<br />
• <strong>Hardware</strong>/software design fields rejoining<br />
– Both can start from behavioral description in<br />
sequential program model<br />
– 30 years longer for hardware design to reach<br />
this step in the ladder<br />
• Many more design dimensions<br />
• Optimization critical<br />
Compilers<br />
(1960s,1970s)<br />
Assembly instructions<br />
Assemblers, linkers<br />
(1950s, 1960s)<br />
Machine instructions<br />
Microprocessor plus<br />
program bits<br />
The codesign ladder<br />
Sequential program code (e.g., C, VHDL)<br />
Implementation<br />
Behavioral synthesis<br />
(1990s)<br />
Register transfers<br />
RT synthesis<br />
(1980s, 1990s)<br />
Logic equations / FSM's<br />
Logic synthesis<br />
(1970s, 1980s)<br />
Logic gates<br />
VLSI, ASIC, or PLD<br />
implementation<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
43
<strong>Hardware</strong>/software parallel evolution<br />
• <strong>Software</strong> design evolution<br />
– Machine instructions<br />
– Assemblers<br />
• convert assembly programs into machine<br />
instructions<br />
– Compilers<br />
• translate sequential programs into assembly<br />
• <strong>Hardware</strong> design evolution<br />
– Interconnected logic gates<br />
– Logic synthesis<br />
• converts logic equations or FSMs into gates<br />
– Register-transfer (RT) synthesis<br />
• converts FSMDs into FSMs, logic equations,<br />
predesigned RT components (registers,<br />
adders, etc.)<br />
– Behavioral synthesis<br />
• converts sequential programs into FSMDs<br />
Compilers<br />
(1960s,1970s)<br />
Assembly instructions<br />
Assemblers, linkers<br />
(1950s, 1960s)<br />
Machine instructions<br />
Microprocessor plus<br />
program bits<br />
The codesign ladder<br />
Sequential program code (e.g., C, VHDL)<br />
Implementation<br />
Behavioral synthesis<br />
(1990s)<br />
Register transfers<br />
RT synthesis<br />
(1980s, 1990s)<br />
Logic equations / FSM's<br />
Logic synthesis<br />
(1970s, 1980s)<br />
Logic gates<br />
VLSI, ASIC, or PLD<br />
implementation<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
44
Increasing abstraction level<br />
• Higher abstraction level focus of hardware/software design evolution<br />
– Description smaller/easier to capture<br />
• E.g., Line of sequential program code can translate to 1000 gates<br />
– Many more possible implementations available<br />
• (a) Like flashlight, the higher above the ground, the more ground illuminated<br />
– Sequential program designs may differ in performance/transistor count by orders of magnitude<br />
– Logic-level designs may differ by only power of 2<br />
• (b) Design process proceeds to lower abstraction level, narrowing in on single<br />
implementation<br />
modeling cost increases<br />
opportunities decrease<br />
idea<br />
implementation<br />
(a)<br />
back-of-the-envelope<br />
sequential program<br />
register-transfers<br />
logic<br />
idea<br />
implementation<br />
(b)<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
45
Synthesis<br />
• Automatically converting system’s behavioral description to a structural<br />
implementation<br />
– Complex whole formed by parts<br />
– Structural implementation must optimize design metrics<br />
• More expensive, complex than compilers<br />
– Cost = $100s to $10,000s<br />
– User controls 100s of synthesis options<br />
– Optimization critical<br />
• Otherwise could use software<br />
– Optimizations different for each user<br />
– Run time = hours, days<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
46
Gajski’s Y-chart<br />
• Each axis represents type of description<br />
– Behavioral<br />
• Defines outputs as function of inputs<br />
• Algorithms but no implementation<br />
– Structural<br />
• Implements behavior by connecting<br />
components with known behavior<br />
– Physical<br />
• Gives size/locations of components and<br />
wires on chip/board<br />
• Synthesis converts behavior at given level<br />
to structure at same level or lower<br />
– E.g.,<br />
• FSM → gates, flip-flops (same level)<br />
• FSM → transistors (lower level)<br />
• FSM X registers, FUs (higher level)<br />
• FSM X processors, memories (higher level)<br />
Processors, memories<br />
Structural<br />
Registers, FUs, MUXs<br />
Gates, flip-flops<br />
Transistors<br />
Behavior<br />
Sequential programs<br />
Register transfers<br />
Logic equations/FSM<br />
Transfer functions<br />
Cell Layout<br />
Modules<br />
Chips<br />
Boards<br />
Physical<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
47
Logic synthesis<br />
• Logic-level behavior to structural implementation<br />
– Logic equations and/or FSM to connected gates<br />
• Combinational logic synthesis<br />
– Two-level minimization (Sum of products/product of sums)<br />
• Best possible performance<br />
– Longest path = 2 gates (AND gate + OR gate/OR gate + AND gate)<br />
• Minimize size<br />
– Minimum cover<br />
– Minimum cover that is prime<br />
– Heuristics<br />
– Multilevel minimization<br />
• Trade performance for size<br />
• Pareto-optimal solution<br />
– Heuristics<br />
• FSM synthesis<br />
– State minimization<br />
– State encoding<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
48
Two-level minimization<br />
• Represent logic function as sum of<br />
products (or product of sums)<br />
– AND gate for each product<br />
– OR gate for each sum<br />
• Gives best possible performance<br />
– At most 2 gate delay<br />
• Goal: minimize size<br />
– Minimum cover<br />
• Minimum # of AND gates (sum of products)<br />
– Minimum cover that is prime<br />
• Minimum # of inputs to each AND gate (sum<br />
of products)<br />
Sum of products<br />
F = abc'd' + a'b'cd + a'bcd + ab'cd<br />
Direct implementation<br />
a<br />
b<br />
c<br />
d<br />
4 4-input AND gates and<br />
1 4-input OR gate<br />
→ 40 transistors<br />
F<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
49
Minimum cover<br />
• Minimum # of AND gates (sum of products)<br />
• Literal: variable or its complement<br />
– a or a’, b or b’, etc.<br />
• Minterm: product of literals<br />
– Each literal appears exactly once<br />
• abc’d’, ab’cd, a’bcd, etc.<br />
• Implicant: product of literals<br />
– Each literal appears no more than once<br />
• abc’d’, a’cd, etc.<br />
– Covers 1 or more minterms<br />
• a’cd covers a’bcd and a’b’cd<br />
• Cover: set of implicants that covers all minterms of function<br />
• Minimum cover: cover with minimum # of implicants<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
50
Minimum cover: K-map approach<br />
• Karnaugh map (K-map)<br />
– 1 represents minterm<br />
– Circle represents implicant<br />
• Minimum cover<br />
– Covering all 1’s with min # of<br />
circles<br />
– Example: direct vs. min cover<br />
• Less gates<br />
– 4 vs. 5<br />
• Less transistors<br />
– 28 vs. 40<br />
K-map: sum of products<br />
ab cd<br />
00 01 11 10<br />
a<br />
b<br />
c<br />
d<br />
00<br />
01<br />
11<br />
10<br />
0 0 1 0<br />
0 0 1 0<br />
1 0 0 0<br />
0 0 1 0<br />
Minimum cover<br />
K-map: minimum cover<br />
ab cd<br />
00 01 11 10<br />
00<br />
01<br />
11<br />
10<br />
F=abc'd' + a'cd + ab'cd<br />
Minimum cover implementation<br />
F<br />
0 0 1 0<br />
0 0 1 0<br />
1 0 0 0<br />
0 0 1 0<br />
2 4-input AND gate<br />
1 3-input AND gates<br />
1 4 input OR gate<br />
→ 28 transistors<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
51
Minimum cover that is prime<br />
• Minimum # of inputs to AND gates<br />
• Prime implicant<br />
– Implicant not covered by any other<br />
implicant<br />
– Max-sized circle in K-map<br />
• Minimum cover that is prime<br />
– Covering with min # of prime implicants<br />
– Min # of max-sized circles<br />
– Example: prime cover vs. min cover<br />
• Same # of gates<br />
– 4 vs. 4<br />
• Less transistors<br />
– 26 vs. 28<br />
a<br />
b<br />
c<br />
d<br />
K-map: minimum cover that is prime<br />
ab cd<br />
00 01 11 10<br />
00<br />
01<br />
11<br />
10<br />
0 0 1 0<br />
0 0 1 0<br />
1 0 0 0<br />
0 0 1 0<br />
Minimum cover that is prime<br />
F=abc'd' + a'cd + b'cd<br />
Implementation<br />
F<br />
1 4-input AND gate<br />
2 3-input AND gates<br />
1 4 input OR gate<br />
→ 26 transistors<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
52
Minimum cover: heuristics<br />
• K-maps give optimal solution every time<br />
– Functions with > 6 inputs too complicated<br />
– Use computer-based tabular method<br />
• Heuristic<br />
• Finds all prime implicants<br />
• Finds min cover that is prime<br />
• Also optimal solution every time<br />
• Problem: 2 n minterms for n inputs<br />
– 32 inputs = 4 billion minterms<br />
– Exponential complexity<br />
– Solution technique where optimal solution not guaranteed<br />
– Hopefully comes close<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
53
Multilevel logic minimization<br />
• Trade performance for size<br />
– Increase delay for lower # of gates<br />
– Gray area represents all possible<br />
solutions<br />
– Circle with X represents ideal solution<br />
• Generally not possible<br />
– 2-level gives best performance<br />
• max delay = 2 gates<br />
• Solve for smallest size<br />
– Multilevel gives pareto-optimal solution<br />
• Minimum delay for a given size<br />
• Minimum size for a given delay<br />
delay<br />
multi-level minim.<br />
2-level minim.<br />
size<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
54
Example<br />
• Minimized 2-level logic function:<br />
– F = adef + bdef + cdef + gh<br />
– Requires 5 gates with 18 total gate inputs<br />
• 4 ANDS and 1 OR<br />
• After algebraic manipulation:<br />
– F = (a + b + c)def + gh<br />
– Requires only 4 gates with 11 total gate inputs<br />
• 2 ANDS and 2 ORs<br />
– Less inputs per gate<br />
– Assume gate inputs = 2 transistors<br />
• Reduced by 14 transistors<br />
– 36 (18 * 2) down to 22 (11 * 2)<br />
– Sacrifices performance for size<br />
• Inputs a, b, and c now have 3-gate delay<br />
• Iterative improvement heuristic commonly used<br />
a<br />
d<br />
b<br />
e<br />
c<br />
f<br />
g<br />
h<br />
2-level minimized<br />
multilevel minimized<br />
a<br />
b<br />
c<br />
d<br />
e<br />
f<br />
g<br />
F<br />
h<br />
F<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
55
FSM synthesis<br />
• FSM to gates<br />
• State minimization<br />
– Reduce # of states<br />
• Identify and merge equivalent states<br />
– Outputs, next states same for all possible inputs<br />
– Tabular method gives exact solution<br />
• Table of all possible state pairs<br />
• If n states, n 2 table entries<br />
• Thus, heuristics used with large # of states<br />
• State encoding<br />
– Unique bit sequence for each state<br />
– If n states, log 2 (n) bits<br />
– n! possible encodings<br />
– Thus, heuristics common<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
56
Technology mapping<br />
• Library of gates available for implementation<br />
– Simple<br />
• only 2-input AND,OR gates<br />
– Complex<br />
• various-input AND, OR, NAND, NOR, etc. gates<br />
• Efficiently implemented meta-gates (i.e., AND-OR-INVERT, MUX)<br />
• Final structure consists of specified library’s components only<br />
• If technology mapping integrated with logic synthesis<br />
– More efficient circuit<br />
– More complex problem<br />
– Heuristics required<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
57
Complexity impact on user<br />
• As complexity grows, heuristics used<br />
• Heuristics differ tremendously among synthesis tools<br />
– Computationally expensive<br />
• Higher quality results<br />
• Variable optimization effort settings<br />
• Long run times (hours, days)<br />
• Requires huge amounts of memory<br />
• Typically needs to run on servers, workstations<br />
– Fast heuristics<br />
• Lower quality results<br />
• Shorter run times (minutes, hours)<br />
• Smaller amount of memory required<br />
• Could run on PC<br />
• Super-linear-time (i.e. n 3 ) heuristics usually used<br />
– User can partition large systems to reduce run times/size<br />
– 100 3 > 50 3 + 50 3 (1,000,000 > 250,000)<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
58
Integrating logic design and physical design<br />
• Past<br />
– Gate delay much greater than wire delay<br />
– Thus, performance evaluated as # of levels<br />
of gates only<br />
• Today<br />
– Gate delay shrinking as feature size<br />
shrinking<br />
– Wire delay increasing<br />
• Performance evaluation needs wire length<br />
– Transistor placement (needed for wire<br />
length) domain of physical design<br />
– Thus, simultaneous logic synthesis and<br />
physical design required for efficient<br />
circuits<br />
Delay<br />
Reduced feature size<br />
Wire<br />
Transistor<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
59
Register-transfer synthesis<br />
• Converts FSMD to custom single-purpose processor<br />
– Datapath<br />
• Register units to store variables<br />
– Complex data types<br />
• Functional units<br />
– Arithmetic operations<br />
• Connection units<br />
– Buses, MUXs<br />
– FSM controller<br />
• Controls datapath<br />
– Key sub problems:<br />
• Allocation<br />
– Instantiate storage, functional, connection units<br />
• Binding<br />
– Mapping FSMD operations to specific units<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
60
Behavioral synthesis<br />
• High-level synthesis<br />
• Converts single sequential program to single-purpose processor<br />
– Does not require the program to schedule states<br />
• Key sub problems<br />
– Allocation<br />
– Binding<br />
– Scheduling<br />
• Assign sequential program’s operations to states<br />
• Conversion template given in Ch. 2<br />
• Optimizations important<br />
– Compiler<br />
• Constant propagation, dead-code elimination, loop unrolling<br />
– Advanced techniques for allocation, binding, scheduling<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
61
System synthesis<br />
• Convert 1 or more processes into 1 or more processors (system)<br />
– For complex embedded systems<br />
• Multiple processes may provide better performance/power<br />
• May be better described using concurrent sequential programs<br />
• Tasks<br />
– Transformation<br />
• Can merge 2 exclusive processes into 1 process<br />
• Can break 1 large process into separate processes<br />
• Procedure inlining<br />
• Loop unrolling<br />
– Allocation<br />
• Essentially design of system architecture<br />
– Select processors to implement processes<br />
– Also select memories and busses<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
62
System synthesis<br />
• Tasks (cont.)<br />
– Partitioning<br />
• Mapping 1 or more processes to 1 or more processors<br />
• Variables among memories<br />
• Communications among buses<br />
– Scheduling<br />
• Multiple processes on a single processor<br />
• Memory accesses<br />
• Bus communications<br />
– Tasks performed in variety of orders<br />
– Iteration among tasks common<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
63
System synthesis<br />
• Synthesis driven by constraints<br />
– E.g.,<br />
• Meet performance requirements at minimum cost<br />
– Allocate as much behavior as possible to general-purpose processor<br />
• Low-cost/flexible implementation<br />
– Minimum # of SPPs used to meet performance<br />
• System synthesis for GPP only (software)<br />
– Common for decades<br />
• Multiprocessing<br />
• Parallel processing<br />
• Real-time scheduling<br />
• <strong>Hardware</strong>/software codesign<br />
– Simultaneous consideration of GPPs/SPPs during synthesis<br />
– Made possible by maturation of behavioral synthesis in 1990’s<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
64
Temporal vs. spatial thinking<br />
• Design thought process changed by evolution of synthesis<br />
• Before synthesis<br />
– Designers worked primarily in structural domain<br />
• Connecting simpler components to build more complex systems<br />
– Connecting logic gates to build controller<br />
– Connecting registers, MUXs, ALUs to build datapath<br />
– “capture and simulate” era<br />
• Capture using CAD tools<br />
• Simulate to verify correctness before fabricating<br />
– Spatial thinking<br />
• Structural diagrams<br />
• Data sheets<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
65
Temporal vs. spatial thinking<br />
• After synthesis<br />
– “describe-and-synthesize” era<br />
– Designers work primarily in behavioral domain<br />
– “describe and synthesize” era<br />
• Describe FSMDs or sequential programs<br />
• Synthesize into structure<br />
– Temporal thinking<br />
• States or sequential statements have relationship over time<br />
• Strong understanding of hardware structure still important<br />
– Behavioral description must synthesize to efficient structural<br />
implementation<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
66
Verification<br />
• Ensuring design is correct and complete<br />
– Correct<br />
• Implements specification accurately<br />
– Complete<br />
• Describes appropriate output to all relevant input<br />
• Formal verification<br />
– Hard<br />
– For small designs or verifying certain key properties only<br />
• Simulation<br />
– Most common verification method<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
67
Formal verification<br />
• Analyze design to prove or disprove certain properties<br />
• Correctness example<br />
– Prove ALU structural implementation equivalent to behavioral<br />
description<br />
• Derive Boolean equations for outputs<br />
• Create truth table for equations<br />
• Compare to truth table from original behavior<br />
• Completeness example<br />
– Formally prove elevator door can never open while elevator is moving<br />
• Derive conditions for door being open<br />
• Show conditions conflict with conditions for elevator moving<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
68
Simulation<br />
• Create computer model of design<br />
– Provide sample input<br />
– Check for acceptable output<br />
• Correctness example<br />
– ALU<br />
• Provide all possible input combinations<br />
• Check outputs for correct results<br />
• Completeness example<br />
– Elevator door closed when moving<br />
• Provide all possible input sequences<br />
• Check door always closed when elevator moving<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
69
Increases confidence<br />
• Simulating all possible input sequences impossible for most<br />
systems<br />
– E.g., 32-bit ALU<br />
• 2 32 * 2 32 = 2 64 possible input combinations<br />
• At 1 million combinations/sec<br />
• ½ million years to simulate<br />
• Sequential circuits even worse<br />
• Can only simulate tiny subset of possible inputs<br />
– Typical values<br />
– Known boundary conditions<br />
• E.g., 32-bit ALU<br />
– Both operands all 0’s<br />
– Both operands all 1’s<br />
• Increases confidence of correctness/completeness<br />
• Does not prove<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
70
Advantages over physical implementation<br />
• Controllability<br />
– Control time<br />
• Stop/start simulation at any time<br />
– Control data values<br />
• Inputs or internal values<br />
• Observability<br />
– Examine system/environment values at any time<br />
• Debugging<br />
– Can stop simulation at any point and:<br />
• Observe internal values<br />
• Modify system/environment values before restarting<br />
– Can step through small intervals (i.e., 500 nanoseconds)<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
71
Disadvantages<br />
• Simulation setup time<br />
– Often has complex external environments<br />
– Could spend more time modeling environment than system<br />
• Models likely incomplete<br />
– Some environment behavior undocumented if complex environment<br />
– May not model behavior correctly<br />
• Simulation speed much slower than actual execution<br />
– Sequentializing parallel design<br />
• IC: gates operate in parallel<br />
• Simulation: analyze inputs, generate outputs for each gate 1 at time<br />
– Several programs added between simulated system and real hardware<br />
• 1 simulated operation:<br />
– = 10 to 100 simulator operations<br />
– = 100 to 10,000 operating system operations<br />
– = 1,000 to 100,000 hardware operations<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
72
Simulation speed<br />
• Relative speeds of different types of<br />
simulation/emulation<br />
– 1 hour actual execution of SOC<br />
• = 1.2 years instruction-set simulation<br />
• = 10,000,000 hours gate-level simulation<br />
1 IC 1 hour<br />
×10 FPGA 1 day<br />
×100<br />
×1000<br />
hardware emulation<br />
throughput model<br />
4 days<br />
1.4 months<br />
×10000<br />
×100,000<br />
instruction-set simulation<br />
cycle-accurate simulation<br />
1.2 years<br />
12 years<br />
×1,000,000<br />
×10,000,000<br />
register-transfer-level HDL simulation<br />
gate-level HDL simulation<br />
>1 lifetime<br />
1<br />
millennium<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
73
Overcoming long simulation time<br />
• Reduce amount of real time simulated<br />
– 1 msec execution instead of 1 hour<br />
• 0.001sec * 10,000,000 = 10,000 sec = 3 hours<br />
– Reduced confidence<br />
• 1 msec of cruise controller operation tells us little<br />
• Faster simulator<br />
– Emulators<br />
• Special hardware for simulations<br />
– Less precise/accurate simulators<br />
• Exchange speed for observability/controllability<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
74
Reducing precision/accuracy<br />
• Don’t need gate-level analysis for all simulations<br />
– E.g., cruise control<br />
• Don’t care what happens at every input/output of each logic gate<br />
– Simulating RT components ~10x faster<br />
– Cycle-based simulation ~100x faster<br />
• Accurate at clock boundaries only<br />
• No information on signal changes between boundaries<br />
• Faster simulator often combined with reduction in real time<br />
– If willing to simulate for 10 hours<br />
• Use instruction-set simulator<br />
• Real execution time simulated<br />
– 10 hours * 1 / 10,000<br />
– = 0.001 hour<br />
– = 3.6 seconds<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
75
<strong>Hardware</strong>/software co-simulation<br />
• Variety of simulation approaches exist<br />
– From very detailed<br />
• E.g., gate-level model<br />
– To very abstract<br />
• E.g., instruction-level model<br />
• Simulation tools evolved separately for hardware/software<br />
– Recall separate design evolution<br />
– <strong>Software</strong> (GPP)<br />
• Typically with instruction-set simulator (ISS)<br />
– <strong>Hardware</strong> (SPP)<br />
• Typically with models in HDL environment<br />
• Integration of GPP/SPP on single IC creating need for merging<br />
simulation tools<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
76
Integrating GPP/SPP simulations<br />
• Simple/naïve way<br />
– HDL model of microprocessor<br />
• Runs system software<br />
• Much slower than ISS<br />
• Less observable/controllable than ISS<br />
– HDL models of SPPs<br />
– Integrate all models<br />
• <strong>Hardware</strong>-software co-simulator<br />
– ISS for microprocessor<br />
– HDL model for SPPs<br />
– Create communication between simulators<br />
– Simulators run separately except when transferring data<br />
– Faster<br />
– Though, frequent communication between ISS and HDL model slows it down<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
77
Minimizing communication<br />
• Memory shared between GPP and SPPs<br />
– Where should memory go?<br />
– In ISS<br />
• HDL simulator must stall for memory access<br />
– In HDL?<br />
• ISS must stall when fetching each instruction<br />
• Model memory in both ISS and HDL<br />
– Most accesses by each model unrelated to other’s accesses<br />
• No need to communicate these between models<br />
– Co-simulator ensures consistency of shared data<br />
– Huge speedups (100x or more) reported with this technique<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
78
Emulators<br />
• General physical device system mapped to<br />
– Microprocessor emulator<br />
• Microprocessor IC with some monitoring, control circuitry<br />
– SPP emulator<br />
• FPGAs (10s to 100s)<br />
– Usually supports debugging tasks<br />
• Created to help solve simulation disadvantages<br />
– Mapped relatively quickly<br />
• Hours, days<br />
– Can be placed in real environment<br />
• No environment setup time<br />
• No incomplete environment<br />
– Typically faster than simulation<br />
• <strong>Hardware</strong> implementation<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
79
Disadvantages<br />
• Still not as fast as real implementations<br />
– E.g., emulated cruise-control may not respond fast enough to<br />
keep control of car<br />
• Mapping still time consuming<br />
– E.g., mapping complex SOC to 10 FPGAs<br />
• Just partitioning into 10 parts could take weeks<br />
• Can be very expensive<br />
– Top-of-the-line FPGA-based emulator: $100,000 to $1mill<br />
– Leads to resource bottleneck<br />
• Can maybe only afford 1 emulator<br />
• Groups wait days, weeks for other group to finish using<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
80
Reuse: intellectual property cores<br />
• Commercial off-the-shelf (COTS) components<br />
– Predesigned, prepackaged ICs<br />
– Implements GPP or SPP<br />
– Reduces design/debug time<br />
– Have always been available<br />
• System-on-a-chip (SOC)<br />
– All components of system implemented on single chip<br />
– Made possible by increasing IC capacities<br />
– Changing the way COTS components sold<br />
• As intellectual property (IP) rather than actual IC<br />
– Behavioral, structural, or physical descriptions<br />
– Processor-level components known as cores<br />
• SOC built by integrating multiple descriptions<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
81
Cores<br />
• Soft core<br />
– Synthesizable behavioral<br />
description<br />
– Typically written in HDL<br />
(VHDL/Verilog)<br />
• Firm core<br />
– Structural description<br />
– Typically provided in HDL<br />
• Hard core<br />
– Physical description<br />
– Provided in variety of physical<br />
layout file formats<br />
Processors, memories<br />
Structural<br />
Registers, FUs, MUXs<br />
Gates, flip-flops<br />
Transistors<br />
Gajski’s Y-chart<br />
Physical<br />
Behavior<br />
Sequential programs<br />
Register transfers<br />
Logic equations/FSM<br />
Transfer functions<br />
Cell Layout<br />
Modules<br />
Chips<br />
Boards<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
82
Advantages/disadvantages of hard core<br />
• Ease of use<br />
– Developer already designed and tested core<br />
• Can use right away<br />
• Can expect to work correctly<br />
• Predictability<br />
– Size, power, performance predicted accurately<br />
• Not easily mapped (retargeted) to different process<br />
– E.g., core available for vendor X’s 0.25 micrometer CMOS<br />
process<br />
• Can’t use with vendor X’s 0.18 micrometer process<br />
• Can’t use with vendor Y<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
83
Advantages/disadvantages of soft/firm cores<br />
• Soft cores<br />
– Can be synthesized to nearly any technology<br />
– Can optimize for particular use<br />
• E.g., delete unused portion of core<br />
– Lower power, smaller designs<br />
– Requires more design effort<br />
– May not work in technology not tested for<br />
– Not as optimized as hard core for same processor<br />
• Firm cores<br />
– Compromise between hard and soft cores<br />
• Some retargetability<br />
• Limited optimization<br />
• Better predictability/ease of use<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
84
New challenges to processor providers<br />
• Cores have dramatically changed business model<br />
– Pricing models<br />
• Past<br />
– Vendors sold product as IC to designers<br />
– Designers must buy any additional copies<br />
• Could not (economically) copy from original<br />
• Today<br />
– Vendors can sell as IP<br />
– Designers can make as many copies as needed<br />
• Vendor can use different pricing models<br />
– Royalty-based model<br />
• Similar to old IC model<br />
• Designer pays for each additional model<br />
– Fixed price model<br />
• One price for IP and as many copies as needed<br />
– Many other models used<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
85
IP protection<br />
• Past<br />
– Illegally copying IC very difficult<br />
• Reverse engineering required tremendous, deliberate effort<br />
• “Accidental” copying not possible<br />
• Today<br />
– Cores sold in electronic format<br />
• Deliberate/accidental unauthorized copying easier<br />
• Safeguards greatly increased<br />
• Contracts to ensure no copying/distributing<br />
• Encryption techniques<br />
– limit actual exposure to IP<br />
• Watermarking<br />
– determines if particular instance of processor was copied<br />
– whether copy authorized<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
86
New challenges to processor users<br />
• Licensing arrangements<br />
– Not as easy as purchasing IC<br />
– More contracts enforcing pricing model and IP protection<br />
• Possibly requiring legal assistance<br />
• Extra design effort<br />
– Especially for soft cores<br />
• Must still be synthesized and tested<br />
• Minor differences in synthesis tools can cause problems<br />
• Verification requirements more difficult<br />
– Extensive testing for synthesized soft cores and soft/firm cores mapped to particular<br />
technology<br />
• Ensure correct synthesis<br />
• Timing and power vary between implementations<br />
– Early verification critical<br />
• Cores buried within IC<br />
• Cannot simply replace bad core<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
87
Design process model<br />
• Describes order that design steps are processed<br />
– Behavior description step<br />
– Behavior to structure conversion step<br />
– Mapping structure to physical implementation<br />
step<br />
• Waterfall model<br />
– Proceed to next step only after current step<br />
completed<br />
• Spiral model<br />
– Proceed through 3 steps in order but with less<br />
detail<br />
– Repeat 3 steps gradually increasing detail<br />
– Keep repeating until desired system obtained<br />
– Becoming extremely popular (hardware &<br />
software development)<br />
Waterfall design model<br />
Behavioral<br />
Structural<br />
Physical<br />
Spiral design model<br />
Structural<br />
Behavioral<br />
Physical<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
88
Waterfall method<br />
• Not very realistic<br />
– Bugs often found in later steps that must be fixed in<br />
earlier step<br />
• E.g., forgot to handle certain input condition<br />
– Prototype often needed to know complete desired<br />
behavior<br />
• E.g, customer adds features after product demo<br />
– System specifications commonly change<br />
• E.g., to remain competitive by reducing power, size<br />
– Certain features dropped<br />
• Unexpected iterations back through 3 steps<br />
cause missed deadlines<br />
– Lost revenues<br />
– May never make it to market<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
Waterfall design model<br />
Behavioral<br />
Structural<br />
Physical<br />
89
Spiral method<br />
• First iteration of 3 steps incomplete<br />
• Much faster, though<br />
– End up with prototype<br />
• Use to test basic functions<br />
• Get idea of functions to add/remove<br />
– Original iteration experience helps in following<br />
iterations of 3 steps<br />
• Must come up with ways to obtain structure and<br />
physical implementations quickly<br />
– E.g., FPGAs for prototype<br />
• silicon for final product<br />
– May have to use more tools<br />
• Extra effort/cost<br />
• Could require more time than waterfall method<br />
– If correct implementation first time with waterfall<br />
Spiral design model<br />
Structural<br />
Behavioral<br />
Physical<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
90
General-purpose processor design models<br />
• Previous slides focused on SPPs<br />
• Can apply equally to GPPs<br />
– Waterfall model<br />
• Structure developed by particular company<br />
• Acquired by embedded system designer<br />
• Designer develops software (behavior)<br />
• Designer maps application to architecture<br />
– Compilation<br />
– Manual design<br />
– Spiral-like model<br />
• Beginning to be applied by embedded system designers<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
91
Spiral-like model<br />
• Designer develops or acquires architecture<br />
• Develops application(s)<br />
• Maps application to architecture<br />
• Analyzes design metrics<br />
• Now makes choice<br />
– Modify mapping<br />
– Modify application(s) to better suit architecture<br />
– Modify architecture to better suit application(s)<br />
• Not as difficult now<br />
– Maturation of synthesis/compilers<br />
– IPs can be tuned<br />
• Continue refining to lower abstraction level until<br />
particular implementation chosen<br />
Y-chart<br />
Architecture Application(s)<br />
Mapping<br />
Analysis<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
92
Summary<br />
• Design technology seeks to reduce gap between IC<br />
capacity growth and designer productivity growth<br />
• Synthesis has changed digital design<br />
• Increased IC capacity means sw/hw components<br />
coexist on one chip<br />
• Design paradigm shift to core-based design<br />
• Simulation essential but hard<br />
• Spiral design process is popular<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
93
Book Summary<br />
• Embedded systems are common and growing<br />
– Such systems are very different from in the past due to increased IC capacities and<br />
automation tools<br />
– Indicator: National Science Foundation just created a separate program on Embedded<br />
Systems (2002).<br />
• New view:<br />
– Embedded computing systems are built from a collection of processors, some generalpurpose<br />
(sw), some single-purpose (hw)<br />
– Hw/sw differ in design metrics, not in some fundamental way<br />
– Memory and interfaces necessary to complete system<br />
– Days of embedded system design as assembly-level programming of one microprocessor<br />
are fading away<br />
• Need to focus on higher-level issues<br />
– State machines, concurrent processes, control systems<br />
– IC technologies, design technologies<br />
• There’s a growing, challenging and exciting world of embedded systems design out<br />
there. There’s also much more to learn. Enjoy!<br />
Embedded Systems Design: A Unified<br />
<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />
94