25.09.2015 Views

Hardware/Software Introduction Chapter 1 Introduction

Embedded Systems Design: A Unified Hardware/Software ... - EET

Embedded Systems Design: A Unified Hardware/Software ... - EET

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong><br />

<strong>Chapter</strong> 1: <strong>Introduction</strong><br />

1


Outline<br />

• Embedded systems overview<br />

– What are they?<br />

• Design challenge – optimizing design metrics<br />

• Technologies<br />

– Processor technologies<br />

– IC technologies<br />

– Design technologies<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

2


Embedded systems overview<br />

• Embedded computing systems<br />

– Computing systems embedded within<br />

electronic devices<br />

– Hard to define. Nearly any computing<br />

system other than a desktop computer<br />

– Billions of units produced yearly, versus<br />

millions of desktop units<br />

– Perhaps 50 per household and per<br />

automobile<br />

Computers are in here...<br />

and here...<br />

and even here...<br />

Lots more of these,<br />

though they cost a lot<br />

less each.<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

3


A “short list” of embedded systems<br />

Anti-lock brakes<br />

Auto-focus cameras<br />

Automatic teller machines<br />

Automatic toll systems<br />

Automatic transmission<br />

Avionic systems<br />

Battery chargers<br />

Camcorders<br />

Cell phones<br />

Cell-phone base stations<br />

Cordless phones<br />

Cruise control<br />

Curbside check-in systems<br />

Digital cameras<br />

Disk drives<br />

Electronic card readers<br />

Electronic instruments<br />

Electronic toys/games<br />

Factory control<br />

Fax machines<br />

Fingerprint identifiers<br />

Home security systems<br />

Life-support systems<br />

Medical testing systems<br />

Modems<br />

MPEG decoders<br />

Network cards<br />

Network switches/routers<br />

On-board navigation<br />

Pagers<br />

Photocopiers<br />

Point-of-sale systems<br />

Portable video games<br />

Printers<br />

Satellite phones<br />

Scanners<br />

Smart ovens/dishwashers<br />

Speech recognizers<br />

Stereo systems<br />

Teleconferencing systems<br />

Televisions<br />

Temperature controllers<br />

Theft tracking systems<br />

TV set-top boxes<br />

VCR’s, DVD players<br />

Video game consoles<br />

Video phones<br />

Washers and dryers<br />

And the list goes on and on<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

4


Some common characteristics of embedded<br />

systems<br />

• Single-functioned<br />

– Executes a single program, repeatedly<br />

• Tightly-constrained<br />

– Low cost, low power, small, fast, etc.<br />

• Reactive and real-time<br />

– Continually reacts to changes in the system’s environment<br />

– Must compute certain results in real-time without delay<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

5


Embedded system example -- digital camera<br />

CCD<br />

Digital camera chip<br />

A2D<br />

CCD preprocessor<br />

Pixel coprocessor<br />

D2A<br />

lens<br />

JPEG codec<br />

Microcontroller<br />

Multiplier/Accum<br />

DMA controller<br />

Display ctrl<br />

Memory controller ISA bus interface UART LCD ctrl<br />

• Single-functioned -- always a digital camera<br />

• Tightly-constrained -- Low cost, low power, small, fast<br />

• Reactive and real-time -- only to a small extent<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

6


Design challenge – optimizing design metrics<br />

• Obvious design goal:<br />

– Construct an implementation with desired functionality<br />

• Key design challenge:<br />

– Simultaneously optimize numerous design metrics<br />

• Design metric<br />

– A measurable feature of a system’s implementation<br />

– Optimizing design metrics is a key challenge<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

7


Design challenge – optimizing design metrics<br />

• Common metrics<br />

– Unit cost: the monetary cost of manufacturing each copy of the system,<br />

excluding NRE cost<br />

– NRE cost (Non-Recurring Engineering cost): The one-time<br />

monetary cost of designing the system<br />

–Size: the physical space required by the system<br />

– Performance: the execution time or throughput of the system<br />

– Power: the amount of power consumed by the system<br />

– Flexibility: the ability to change the functionality of the system without<br />

incurring heavy NRE cost<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

8


Design challenge – optimizing design metrics<br />

• Common metrics (continued)<br />

– Time-to-prototype: the time needed to build a working version of the<br />

system<br />

– Time-to-market: the time required to develop a system to the point that it<br />

can be released and sold to customers<br />

– Maintainability: the ability to modify the system after its initial release<br />

– Correctness, safety, many more<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

9


Design metric competition -- improving one<br />

may worsen others<br />

lens<br />

CCD<br />

Performance<br />

Digital camera chip<br />

A2D<br />

JPEG codec<br />

DMA controller<br />

CCD preprocessor<br />

Power<br />

NRE cost<br />

Microcontroller<br />

Pixel coprocessor<br />

Size<br />

D2A<br />

Multiplier/Accum<br />

Display ctrl<br />

• Expertise with both software<br />

and hardware is needed to<br />

optimize design metrics<br />

– Not just a hardware or<br />

software expert, as is common<br />

– A designer must be<br />

comfortable with various<br />

technologies in order to choose<br />

the best for a given application<br />

and constraints<br />

<strong>Hardware</strong><br />

Memory controller ISA bus interface UART LCD ctrl<br />

<strong>Software</strong><br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

10


Time-to-market: a demanding design metric<br />

Revenues ($)<br />

Time (months)<br />

• Time required to develop a<br />

product to the point it can be<br />

sold to customers<br />

• Market window<br />

– Period during which the<br />

product would have highest<br />

sales<br />

• Average time-to-market<br />

constraint is about 8 months<br />

• Delays can be costly<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

11


Losses due to delayed market entry<br />

Peak revenue<br />

• Simplified revenue model<br />

– Product life = 2W, peak at W<br />

Revenues ($)<br />

Market rise<br />

On-time<br />

Peak revenue from<br />

delayed entry<br />

Market fall<br />

– Time of market entry defines a<br />

triangle, representing market<br />

penetration<br />

Delayed<br />

– Triangle area equals revenue<br />

• Loss<br />

D<br />

On-time Delayed<br />

entry entry<br />

W<br />

Time<br />

2W<br />

– The difference between the ontime<br />

and delayed triangle areas<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

12


Losses due to delayed market entry (cont.)<br />

• Area = 1/2 * base * height<br />

Peak revenue<br />

– On-time = 1/2 * 2W * W<br />

Revenues ($)<br />

Market rise<br />

On-time<br />

Peak revenue from<br />

delayed entry<br />

Market fall<br />

– Delayed = 1/2 * (W-D+W)*(W-D)<br />

• Percentage revenue loss =<br />

(D(3W-D)/2W 2 )*100%<br />

Delayed<br />

• Try some examples<br />

D<br />

On-time Delayed<br />

entry entry<br />

W<br />

Time<br />

2W<br />

– Lifetime 2W=52 wks, delay D=4 wks<br />

– (4*(3*26 –4)/2*26^2) = 22%<br />

– Lifetime 2W=52 wks, delay D=10 wks<br />

– (10*(3*26 –10)/2*26^2) = 50%<br />

– Delays are costly!<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

13


NRE and unit cost metrics<br />

• Compare technologies by costs -- best depends on quantity<br />

– Technology A: NRE=$2,000, unit=$100<br />

– Technology B: NRE=$30,000, unit=$30<br />

– Technology C: NRE=$100,000, unit=$2<br />

total cost (x1000)<br />

$200,000<br />

A<br />

B<br />

$160,000<br />

C<br />

$120,000<br />

$80,000<br />

$40,000<br />

per product cost<br />

$200<br />

A<br />

B<br />

$160<br />

C<br />

$120<br />

$80<br />

$40<br />

$0<br />

$0<br />

0 800 1600 2400<br />

0 800 1600 2400<br />

Number of units (volume)<br />

• But, must also consider time-to-market<br />

Number of units (volume)<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

14


The performance design metric<br />

• Widely-used measure of system, widely-abused<br />

– Clock frequency, instructions per second – not good measures<br />

– Digital camera example – a user cares about how fast it processes images, not<br />

clock speed or instructions per second<br />

• Latency (response time)<br />

– Time between task start and end<br />

– e.g., Camera’s A and B process images in 0.25 seconds<br />

• Throughput<br />

– Tasks per second, e.g. Camera A processes 4 images per second<br />

– Throughput can be more than latency seems to imply due to concurrency, e.g.<br />

Camera B may process 8 images per second (by capturing a new image while<br />

previous image is being stored).<br />

• Speedup of B over S = B’s performance / A’s performance<br />

– Throughput speedup = 8/4 = 2<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

15


Three key embedded system technologies<br />

• Technology<br />

– A manner of accomplishing a task, especially using technical<br />

processes, methods, or knowledge<br />

• Three key technologies for embedded systems<br />

– Processor technology<br />

– IC technology<br />

– Design technology<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

16


Processor technology<br />

• The architecture of the computation engine used to implement a<br />

system’s desired functionality<br />

• Processor does not have to be programmable<br />

– “Processor” not equal to general-purpose processor<br />

Controller<br />

Datapath<br />

Controller<br />

Datapath<br />

Controller<br />

Datapath<br />

Control<br />

logic and<br />

State register<br />

IR<br />

PC<br />

Register<br />

file<br />

General<br />

ALU<br />

Control logic<br />

and State<br />

register<br />

IR<br />

PC<br />

Registers<br />

Custom<br />

ALU<br />

Control<br />

logic<br />

State<br />

register<br />

index<br />

total<br />

+<br />

Data<br />

memory<br />

Data<br />

memory<br />

Program<br />

memory<br />

Assembly code<br />

for:<br />

Data<br />

memory<br />

Program memory<br />

Assembly code<br />

for:<br />

total = 0<br />

for i =1 to …<br />

General-purpose (“software”)<br />

total = 0<br />

for i =1 to …<br />

Application-specific<br />

Single-purpose (“hardware”)<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

17


Processor technology<br />

• Processors vary in their customization for the problem at hand<br />

Desired<br />

functionality<br />

total = 0<br />

for i = 1 to N loop<br />

total += M[i]<br />

end loop<br />

General-purpose<br />

processor<br />

Application-specific<br />

processor<br />

Single-purpose<br />

processor<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

18


General-purpose processors<br />

• Programmable device used in a variety of<br />

applications<br />

– Also known as “microprocessor”<br />

• Features<br />

– Program memory<br />

– General datapath with large register file and<br />

general ALU<br />

• User benefits<br />

– Low time-to-market and NRE costs<br />

– High flexibility<br />

• “Pentium” the most well-known, but<br />

there are hundreds of others<br />

Controller<br />

Control<br />

logic and<br />

State<br />

register<br />

IR<br />

PC<br />

Program<br />

memory<br />

Assembly code<br />

for:<br />

total = 0<br />

for i =1 to …<br />

Datapath<br />

Register<br />

file<br />

General<br />

ALU<br />

Data<br />

memory<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

19


Single-purpose processors<br />

• Digital circuit designed to execute exactly<br />

one program<br />

– a.k.a. coprocessor, accelerator or peripheral<br />

• Features<br />

– Contains only the components needed to<br />

execute a single program<br />

– No program memory<br />

• Benefits<br />

– Fast<br />

– Low power<br />

– Small size<br />

Controller<br />

Control<br />

logic<br />

State<br />

register<br />

Datapath<br />

index<br />

total<br />

+<br />

Data<br />

memory<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

20


Application-specific processors<br />

• Programmable processor optimized for a<br />

particular class of applications having<br />

common characteristics<br />

– Compromise between general-purpose and<br />

single-purpose processors<br />

• Features<br />

– Program memory<br />

– Optimized datapath<br />

– Special functional units<br />

• Benefits<br />

– Some flexibility, good performance, size and<br />

power<br />

Controller<br />

Control<br />

logic and<br />

State<br />

register<br />

IR<br />

PC<br />

Program<br />

memory<br />

Assembly code<br />

for:<br />

total = 0<br />

for i =1 to …<br />

Datapath<br />

Registers<br />

Custom<br />

ALU<br />

Data<br />

memory<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

21


IC technology<br />

• Three types of IC technologies<br />

– Full-custom/VLSI<br />

– Semi-custom ASIC (gate array and standard cell)<br />

– PLD (Programmable Logic Device)<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

22


Full-custom/VLSI<br />

• All layers are optimized for an embedded system’s<br />

particular digital implementation<br />

– Placing transistors<br />

– Sizing transistors<br />

– Routing wires<br />

• Benefits<br />

– Excellent performance, small size, low power<br />

• Drawbacks<br />

– High NRE cost (e.g., $300k), long time-to-market<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

23


Semi-custom<br />

• Lower layers are fully or partially built<br />

– Designers are left with routing of wires and maybe placing<br />

some blocks<br />

• Benefits<br />

– Good performance, good size, less NRE cost than a fullcustom<br />

implementation (perhaps $10k to $100k)<br />

• Drawbacks<br />

– Still require weeks to months to develop<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

24


PLD (Programmable Logic Device)<br />

• All layers already exist<br />

– Designers can purchase an IC<br />

– Connections on the IC are either created or destroyed to<br />

implement desired functionality<br />

– Field-Programmable Gate Array (FPGA) very popular<br />

• Benefits<br />

– Low NRE costs, almost instant IC availability<br />

• Drawbacks<br />

– Bigger, expensive (perhaps $30 per unit), power hungry,<br />

slower<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

25


Moore’s law<br />

• The most important trend in embedded systems<br />

– Predicted in 1965 by Intel co-founder Gordon Moore<br />

IC transistor capacity has doubled roughly every 18 months<br />

for the past several decades<br />

10,000<br />

1,000<br />

Logic transistors<br />

per chip<br />

(in millions)<br />

Note:<br />

logarithmic scale<br />

100<br />

10<br />

1<br />

0.1<br />

0.01<br />

0.001<br />

1981<br />

1983<br />

1985<br />

1987<br />

1989<br />

1991<br />

1993<br />

1995<br />

1997<br />

1999<br />

2001<br />

2003<br />

2005<br />

2007<br />

2009<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

26


Graphical illustration of Moore’s law<br />

1981 1984 1987 1990 1993 1996 1999 2002<br />

10,000<br />

transistors<br />

150,000,000<br />

transistors<br />

Leading edge<br />

chip in 1981<br />

Leading edge<br />

chip in 2002<br />

• Something that doubles frequently grows more quickly<br />

than most people realize!<br />

– A 2002 chip can hold about 15,000 1981 chips inside itself<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

27


Design Technology<br />

• The manner in which we convert our concept of<br />

desired system functionality into an implementation<br />

Compilation/<br />

Synthesis<br />

Libraries/<br />

IP<br />

Test/<br />

Verification<br />

Compilation/Synthesis:<br />

Automates exploration and<br />

insertion of implementation<br />

details for lower level.<br />

System<br />

specification<br />

System<br />

synthesis<br />

Hw/Sw/<br />

OS<br />

Model simulat./<br />

checkers<br />

Libraries/IP: Incorporates predesigned<br />

implementation from<br />

lower abstraction level into<br />

higher level.<br />

Behavioral<br />

specification<br />

RT<br />

specification<br />

Behavior<br />

synthesis<br />

RT<br />

synthesis<br />

Cores<br />

RT<br />

components<br />

Hw-Sw<br />

cosimulators<br />

HDL simulators<br />

Test/Verification: Ensures correct<br />

functionality at each level, thus<br />

reducing costly iterations<br />

between levels.<br />

Logic<br />

specification<br />

Logic<br />

synthesis<br />

Gates/<br />

Cells<br />

Gate<br />

simulators<br />

To final implementation<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

28


The co-design ladder<br />

• In the past:<br />

– <strong>Hardware</strong> and software<br />

design technologies were<br />

very different<br />

– Recent maturation of<br />

synthesis enables a unified<br />

view of hardware and<br />

software<br />

• <strong>Hardware</strong>/software<br />

“codesign”<br />

Compilers<br />

(1960's,1970's)<br />

Assembly instructions<br />

Assemblers, linkers<br />

(1950's, 1960's)<br />

Machine instructions<br />

Microprocessor plus<br />

program bits: “software”<br />

Sequential program code (e.g., C, VHDL)<br />

Implementation<br />

Behavioral synthesis<br />

(1990's)<br />

Register transfers<br />

RT synthesis<br />

(1980's, 1990's)<br />

Logic equations / FSM's<br />

Logic synthesis<br />

(1970's, 1980's)<br />

Logic gates<br />

VLSI, ASIC, or PLD<br />

implementation: “hardware”<br />

The choice of hardware versus software for a particular function is simply a tradeoff among various<br />

design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no<br />

fundamental difference between what hardware or software can implement.<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

29


Independence of processor and IC<br />

technologies<br />

• Basic tradeoff<br />

– General vs. custom<br />

– With respect to processor technology or IC technology<br />

– The two technologies are independent<br />

General,<br />

providing improved:<br />

Generalpurpose<br />

processor<br />

ASIP<br />

Singlepurpose<br />

processor<br />

Customized,<br />

providing improved:<br />

Flexibility<br />

Maintainability<br />

NRE cost<br />

Time- to-prototype<br />

Time-to-market<br />

Cost (low volume)<br />

Power efficiency<br />

Performance<br />

Size<br />

Cost (high volume)<br />

PLD<br />

Semi-custom<br />

Full-custom<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

30


Design productivity gap<br />

• 1981 leading edge chip required 100 designer months<br />

– 10,000 transistors / 100 transistors/month<br />

• 2002 leading edge chip requires 30,000 designer months<br />

– 150,000,000 / 5000 transistors/month<br />

• Designer cost increase from $1M to $300M<br />

10,000<br />

100,000<br />

1,000<br />

10,000<br />

Logic transistors<br />

per chip<br />

(in millions)<br />

100<br />

10<br />

1<br />

0.1<br />

0.01<br />

IC capacity<br />

productivity<br />

Gap<br />

1000<br />

100<br />

10<br />

1<br />

0.1<br />

Productivity<br />

(K) Trans./Staff-Mo.<br />

0.001<br />

0.01<br />

1981<br />

1983<br />

1985<br />

1987<br />

1989<br />

1991<br />

1993<br />

1995<br />

1997<br />

1999<br />

2001<br />

2003<br />

2005<br />

2007<br />

2009<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

31


The mythical man-month<br />

• The situation is even worse than the productivity gap indicates<br />

• In theory, adding designers to team reduces project completion time<br />

• In reality, productivity per designer decreases due to complexities of team management<br />

and communication<br />

• In the software community, known as “the mythical man-month” (Brooks 1975)<br />

• At some point, can actually lengthen project completion time! (“Too many cooks”)<br />

• 1M transistors, 1<br />

designer=5000 trans/month<br />

• Each additional designer<br />

reduces for 100 trans/month<br />

• So 2 designers produce 4900<br />

trans/month each<br />

60000<br />

50000<br />

40000<br />

30000<br />

20000<br />

10000<br />

43<br />

Team<br />

16<br />

15<br />

16<br />

19<br />

18<br />

24<br />

23<br />

Months until completion<br />

Individual<br />

0<br />

10 20 30 40<br />

Number of designers<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

32


Summary<br />

• Embedded systems are everywhere<br />

• Key challenge: optimization of design metrics<br />

– Design metrics compete with one another<br />

• A unified view of hardware and software is necessary to<br />

improve productivity<br />

• Three key technologies<br />

– Processor: general-purpose, application-specific, single-purpose<br />

– IC: Full-custom, semi-custom, PLD<br />

– Design: Compilation/synthesis, libraries/IP, test/verification<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

33


Ch2. SPP: <strong>Hardware</strong> Summary<br />

• Ch2: Custom Single-Purpose Processors (<strong>Hardware</strong>)<br />

– Straightforward design techniques<br />

– Can be built to execute algorithms<br />

– Typically start with FSMD<br />

– CAD tools can be of great assistance<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

34


Ch3. GPP: <strong>Software</strong> Summary<br />

• General-purpose processors (GPPs)<br />

– Good performance, low NRE, flexible<br />

• Controller, datapath, and memory<br />

• Structured languages prevail<br />

– But some assembly level programming still necessary<br />

• Many tools available<br />

– Including instruction-set simulators, and in-circuit emulators<br />

• ASIPs (Application-Specific Instruction-Set Processors)<br />

– Microcontrollers, DSPs, network processors, more customized ASIPs<br />

• Choosing among processors is an important step<br />

• Designing a general-purpose processor is conceptually the same<br />

as designing a single-purpose processor<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

35


Ch4, 5, 6. Summary<br />

• Ch4. Standard Single Purpose Processors: Peripherals<br />

– Timer, Counter, UART, LCD Controller, Stepper Motor Controller,<br />

ADC, RT Clock, etc.<br />

• Ch5. Memory<br />

– ROM: Mask-Programmable MP-ROM, One-Time Programmable OTP-<br />

ROM, EPROM, EEPROM, etc.<br />

– RAM: SRAM, DRAM, Pseudo-Static PSRAM, Nonvolatile NVRAM,<br />

Fast-Page Mode FPM-DRAM, Extended Data Out EDO-DRAM,<br />

Rambus RDRAM<br />

– Memory Hierarchy and Cache, MMU<br />

• Ch6. Interfacing<br />

– Memory-mapped I/O, Standard I/O, Interrupts, DMA, Bus Arbritation,<br />

I 2 C, FireWire, USB, PCI Bus, ARM Bus, IrDA, Bluetooth, 802.11x.<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

36


Ch7. Digital Camera Example Summary<br />

• Digital camera example<br />

– Specifications in English and executable language<br />

– Design metrics: performance, power and area<br />

• Several implementations<br />

– Microcontroller: too slow<br />

– Microcontroller and coprocessor: better, but still too slow<br />

– Fixed-point arithmetic: almost fast enough<br />

– Additional coprocessor for compression: fast enough, but<br />

expensive and hard to design<br />

– Tradeoffs between hw/sw – the main lesson of this book!<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

37


Ch8. State machine and Concurrent Process<br />

Models Summary<br />

• Computation models are distinct from languages<br />

• Sequential program model is popular<br />

– Most common languages like C support it directly<br />

• State machine models good for control<br />

– Extensions like HCFSM provide additional power<br />

– PSM combines state machines and sequential programs<br />

• Concurrent process model for multi-task systems<br />

– Communication and synchronization methods exist<br />

– Scheduling is critical<br />

• Dataflow model good for signal processing<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

38


Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong><br />

(Ch9. Control Systems)<br />

(Ch10. IC Technology)<br />

<strong>Chapter</strong> 11: Design Technology<br />

39


Ch11. Design Technology<br />

• Automation: synthesis<br />

• Verification: hardware/software co-simulation<br />

• Reuse: intellectual property cores<br />

• Design process models<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

40


<strong>Introduction</strong><br />

• Design task<br />

– Define system functionality<br />

– Convert functionality to physical implementation while<br />

• Satisfying constrained metrics<br />

• Optimizing other design metrics<br />

• Designing embedded systems is hard<br />

– Complex functionality<br />

• Millions of possible environment scenarios<br />

• Competing, tightly constrained metrics<br />

– Productivity gap<br />

• As low as 10 lines of code or 100 transistors produced per day<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

41


Improving productivity<br />

• Design technologies developed to improve productivity<br />

• We focus on technologies advancing hardware/software unified<br />

view<br />

– Automation<br />

– Reuse<br />

• Program replaces manual design<br />

• Synthesis<br />

• Predesigned components<br />

• Cores<br />

• General-purpose and single-purpose processors on single IC<br />

– Verification<br />

Verification<br />

Automation<br />

• Ensuring correctness/completeness of each design step<br />

• <strong>Hardware</strong>/software co-simulation<br />

Specification<br />

Implementation<br />

Reuse<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

42


Automation: synthesis<br />

• Early design mostly hardware<br />

• <strong>Software</strong> complexity increased with advent<br />

of general-purpose processor<br />

• Different techniques for software design<br />

and hardware design<br />

– Caused division of the two fields<br />

• Design tools evolve for higher levels of<br />

abstraction<br />

– Different rate in each field<br />

• <strong>Hardware</strong>/software design fields rejoining<br />

– Both can start from behavioral description in<br />

sequential program model<br />

– 30 years longer for hardware design to reach<br />

this step in the ladder<br />

• Many more design dimensions<br />

• Optimization critical<br />

Compilers<br />

(1960s,1970s)<br />

Assembly instructions<br />

Assemblers, linkers<br />

(1950s, 1960s)<br />

Machine instructions<br />

Microprocessor plus<br />

program bits<br />

The codesign ladder<br />

Sequential program code (e.g., C, VHDL)<br />

Implementation<br />

Behavioral synthesis<br />

(1990s)<br />

Register transfers<br />

RT synthesis<br />

(1980s, 1990s)<br />

Logic equations / FSM's<br />

Logic synthesis<br />

(1970s, 1980s)<br />

Logic gates<br />

VLSI, ASIC, or PLD<br />

implementation<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

43


<strong>Hardware</strong>/software parallel evolution<br />

• <strong>Software</strong> design evolution<br />

– Machine instructions<br />

– Assemblers<br />

• convert assembly programs into machine<br />

instructions<br />

– Compilers<br />

• translate sequential programs into assembly<br />

• <strong>Hardware</strong> design evolution<br />

– Interconnected logic gates<br />

– Logic synthesis<br />

• converts logic equations or FSMs into gates<br />

– Register-transfer (RT) synthesis<br />

• converts FSMDs into FSMs, logic equations,<br />

predesigned RT components (registers,<br />

adders, etc.)<br />

– Behavioral synthesis<br />

• converts sequential programs into FSMDs<br />

Compilers<br />

(1960s,1970s)<br />

Assembly instructions<br />

Assemblers, linkers<br />

(1950s, 1960s)<br />

Machine instructions<br />

Microprocessor plus<br />

program bits<br />

The codesign ladder<br />

Sequential program code (e.g., C, VHDL)<br />

Implementation<br />

Behavioral synthesis<br />

(1990s)<br />

Register transfers<br />

RT synthesis<br />

(1980s, 1990s)<br />

Logic equations / FSM's<br />

Logic synthesis<br />

(1970s, 1980s)<br />

Logic gates<br />

VLSI, ASIC, or PLD<br />

implementation<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

44


Increasing abstraction level<br />

• Higher abstraction level focus of hardware/software design evolution<br />

– Description smaller/easier to capture<br />

• E.g., Line of sequential program code can translate to 1000 gates<br />

– Many more possible implementations available<br />

• (a) Like flashlight, the higher above the ground, the more ground illuminated<br />

– Sequential program designs may differ in performance/transistor count by orders of magnitude<br />

– Logic-level designs may differ by only power of 2<br />

• (b) Design process proceeds to lower abstraction level, narrowing in on single<br />

implementation<br />

modeling cost increases<br />

opportunities decrease<br />

idea<br />

implementation<br />

(a)<br />

back-of-the-envelope<br />

sequential program<br />

register-transfers<br />

logic<br />

idea<br />

implementation<br />

(b)<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

45


Synthesis<br />

• Automatically converting system’s behavioral description to a structural<br />

implementation<br />

– Complex whole formed by parts<br />

– Structural implementation must optimize design metrics<br />

• More expensive, complex than compilers<br />

– Cost = $100s to $10,000s<br />

– User controls 100s of synthesis options<br />

– Optimization critical<br />

• Otherwise could use software<br />

– Optimizations different for each user<br />

– Run time = hours, days<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

46


Gajski’s Y-chart<br />

• Each axis represents type of description<br />

– Behavioral<br />

• Defines outputs as function of inputs<br />

• Algorithms but no implementation<br />

– Structural<br />

• Implements behavior by connecting<br />

components with known behavior<br />

– Physical<br />

• Gives size/locations of components and<br />

wires on chip/board<br />

• Synthesis converts behavior at given level<br />

to structure at same level or lower<br />

– E.g.,<br />

• FSM → gates, flip-flops (same level)<br />

• FSM → transistors (lower level)<br />

• FSM X registers, FUs (higher level)<br />

• FSM X processors, memories (higher level)<br />

Processors, memories<br />

Structural<br />

Registers, FUs, MUXs<br />

Gates, flip-flops<br />

Transistors<br />

Behavior<br />

Sequential programs<br />

Register transfers<br />

Logic equations/FSM<br />

Transfer functions<br />

Cell Layout<br />

Modules<br />

Chips<br />

Boards<br />

Physical<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

47


Logic synthesis<br />

• Logic-level behavior to structural implementation<br />

– Logic equations and/or FSM to connected gates<br />

• Combinational logic synthesis<br />

– Two-level minimization (Sum of products/product of sums)<br />

• Best possible performance<br />

– Longest path = 2 gates (AND gate + OR gate/OR gate + AND gate)<br />

• Minimize size<br />

– Minimum cover<br />

– Minimum cover that is prime<br />

– Heuristics<br />

– Multilevel minimization<br />

• Trade performance for size<br />

• Pareto-optimal solution<br />

– Heuristics<br />

• FSM synthesis<br />

– State minimization<br />

– State encoding<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

48


Two-level minimization<br />

• Represent logic function as sum of<br />

products (or product of sums)<br />

– AND gate for each product<br />

– OR gate for each sum<br />

• Gives best possible performance<br />

– At most 2 gate delay<br />

• Goal: minimize size<br />

– Minimum cover<br />

• Minimum # of AND gates (sum of products)<br />

– Minimum cover that is prime<br />

• Minimum # of inputs to each AND gate (sum<br />

of products)<br />

Sum of products<br />

F = abc'd' + a'b'cd + a'bcd + ab'cd<br />

Direct implementation<br />

a<br />

b<br />

c<br />

d<br />

4 4-input AND gates and<br />

1 4-input OR gate<br />

→ 40 transistors<br />

F<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

49


Minimum cover<br />

• Minimum # of AND gates (sum of products)<br />

• Literal: variable or its complement<br />

– a or a’, b or b’, etc.<br />

• Minterm: product of literals<br />

– Each literal appears exactly once<br />

• abc’d’, ab’cd, a’bcd, etc.<br />

• Implicant: product of literals<br />

– Each literal appears no more than once<br />

• abc’d’, a’cd, etc.<br />

– Covers 1 or more minterms<br />

• a’cd covers a’bcd and a’b’cd<br />

• Cover: set of implicants that covers all minterms of function<br />

• Minimum cover: cover with minimum # of implicants<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

50


Minimum cover: K-map approach<br />

• Karnaugh map (K-map)<br />

– 1 represents minterm<br />

– Circle represents implicant<br />

• Minimum cover<br />

– Covering all 1’s with min # of<br />

circles<br />

– Example: direct vs. min cover<br />

• Less gates<br />

– 4 vs. 5<br />

• Less transistors<br />

– 28 vs. 40<br />

K-map: sum of products<br />

ab cd<br />

00 01 11 10<br />

a<br />

b<br />

c<br />

d<br />

00<br />

01<br />

11<br />

10<br />

0 0 1 0<br />

0 0 1 0<br />

1 0 0 0<br />

0 0 1 0<br />

Minimum cover<br />

K-map: minimum cover<br />

ab cd<br />

00 01 11 10<br />

00<br />

01<br />

11<br />

10<br />

F=abc'd' + a'cd + ab'cd<br />

Minimum cover implementation<br />

F<br />

0 0 1 0<br />

0 0 1 0<br />

1 0 0 0<br />

0 0 1 0<br />

2 4-input AND gate<br />

1 3-input AND gates<br />

1 4 input OR gate<br />

→ 28 transistors<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

51


Minimum cover that is prime<br />

• Minimum # of inputs to AND gates<br />

• Prime implicant<br />

– Implicant not covered by any other<br />

implicant<br />

– Max-sized circle in K-map<br />

• Minimum cover that is prime<br />

– Covering with min # of prime implicants<br />

– Min # of max-sized circles<br />

– Example: prime cover vs. min cover<br />

• Same # of gates<br />

– 4 vs. 4<br />

• Less transistors<br />

– 26 vs. 28<br />

a<br />

b<br />

c<br />

d<br />

K-map: minimum cover that is prime<br />

ab cd<br />

00 01 11 10<br />

00<br />

01<br />

11<br />

10<br />

0 0 1 0<br />

0 0 1 0<br />

1 0 0 0<br />

0 0 1 0<br />

Minimum cover that is prime<br />

F=abc'd' + a'cd + b'cd<br />

Implementation<br />

F<br />

1 4-input AND gate<br />

2 3-input AND gates<br />

1 4 input OR gate<br />

→ 26 transistors<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

52


Minimum cover: heuristics<br />

• K-maps give optimal solution every time<br />

– Functions with > 6 inputs too complicated<br />

– Use computer-based tabular method<br />

• Heuristic<br />

• Finds all prime implicants<br />

• Finds min cover that is prime<br />

• Also optimal solution every time<br />

• Problem: 2 n minterms for n inputs<br />

– 32 inputs = 4 billion minterms<br />

– Exponential complexity<br />

– Solution technique where optimal solution not guaranteed<br />

– Hopefully comes close<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

53


Multilevel logic minimization<br />

• Trade performance for size<br />

– Increase delay for lower # of gates<br />

– Gray area represents all possible<br />

solutions<br />

– Circle with X represents ideal solution<br />

• Generally not possible<br />

– 2-level gives best performance<br />

• max delay = 2 gates<br />

• Solve for smallest size<br />

– Multilevel gives pareto-optimal solution<br />

• Minimum delay for a given size<br />

• Minimum size for a given delay<br />

delay<br />

multi-level minim.<br />

2-level minim.<br />

size<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

54


Example<br />

• Minimized 2-level logic function:<br />

– F = adef + bdef + cdef + gh<br />

– Requires 5 gates with 18 total gate inputs<br />

• 4 ANDS and 1 OR<br />

• After algebraic manipulation:<br />

– F = (a + b + c)def + gh<br />

– Requires only 4 gates with 11 total gate inputs<br />

• 2 ANDS and 2 ORs<br />

– Less inputs per gate<br />

– Assume gate inputs = 2 transistors<br />

• Reduced by 14 transistors<br />

– 36 (18 * 2) down to 22 (11 * 2)<br />

– Sacrifices performance for size<br />

• Inputs a, b, and c now have 3-gate delay<br />

• Iterative improvement heuristic commonly used<br />

a<br />

d<br />

b<br />

e<br />

c<br />

f<br />

g<br />

h<br />

2-level minimized<br />

multilevel minimized<br />

a<br />

b<br />

c<br />

d<br />

e<br />

f<br />

g<br />

F<br />

h<br />

F<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

55


FSM synthesis<br />

• FSM to gates<br />

• State minimization<br />

– Reduce # of states<br />

• Identify and merge equivalent states<br />

– Outputs, next states same for all possible inputs<br />

– Tabular method gives exact solution<br />

• Table of all possible state pairs<br />

• If n states, n 2 table entries<br />

• Thus, heuristics used with large # of states<br />

• State encoding<br />

– Unique bit sequence for each state<br />

– If n states, log 2 (n) bits<br />

– n! possible encodings<br />

– Thus, heuristics common<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

56


Technology mapping<br />

• Library of gates available for implementation<br />

– Simple<br />

• only 2-input AND,OR gates<br />

– Complex<br />

• various-input AND, OR, NAND, NOR, etc. gates<br />

• Efficiently implemented meta-gates (i.e., AND-OR-INVERT, MUX)<br />

• Final structure consists of specified library’s components only<br />

• If technology mapping integrated with logic synthesis<br />

– More efficient circuit<br />

– More complex problem<br />

– Heuristics required<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

57


Complexity impact on user<br />

• As complexity grows, heuristics used<br />

• Heuristics differ tremendously among synthesis tools<br />

– Computationally expensive<br />

• Higher quality results<br />

• Variable optimization effort settings<br />

• Long run times (hours, days)<br />

• Requires huge amounts of memory<br />

• Typically needs to run on servers, workstations<br />

– Fast heuristics<br />

• Lower quality results<br />

• Shorter run times (minutes, hours)<br />

• Smaller amount of memory required<br />

• Could run on PC<br />

• Super-linear-time (i.e. n 3 ) heuristics usually used<br />

– User can partition large systems to reduce run times/size<br />

– 100 3 > 50 3 + 50 3 (1,000,000 > 250,000)<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

58


Integrating logic design and physical design<br />

• Past<br />

– Gate delay much greater than wire delay<br />

– Thus, performance evaluated as # of levels<br />

of gates only<br />

• Today<br />

– Gate delay shrinking as feature size<br />

shrinking<br />

– Wire delay increasing<br />

• Performance evaluation needs wire length<br />

– Transistor placement (needed for wire<br />

length) domain of physical design<br />

– Thus, simultaneous logic synthesis and<br />

physical design required for efficient<br />

circuits<br />

Delay<br />

Reduced feature size<br />

Wire<br />

Transistor<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

59


Register-transfer synthesis<br />

• Converts FSMD to custom single-purpose processor<br />

– Datapath<br />

• Register units to store variables<br />

– Complex data types<br />

• Functional units<br />

– Arithmetic operations<br />

• Connection units<br />

– Buses, MUXs<br />

– FSM controller<br />

• Controls datapath<br />

– Key sub problems:<br />

• Allocation<br />

– Instantiate storage, functional, connection units<br />

• Binding<br />

– Mapping FSMD operations to specific units<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

60


Behavioral synthesis<br />

• High-level synthesis<br />

• Converts single sequential program to single-purpose processor<br />

– Does not require the program to schedule states<br />

• Key sub problems<br />

– Allocation<br />

– Binding<br />

– Scheduling<br />

• Assign sequential program’s operations to states<br />

• Conversion template given in Ch. 2<br />

• Optimizations important<br />

– Compiler<br />

• Constant propagation, dead-code elimination, loop unrolling<br />

– Advanced techniques for allocation, binding, scheduling<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

61


System synthesis<br />

• Convert 1 or more processes into 1 or more processors (system)<br />

– For complex embedded systems<br />

• Multiple processes may provide better performance/power<br />

• May be better described using concurrent sequential programs<br />

• Tasks<br />

– Transformation<br />

• Can merge 2 exclusive processes into 1 process<br />

• Can break 1 large process into separate processes<br />

• Procedure inlining<br />

• Loop unrolling<br />

– Allocation<br />

• Essentially design of system architecture<br />

– Select processors to implement processes<br />

– Also select memories and busses<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

62


System synthesis<br />

• Tasks (cont.)<br />

– Partitioning<br />

• Mapping 1 or more processes to 1 or more processors<br />

• Variables among memories<br />

• Communications among buses<br />

– Scheduling<br />

• Multiple processes on a single processor<br />

• Memory accesses<br />

• Bus communications<br />

– Tasks performed in variety of orders<br />

– Iteration among tasks common<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

63


System synthesis<br />

• Synthesis driven by constraints<br />

– E.g.,<br />

• Meet performance requirements at minimum cost<br />

– Allocate as much behavior as possible to general-purpose processor<br />

• Low-cost/flexible implementation<br />

– Minimum # of SPPs used to meet performance<br />

• System synthesis for GPP only (software)<br />

– Common for decades<br />

• Multiprocessing<br />

• Parallel processing<br />

• Real-time scheduling<br />

• <strong>Hardware</strong>/software codesign<br />

– Simultaneous consideration of GPPs/SPPs during synthesis<br />

– Made possible by maturation of behavioral synthesis in 1990’s<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

64


Temporal vs. spatial thinking<br />

• Design thought process changed by evolution of synthesis<br />

• Before synthesis<br />

– Designers worked primarily in structural domain<br />

• Connecting simpler components to build more complex systems<br />

– Connecting logic gates to build controller<br />

– Connecting registers, MUXs, ALUs to build datapath<br />

– “capture and simulate” era<br />

• Capture using CAD tools<br />

• Simulate to verify correctness before fabricating<br />

– Spatial thinking<br />

• Structural diagrams<br />

• Data sheets<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

65


Temporal vs. spatial thinking<br />

• After synthesis<br />

– “describe-and-synthesize” era<br />

– Designers work primarily in behavioral domain<br />

– “describe and synthesize” era<br />

• Describe FSMDs or sequential programs<br />

• Synthesize into structure<br />

– Temporal thinking<br />

• States or sequential statements have relationship over time<br />

• Strong understanding of hardware structure still important<br />

– Behavioral description must synthesize to efficient structural<br />

implementation<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

66


Verification<br />

• Ensuring design is correct and complete<br />

– Correct<br />

• Implements specification accurately<br />

– Complete<br />

• Describes appropriate output to all relevant input<br />

• Formal verification<br />

– Hard<br />

– For small designs or verifying certain key properties only<br />

• Simulation<br />

– Most common verification method<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

67


Formal verification<br />

• Analyze design to prove or disprove certain properties<br />

• Correctness example<br />

– Prove ALU structural implementation equivalent to behavioral<br />

description<br />

• Derive Boolean equations for outputs<br />

• Create truth table for equations<br />

• Compare to truth table from original behavior<br />

• Completeness example<br />

– Formally prove elevator door can never open while elevator is moving<br />

• Derive conditions for door being open<br />

• Show conditions conflict with conditions for elevator moving<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

68


Simulation<br />

• Create computer model of design<br />

– Provide sample input<br />

– Check for acceptable output<br />

• Correctness example<br />

– ALU<br />

• Provide all possible input combinations<br />

• Check outputs for correct results<br />

• Completeness example<br />

– Elevator door closed when moving<br />

• Provide all possible input sequences<br />

• Check door always closed when elevator moving<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

69


Increases confidence<br />

• Simulating all possible input sequences impossible for most<br />

systems<br />

– E.g., 32-bit ALU<br />

• 2 32 * 2 32 = 2 64 possible input combinations<br />

• At 1 million combinations/sec<br />

• ½ million years to simulate<br />

• Sequential circuits even worse<br />

• Can only simulate tiny subset of possible inputs<br />

– Typical values<br />

– Known boundary conditions<br />

• E.g., 32-bit ALU<br />

– Both operands all 0’s<br />

– Both operands all 1’s<br />

• Increases confidence of correctness/completeness<br />

• Does not prove<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

70


Advantages over physical implementation<br />

• Controllability<br />

– Control time<br />

• Stop/start simulation at any time<br />

– Control data values<br />

• Inputs or internal values<br />

• Observability<br />

– Examine system/environment values at any time<br />

• Debugging<br />

– Can stop simulation at any point and:<br />

• Observe internal values<br />

• Modify system/environment values before restarting<br />

– Can step through small intervals (i.e., 500 nanoseconds)<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

71


Disadvantages<br />

• Simulation setup time<br />

– Often has complex external environments<br />

– Could spend more time modeling environment than system<br />

• Models likely incomplete<br />

– Some environment behavior undocumented if complex environment<br />

– May not model behavior correctly<br />

• Simulation speed much slower than actual execution<br />

– Sequentializing parallel design<br />

• IC: gates operate in parallel<br />

• Simulation: analyze inputs, generate outputs for each gate 1 at time<br />

– Several programs added between simulated system and real hardware<br />

• 1 simulated operation:<br />

– = 10 to 100 simulator operations<br />

– = 100 to 10,000 operating system operations<br />

– = 1,000 to 100,000 hardware operations<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

72


Simulation speed<br />

• Relative speeds of different types of<br />

simulation/emulation<br />

– 1 hour actual execution of SOC<br />

• = 1.2 years instruction-set simulation<br />

• = 10,000,000 hours gate-level simulation<br />

1 IC 1 hour<br />

×10 FPGA 1 day<br />

×100<br />

×1000<br />

hardware emulation<br />

throughput model<br />

4 days<br />

1.4 months<br />

×10000<br />

×100,000<br />

instruction-set simulation<br />

cycle-accurate simulation<br />

1.2 years<br />

12 years<br />

×1,000,000<br />

×10,000,000<br />

register-transfer-level HDL simulation<br />

gate-level HDL simulation<br />

>1 lifetime<br />

1<br />

millennium<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

73


Overcoming long simulation time<br />

• Reduce amount of real time simulated<br />

– 1 msec execution instead of 1 hour<br />

• 0.001sec * 10,000,000 = 10,000 sec = 3 hours<br />

– Reduced confidence<br />

• 1 msec of cruise controller operation tells us little<br />

• Faster simulator<br />

– Emulators<br />

• Special hardware for simulations<br />

– Less precise/accurate simulators<br />

• Exchange speed for observability/controllability<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

74


Reducing precision/accuracy<br />

• Don’t need gate-level analysis for all simulations<br />

– E.g., cruise control<br />

• Don’t care what happens at every input/output of each logic gate<br />

– Simulating RT components ~10x faster<br />

– Cycle-based simulation ~100x faster<br />

• Accurate at clock boundaries only<br />

• No information on signal changes between boundaries<br />

• Faster simulator often combined with reduction in real time<br />

– If willing to simulate for 10 hours<br />

• Use instruction-set simulator<br />

• Real execution time simulated<br />

– 10 hours * 1 / 10,000<br />

– = 0.001 hour<br />

– = 3.6 seconds<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

75


<strong>Hardware</strong>/software co-simulation<br />

• Variety of simulation approaches exist<br />

– From very detailed<br />

• E.g., gate-level model<br />

– To very abstract<br />

• E.g., instruction-level model<br />

• Simulation tools evolved separately for hardware/software<br />

– Recall separate design evolution<br />

– <strong>Software</strong> (GPP)<br />

• Typically with instruction-set simulator (ISS)<br />

– <strong>Hardware</strong> (SPP)<br />

• Typically with models in HDL environment<br />

• Integration of GPP/SPP on single IC creating need for merging<br />

simulation tools<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

76


Integrating GPP/SPP simulations<br />

• Simple/naïve way<br />

– HDL model of microprocessor<br />

• Runs system software<br />

• Much slower than ISS<br />

• Less observable/controllable than ISS<br />

– HDL models of SPPs<br />

– Integrate all models<br />

• <strong>Hardware</strong>-software co-simulator<br />

– ISS for microprocessor<br />

– HDL model for SPPs<br />

– Create communication between simulators<br />

– Simulators run separately except when transferring data<br />

– Faster<br />

– Though, frequent communication between ISS and HDL model slows it down<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

77


Minimizing communication<br />

• Memory shared between GPP and SPPs<br />

– Where should memory go?<br />

– In ISS<br />

• HDL simulator must stall for memory access<br />

– In HDL?<br />

• ISS must stall when fetching each instruction<br />

• Model memory in both ISS and HDL<br />

– Most accesses by each model unrelated to other’s accesses<br />

• No need to communicate these between models<br />

– Co-simulator ensures consistency of shared data<br />

– Huge speedups (100x or more) reported with this technique<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

78


Emulators<br />

• General physical device system mapped to<br />

– Microprocessor emulator<br />

• Microprocessor IC with some monitoring, control circuitry<br />

– SPP emulator<br />

• FPGAs (10s to 100s)<br />

– Usually supports debugging tasks<br />

• Created to help solve simulation disadvantages<br />

– Mapped relatively quickly<br />

• Hours, days<br />

– Can be placed in real environment<br />

• No environment setup time<br />

• No incomplete environment<br />

– Typically faster than simulation<br />

• <strong>Hardware</strong> implementation<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

79


Disadvantages<br />

• Still not as fast as real implementations<br />

– E.g., emulated cruise-control may not respond fast enough to<br />

keep control of car<br />

• Mapping still time consuming<br />

– E.g., mapping complex SOC to 10 FPGAs<br />

• Just partitioning into 10 parts could take weeks<br />

• Can be very expensive<br />

– Top-of-the-line FPGA-based emulator: $100,000 to $1mill<br />

– Leads to resource bottleneck<br />

• Can maybe only afford 1 emulator<br />

• Groups wait days, weeks for other group to finish using<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

80


Reuse: intellectual property cores<br />

• Commercial off-the-shelf (COTS) components<br />

– Predesigned, prepackaged ICs<br />

– Implements GPP or SPP<br />

– Reduces design/debug time<br />

– Have always been available<br />

• System-on-a-chip (SOC)<br />

– All components of system implemented on single chip<br />

– Made possible by increasing IC capacities<br />

– Changing the way COTS components sold<br />

• As intellectual property (IP) rather than actual IC<br />

– Behavioral, structural, or physical descriptions<br />

– Processor-level components known as cores<br />

• SOC built by integrating multiple descriptions<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

81


Cores<br />

• Soft core<br />

– Synthesizable behavioral<br />

description<br />

– Typically written in HDL<br />

(VHDL/Verilog)<br />

• Firm core<br />

– Structural description<br />

– Typically provided in HDL<br />

• Hard core<br />

– Physical description<br />

– Provided in variety of physical<br />

layout file formats<br />

Processors, memories<br />

Structural<br />

Registers, FUs, MUXs<br />

Gates, flip-flops<br />

Transistors<br />

Gajski’s Y-chart<br />

Physical<br />

Behavior<br />

Sequential programs<br />

Register transfers<br />

Logic equations/FSM<br />

Transfer functions<br />

Cell Layout<br />

Modules<br />

Chips<br />

Boards<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

82


Advantages/disadvantages of hard core<br />

• Ease of use<br />

– Developer already designed and tested core<br />

• Can use right away<br />

• Can expect to work correctly<br />

• Predictability<br />

– Size, power, performance predicted accurately<br />

• Not easily mapped (retargeted) to different process<br />

– E.g., core available for vendor X’s 0.25 micrometer CMOS<br />

process<br />

• Can’t use with vendor X’s 0.18 micrometer process<br />

• Can’t use with vendor Y<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

83


Advantages/disadvantages of soft/firm cores<br />

• Soft cores<br />

– Can be synthesized to nearly any technology<br />

– Can optimize for particular use<br />

• E.g., delete unused portion of core<br />

– Lower power, smaller designs<br />

– Requires more design effort<br />

– May not work in technology not tested for<br />

– Not as optimized as hard core for same processor<br />

• Firm cores<br />

– Compromise between hard and soft cores<br />

• Some retargetability<br />

• Limited optimization<br />

• Better predictability/ease of use<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

84


New challenges to processor providers<br />

• Cores have dramatically changed business model<br />

– Pricing models<br />

• Past<br />

– Vendors sold product as IC to designers<br />

– Designers must buy any additional copies<br />

• Could not (economically) copy from original<br />

• Today<br />

– Vendors can sell as IP<br />

– Designers can make as many copies as needed<br />

• Vendor can use different pricing models<br />

– Royalty-based model<br />

• Similar to old IC model<br />

• Designer pays for each additional model<br />

– Fixed price model<br />

• One price for IP and as many copies as needed<br />

– Many other models used<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

85


IP protection<br />

• Past<br />

– Illegally copying IC very difficult<br />

• Reverse engineering required tremendous, deliberate effort<br />

• “Accidental” copying not possible<br />

• Today<br />

– Cores sold in electronic format<br />

• Deliberate/accidental unauthorized copying easier<br />

• Safeguards greatly increased<br />

• Contracts to ensure no copying/distributing<br />

• Encryption techniques<br />

– limit actual exposure to IP<br />

• Watermarking<br />

– determines if particular instance of processor was copied<br />

– whether copy authorized<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

86


New challenges to processor users<br />

• Licensing arrangements<br />

– Not as easy as purchasing IC<br />

– More contracts enforcing pricing model and IP protection<br />

• Possibly requiring legal assistance<br />

• Extra design effort<br />

– Especially for soft cores<br />

• Must still be synthesized and tested<br />

• Minor differences in synthesis tools can cause problems<br />

• Verification requirements more difficult<br />

– Extensive testing for synthesized soft cores and soft/firm cores mapped to particular<br />

technology<br />

• Ensure correct synthesis<br />

• Timing and power vary between implementations<br />

– Early verification critical<br />

• Cores buried within IC<br />

• Cannot simply replace bad core<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

87


Design process model<br />

• Describes order that design steps are processed<br />

– Behavior description step<br />

– Behavior to structure conversion step<br />

– Mapping structure to physical implementation<br />

step<br />

• Waterfall model<br />

– Proceed to next step only after current step<br />

completed<br />

• Spiral model<br />

– Proceed through 3 steps in order but with less<br />

detail<br />

– Repeat 3 steps gradually increasing detail<br />

– Keep repeating until desired system obtained<br />

– Becoming extremely popular (hardware &<br />

software development)<br />

Waterfall design model<br />

Behavioral<br />

Structural<br />

Physical<br />

Spiral design model<br />

Structural<br />

Behavioral<br />

Physical<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

88


Waterfall method<br />

• Not very realistic<br />

– Bugs often found in later steps that must be fixed in<br />

earlier step<br />

• E.g., forgot to handle certain input condition<br />

– Prototype often needed to know complete desired<br />

behavior<br />

• E.g, customer adds features after product demo<br />

– System specifications commonly change<br />

• E.g., to remain competitive by reducing power, size<br />

– Certain features dropped<br />

• Unexpected iterations back through 3 steps<br />

cause missed deadlines<br />

– Lost revenues<br />

– May never make it to market<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

Waterfall design model<br />

Behavioral<br />

Structural<br />

Physical<br />

89


Spiral method<br />

• First iteration of 3 steps incomplete<br />

• Much faster, though<br />

– End up with prototype<br />

• Use to test basic functions<br />

• Get idea of functions to add/remove<br />

– Original iteration experience helps in following<br />

iterations of 3 steps<br />

• Must come up with ways to obtain structure and<br />

physical implementations quickly<br />

– E.g., FPGAs for prototype<br />

• silicon for final product<br />

– May have to use more tools<br />

• Extra effort/cost<br />

• Could require more time than waterfall method<br />

– If correct implementation first time with waterfall<br />

Spiral design model<br />

Structural<br />

Behavioral<br />

Physical<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

90


General-purpose processor design models<br />

• Previous slides focused on SPPs<br />

• Can apply equally to GPPs<br />

– Waterfall model<br />

• Structure developed by particular company<br />

• Acquired by embedded system designer<br />

• Designer develops software (behavior)<br />

• Designer maps application to architecture<br />

– Compilation<br />

– Manual design<br />

– Spiral-like model<br />

• Beginning to be applied by embedded system designers<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

91


Spiral-like model<br />

• Designer develops or acquires architecture<br />

• Develops application(s)<br />

• Maps application to architecture<br />

• Analyzes design metrics<br />

• Now makes choice<br />

– Modify mapping<br />

– Modify application(s) to better suit architecture<br />

– Modify architecture to better suit application(s)<br />

• Not as difficult now<br />

– Maturation of synthesis/compilers<br />

– IPs can be tuned<br />

• Continue refining to lower abstraction level until<br />

particular implementation chosen<br />

Y-chart<br />

Architecture Application(s)<br />

Mapping<br />

Analysis<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

92


Summary<br />

• Design technology seeks to reduce gap between IC<br />

capacity growth and designer productivity growth<br />

• Synthesis has changed digital design<br />

• Increased IC capacity means sw/hw components<br />

coexist on one chip<br />

• Design paradigm shift to core-based design<br />

• Simulation essential but hard<br />

• Spiral design process is popular<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

93


Book Summary<br />

• Embedded systems are common and growing<br />

– Such systems are very different from in the past due to increased IC capacities and<br />

automation tools<br />

– Indicator: National Science Foundation just created a separate program on Embedded<br />

Systems (2002).<br />

• New view:<br />

– Embedded computing systems are built from a collection of processors, some generalpurpose<br />

(sw), some single-purpose (hw)<br />

– Hw/sw differ in design metrics, not in some fundamental way<br />

– Memory and interfaces necessary to complete system<br />

– Days of embedded system design as assembly-level programming of one microprocessor<br />

are fading away<br />

• Need to focus on higher-level issues<br />

– State machines, concurrent processes, control systems<br />

– IC technologies, design technologies<br />

• There’s a growing, challenging and exciting world of embedded systems design out<br />

there. There’s also much more to learn. Enjoy!<br />

Embedded Systems Design: A Unified<br />

<strong>Hardware</strong>/<strong>Software</strong> <strong>Introduction</strong>, (c) 2000 Vahid/Givargis<br />

94

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!