Farm of Streaming Engines (FaStE)

es.ele.tue.nl

September 22, 2005

TU/e

Meeting SCALP, Artemisia and PreMaDoNa



Aim and context

• Aim: to develop a multiprocessor for car infotainment in which streaming and control are separated (a farm of streaming engines)

• The multiprocessor should support multiple streams/applications with hard real-time (HRT) and soft real-time (SRT) requirements

• This implies an application-driven approach (vertical slice) that reuses the results of other, method-driven projects (horizontal)

[Figure: FaStE shown together with the projects PreMaDoNa, Hijdra, TTL and Aethereal]



Overview

• In-car Digital Entertainment applications

• Architecture characteristics

• Exploration of the integration of an Aethereal network-on-chip (what is available today!)

• Proposed architecture for evaluation

• Future goals



Current In-car Digital Entertainment application

[Figure: block diagram of the current application. Interfaces: Digital host in, Analog Aux (2x), SPDIF in, IIS in (3x), Tuner IF, Automatic Gain control, Front stereo, Rear stereo, Digital host out (3x), RDS. Processing blocks: Tile 0 to Tile 3, each with a Control port, and a Controller. Function labels: Radio, SRC & MP3, Audio post processing. Annotation: "Hard real-time requirements".]


Applications for next generations

Broadcast

– High Definition (HD) radio

– Satellite Digital Audio Radio Service (SDARS)

On demand audio service

– Ripping (e.g. encoding audio)

Audio quality (for car phone)

– Noise Reduction (NR)

– Acoustic Echo Cancellation (AEC)

Storage media

– CD/DVD

– Hard disk

– Removable discs (e.g. USB stick, flash card)

Connectivity

– Bluetooth

– USB

– WiFi

Navigation

Video



Current architecture characteristics

[Figure: block diagram of the current architecture. Labelled blocks: MEM, AHB if, Arbiter, DSP and ITC (each repeated several times), Accelerators, Controller Block, DMA, SPI, CD Dec., DIO Switch, Multi-layer AHB bus, ARM based subsystem, Peripherals, AHB2VPB (3x), and VPB Domain 0, Domain 1 and Domain 2. Annotations: "Predictable?", "Bottleneck?", "Scalability?", "Write only".]


Expected resources for next generations

Viper ≈ 9M gates

Generation                    i           i+1         i+2         i+3
Technology                    180 nm      90 nm       90 nm       65 nm
Power supply voltage core     1.8 V       1.5 V       1.5 V       1.2 V
Power supply voltage ana/IO   3.3 V       2.5 V       2.5 V       2.5 V
External connections          176         208         260         310
Number of gates               1,200,000   2,700,000   6,000,000   13,000,000
Number of flip flops          80,000      180,000     370,000     800,000
Frequency                     130 MHz     195 MHz     300 MHz     430 MHz
Number of processors          5           8           12          17
Number of accelerators        3           6           9           12
Streaming processing power    545 MHz     1600 MHz    3000 MHz    6000 MHz

Generation                    i            i+1          i+2          i+3
DSP program memory            2,976 Kbit   8,200 Kbit   23,000 Kbit  63,000 Kbit
DSP data memory               948 Kbit     2,300 Kbit   6,700 Kbit   20,000 Kbit
DSP coefficient memory        420 Kbit     1,000 Kbit   2,800 Kbit   8,200 Kbit
Controller program memory     6,240 Kbit   x            x            x
Controller data memory        1,088 Kbit   x            x            x
Average memory per DSP tile   1,086 Kbit   1,643 Kbit   2,955 Kbit   5,365 Kbit
Memory content                55%          68%          77%          84%

Average memory per DSP tile = DSP memory / (# processors – 1)
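The "DSP memory / (# processors – 1)" annotation can be checked against the table with a few lines of Python. This is only a sketch: it assumes that "DSP memory" means the sum of the DSP program, data and coefficient memories, and that the result is rounded to whole Kbit. The function and variable names are illustrative and not taken from the slides.

```python
# Figures copied from the two resource tables (memory sizes in Kbit).
GENERATIONS = ["i", "i+1", "i+2", "i+3"]
PROCESSORS = [5, 8, 12, 17]
DSP_PROGRAM = [2976, 8200, 23000, 63000]
DSP_DATA = [948, 2300, 6700, 20000]
DSP_COEFF = [420, 1000, 2800, 8200]
LISTED_AVERAGE = [1086, 1643, 2955, 5365]


def average_per_dsp_tile(program, data, coeff, processors):
    """DSP memory divided by (# processors - 1), rounded to whole Kbit.

    Assumption: DSP memory = program + data + coefficient memory.
    """
    return round((program + data + coeff) / (processors - 1))


computed = [
    average_per_dsp_tile(p, d, c, n)
    for p, d, c, n in zip(DSP_PROGRAM, DSP_DATA, DSP_COEFF, PROCESSORS)
]

for gen, value, listed in zip(GENERATIONS, computed, LISTED_AVERAGE):
    print(f"{gen:>4}: computed {value} Kbit, listed {listed} Kbit")
```

Under these assumptions the computed values match the listed ones for generations i, i+1 and i+2. For generation i+3 the formula gives 5,700 Kbit, whereas the listed 5,365 Kbit corresponds to dividing by 17 rather than 16.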


Area of a network?

[Figure: the block diagram of the current application (interfaces, Tile 0 to Tile 3 with Control ports, Controller), with each stream annotated in the form N@R, for example 2x2@48, 2@44.1, 1x2@92, 2@325, 8x2@325, 9x4@325, 9x2@325, 18.5@40.625, 1@40.625, 1@9.17 and 1@58. Further annotations: "1xMulti Channel + 2xStereo + 1xMono" and "33 streaming channels".]


Network connections

[Figure: two ways of connecting Tile 1 and Tile 2 through the network. Labelled blocks in both variants: Cfg, Master, Shell, NI, Router, ctr, ports P1 to P4, and port markers I and T.]

• Map each channel to a dedicated connection (link between the NIs labelled "Data")

• Shared address space by sending addresses (link between the NIs labelled "Address & Data")
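The difference between the two connection options can be illustrated with a toy model. The sketch below is not a model of the Aethereal network interface: the `Word` format, the function names and the use of the channel index as the address are all assumptions made for illustration, and the channel count of 33 is borrowed from the "33 streaming channels" annotation.

```python
from dataclasses import dataclass
from typing import Optional

NUM_CHANNELS = 33  # borrowed from the "33 streaming channels" annotation


@dataclass(frozen=True)
class Word:
    """One word travelling over a connection (hypothetical format)."""
    connection: int
    payload: int
    address: Optional[int] = None  # only used by the shared-address option


def send_dedicated(channel, sample):
    """Option 1: each channel has its own connection; no address is sent."""
    return Word(connection=channel, payload=sample)


def send_shared(channel, sample, base_address=0):
    """Option 2: one connection; an address selects the channel."""
    return Word(connection=0, payload=sample, address=base_address + channel)


dedicated = [send_dedicated(ch, sample=ch * 10) for ch in range(NUM_CHANNELS)]
shared = [send_shared(ch, sample=ch * 10) for ch in range(NUM_CHANNELS)]

# Count how many distinct connections each option needs.
connections_dedicated = len({w.connection for w in dedicated})
connections_shared = len({w.connection for w in shared})
print(connections_dedicated, connections_shared)
```

In this toy model the dedicated option needs one connection per channel and sends data only, while the shared option needs a single connection and pays for it by sending an address with every word.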


Proposed architecture for evaluation

[Figure: proposed architecture. Labelled blocks: MEM, DSP, CA and NI (repeated per DSP tile), Router, Peripheral tile, ARM based subsystem.]

Area interfaces = 0.734 mm²
Area routers = 0.270 mm²
Area network = 0.734 + 0.270 = 1.004 mm²

Chip area increase ≈ 1.5%

Area numbers in CMOS12 before placement and routing
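The area arithmetic on this slide can be reproduced as follows. The sum uses the two figures from the slide. The last value is a rough back-calculation that does not appear on the slide: it assumes that the "≈ 1.5%" increase is measured relative to the chip area without the network.

```python
AREA_INTERFACES_MM2 = 0.734   # from the slide
AREA_ROUTERS_MM2 = 0.270      # from the slide
CHIP_AREA_INCREASE = 0.015    # "≈ 1.5%" from the slide

# Network area is the sum of the interface and router areas.
area_network_mm2 = AREA_INTERFACES_MM2 + AREA_ROUTERS_MM2

# Back-calculated and NOT stated on the slide: assumes the 1.5% is
# relative to the chip area without the network.
implied_chip_area_mm2 = area_network_mm2 / CHIP_AREA_INCREASE

print(f"network area: {area_network_mm2:.3f} mm^2")
print(f"implied chip area without network: about {implied_chip_area_mm2:.0f} mm^2")
```

Because the 1.5% figure is itself approximate, the implied chip area should be read as an order-of-magnitude estimate only.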


Latency of a GT connection

[Figure: path of a GT connection between a Tile (ports P1, P2) and the Cordic (CRD) through Shell, NI and Router, annotated with the latency of each stage. Clock domains: 125MHz, 500MHz, 250MHz. The figure also carries per-stage numeric annotations (0.5, 0.75, 1, 1.5, 2, 4, 18, 36).]

Slot table size = 2 slots

Per-stage latencies shown in the figure:

• Shell (2*1/125MHz) = 16ns

• Clock domain crossing (2*1/125MHz) = 16ns

• Clock domain crossing (2*1/500MHz) = 4ns

• Time wheel 2*3*1/500MHz = 12ns

• NI kernel 3*1/500MHz = 6ns

• Router 3*1/500MHz = 6ns

• Clock domain crossing (2*1/250MHz) = 8ns

• Shell (2*1/250MHz) = 8ns

• Cordic (36*1/250MHz) = 144ns

Round trip latency ≤ 37 clock cycles

Round trip latency Caracas ≤ 36 clock cycles
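Each stage latency on this slide is a number of clock cycles divided by the clock frequency of its domain. The sketch below recomputes the values from the cycle counts and frequencies given in the figure. The stage names and the `latency_ns` helper are illustrative, and reading the time wheel's factor 2 as the slot table size (with 3 cycles per slot) is an assumption. No round-trip total is computed, because the exact path cannot be recovered from the figure.

```python
def latency_ns(cycles, freq_mhz):
    """Latency in nanoseconds of `cycles` clock cycles at `freq_mhz` MHz."""
    return cycles * 1000.0 / freq_mhz


# (cycles, clock frequency in MHz) per stage, as annotated in the figure.
STAGES = {
    "shell @125MHz": (2, 125),
    "clock domain crossing @125MHz": (2, 125),
    "clock domain crossing @500MHz": (2, 500),
    "time wheel": (2 * 3, 500),  # assumed: 2 slots x 3 cycles per slot
    "NI kernel": (3, 500),
    "router": (3, 500),
    "clock domain crossing @250MHz": (2, 250),
    "shell @250MHz": (2, 250),
    "Cordic": (36, 250),
}

latencies = {name: latency_ns(c, f) for name, (c, f) in STAGES.items()}

for name, ns in latencies.items():
    print(f"{name:32s} {ns:6.1f} ns")
```

The computed values agree with the annotations in the figure (16ns, 16ns, 4ns, 12ns, 6ns, 6ns, 8ns, 8ns and 144ns).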


Future goals

• To build the proposed architecture in a SystemC environment using parts from Bolivar and Æthereal

• To have models of the application and architecture from which the temporal behavior of the system can be derived

• To map applications in order to analyze the architecture and the models

• To define next-generation architectures for In-Car Digital Entertainment that support applications such as audio, wireless streaming, connectivity, navigation and video
