
Overview of SC3850 and MAPLE-B Hardware Accelerator - Freescale


July, 2009<br />

MAPLE Hardware Accelerator and<br />

SC3850 DSP Core<br />

Enabling flexible multi-standard base-station designs<br />

Ron Bercovich<br />

DSP IP Manager<br />

Freescale and the Freescale logo are trademarks of Freescale Semiconductor, Inc. All other product or<br />

service names are the property of their respective owners. © Freescale Semiconductor, Inc. 2009.<br />

Itay Peled<br />

Project Leader, DSP Platform Architecture<br />


Agenda<br />

►MSC8156 multicore digital signal processor (DSP) product using the SC3850 DSP cores and MAPLE-B accelerators<br />

►MAPLE-B accelerators<br />

• Programmable accelerators concept<br />

• Programming model<br />

• Accelerated functions and standards compliance<br />

►SC3850 DSP built on StarCore ® technology<br />

• Architecture overview<br />

• Performance<br />

• L1 and L2 cache sub-system<br />

►Summary<br />

MSC8156E 45 nm Six-Core DSP<br />

►Six SC3850 core subsystems (up to 6 GHz/48 GMACs), each with:<br />

• SC3850 DSP core at up to 1 GHz (8 GMACs 16b or 8b)<br />

• 512 KB unified L2 cache / M2 memory<br />

• 32 KB I-cache, 32 KB D-cache, WBB, WTB, MMU, PIC<br />

►Internal/external memories/caches<br />

• 1056 KB M3 shared memory (SRAM)<br />

• Two DDR 2/3 64-bit SDRAM interfaces at up to 800 MHz<br />

►CLASS – Chip-Level Arbitration and Switching Fabric<br />

• Non-blocking, fully pipelined, low latency<br />

• Full fabric, 12 masters to eight slaves, up to 512 Gbps throughput<br />

►MAPLE-B – Baseband Accelerator<br />

• Turbo/Viterbi decoder up to 200/115 Mbps, supporting 3G-LTE, 802.16, 3G and CDMA2K standards<br />

• FFT and DFT accelerators up to 280 and 175 Msps<br />

• Multi-standard CRC check and insertion<br />

►Security Engine (Talitos 3.1)<br />

• Data and code protection (AES, SHA, Kasumi, SNOW3G)<br />

►High-speed interconnects<br />

• Dual 4x/1x Serial RapidIO ® at 1.25/2.5/3.125 Gbaud<br />

• PCI Express ® 4x/1x<br />

►Dual-RISC QUICC Engine technology supporting:<br />

• Dual SGMII/RGMII Gigabit Ethernet ports<br />

• Ethernet L1 protocols, Talitos control and Serial RapidIO offload<br />

►TDM Highway<br />

• 1024 channels, 400 Mbps, divided into four ports of 256<br />

►DMA engine: 16 bidirectional channels with external request/acknowledge<br />

►Eight hardware semaphores<br />

►Other peripheral interfaces<br />

• SPI, UART, I²C, 32 GPIO, 16 timers, 96 KB boot ROM, JTAG/SAP, 8 WDT<br />

►Technology<br />

• 45 nm SOI, 1 V core, 2.5, 1.8/1.5 V I/O<br />

• FC-PBGA (29x29), 1 mm pitch, RoHS<br />
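The headline figures on this slide follow from simple arithmetic. The sketch below reproduces them, assuming that "8 GMACs" at 1 GHz means eight multiply-accumulates per core clock cycle; the function name `peak_gmacs` is illustrative and not part of any Freescale tool.

```python
def peak_gmacs(cores: int, clock_ghz: float, macs_per_cycle: int) -> float:
    """Peak multiply-accumulate rate in GMAC/s (no stalls, all units busy)."""
    return cores * clock_ghz * macs_per_cycle


# Assumption: 8 MACs per cycle per core, read off "1 GHz (8 GMACs)" on the slide.
per_core = peak_gmacs(cores=1, clock_ghz=1.0, macs_per_cycle=8)
device = peak_gmacs(cores=6, clock_ghz=1.0, macs_per_cycle=8)
print(f"per core: {per_core:.0f} GMAC/s, six cores: {device:.0f} GMAC/s")
```

These are peak rates; sustained throughput depends on memory traffic and on how well the code fills the core's execution units.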

[Figure: MSC8156E block diagram, marked "Now Sampling". Blocks shown: six SC3850 cores, each with 32 KB L1 I-cache, 32 KB L1 D-cache and 512 KB unified M2/L2; CLASS non-blocking switch fabric; 1056 KB shared memory; two DDR 2/3 memory controllers; MAPLE-B baseband accelerator; DMA engine; security processing engine; on-chip network; 2x SRIO 4x/1x and 1x PCIe 4x/1x; 2x Gigabit Ethernet and SPI; two SerDes x4 blocks; four-port TDM highway; I²C, UART, GPIOs; H/W semaphores.]<br />


Multi-Accelerator-Platform for Baseband<br />

MAPLE-B<br />

MAPLE-B Accelerator Overview<br />

►Software-friendly, buffer-descriptor-based handshake and task assignment with minimal control overhead on the DSP cores<br />

►Highly flexible and programmable Turbo and Viterbi decoder supporting various configurable decoding parameters<br />

• High-throughput Turbo decoding for low-latency and advanced antenna systems<br />

or<br />

• Low-latency multi-standard Viterbi decoding for data/control channels<br />

• Multi-standard capable: UMTS, CDMA2K, WiMAX and Long Term Evolution (LTE)<br />

• Flexible rate de-matching schemes for multiple standards, accelerating HARQ functionality<br />

►Flexible and advanced FFT/DFT acceleration:<br />

• FFT/iFFT for sizes 128, 256, 512, 1024 and 2048 points<br />

• DFT/iDFT for LTE sizes<br />

►High-speed CRC calculation/check accelerator for:<br />

• LTE code and transport block in UL and DL<br />

• WiMAX PHY burst CRC in UL and DL<br />

MAPLE-B Block Diagram<br />

[Figure: MAPLE-B block diagram. PSIF blocks shown: RISC 0 and RISC 1 cores, each with a 16 kB IRAM; two 16 kB data SRAMs; MAG2DRAM and DRAM; system DMA engine; local DMA with two CRC PEs; PIC with interrupts; CE; slave interface with PSIF config; arbitration and switching. DSP cores or a host write, read and debug BDs over a 64-bit, 450 MHz interface; MAPLE reads and writes data over two 64-bit, 450 MHz interfaces. Each processing engine attaches through an SIF. TVPE labels: VRE, EXT MEM, EXTL, CTL, CDL, NIIL, HOL, "CD, NII, HO MEM", DRE0 to DRE3. FFTPE: Radix 2, Radix 4 and Radix 8 cells, I/O data buffer, twiddles memory, routing and config, SBIF. DFTPE: Radix 2, Radix 3, Radix 4 and Radix 5 cells, I/O data buffer, twiddles memory, routing and config, SBIF.]<br />

PSIF : Programmable System Interface<br />

TVPE : Turbo/Viterbi Processing Engine<br />

FFTPE : FFT Processing Engine<br />

DFTPE : DFT Processing Engine<br />


[Figure: MAPLE-B block diagram, repeated, with the following callout.]<br />

Programmable System Interface based on RISC engines:<br />

• Flexibility and standards adaptation<br />

• Buffer descriptor parsing and system handshake<br />

• DMA capabilities and high system bandwidth support<br />

• Low-level task control and split<br />

• CRC acceleration<br />


[Figure: MAPLE-B block diagram, repeated, with the following callout.]<br />

Turbo-Viterbi Processing Element:<br />

• SIMD parallelism<br />

• Novel heuristics and Radix 4 architecture<br />

• >10x the throughput of existing industry/competitor solutions<br />

• Multi-standard capable: binary/duo-binary, tail-bit/trellis termination, configurable interleaver<br />


[Figure: MAPLE-B block diagram, repeated, with the following callout.]<br />

FFT/iFFT and DFT/iDFT Processing Elements:<br />

• High-throughput engines<br />

• Multi-radix implementation<br />

• Novel precision handling techniques<br />

• Power, area and performance optimized vs. software implementations<br />




PSIF overview<br />

[Figure: PSIF detail from the MAPLE-B block diagram. Blocks shown: RISC 0 and RISC 1 cores, each with a 16 kB IRAM; two 16 kB data SRAMs; MAG2DRAM and DRAM; system DMA engine; local DMA with two CRC PEs; PIC; CE; slave interface (64-bit, 450 MHz); two master interfaces (2x 64-bit, 450 MHz); arbitration and switching.]<br />

• RISC-based programmable system interface<br />

• Hardware scheduler<br />

• Firmware-based buffer descriptor parsing and arbitration<br />

• DMA and DMA control for input/output data via two master interfaces<br />

• Direct access for BD placement by DSP cores or other hosts via a fast slave interface<br />

• Local DMA for CRC acceleration and future extensions<br />

• Programmable interrupt controller<br />

• Standard SRAM* interface to the PEs (TVPE, DFTPE, FFTPE)<br />

• Low-level control and configuration of the PEs<br />

• Emulates the system behavior of “Yet Another Slave DSP Core”<br />


MAPLE-B Programming Model Overview<br />

►Buffer Descriptor (BD) based programming model:<br />

• Up to eight high-priority BD rings and eight low-priority BD rings per processing element, for multiple-master support (multicore awareness)<br />

• MAPLE-B arbitrates between jobs using round robin with priority<br />

• 12 KB of MAPLE-B internal memory dedicated to BD rings<br />

• TaskID for every job in a BD, for debug/tracking purposes<br />

►Minimal overhead for the DSP core<br />

• MAPLE-B reads input data using its embedded DMA from any system memory location: M2/L2/M3/DDR<br />

• MAPLE-B writes results to any system memory location: M2/L2/M3/DDR<br />

• Job-done indication to the DSP cores by interrupt and/or BD polling<br />

• Supports direct Serial RapidIO ® doorbell generation to signal job completion to an external host that shares or controls certain MAPLE BD rings<br />
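The ring arbitration described above can be modeled in a few lines. This is a minimal sketch, not the MAPLE-B firmware: the slide says only "round robin with priority", so the strict high-over-low policy here is an assumption, and the `BufferDescriptor` fields `src_addr` and `dst_addr`, together with the class names, are hypothetical. Only the ring counts and the TaskID come from the slide.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional

RINGS_PER_PRIORITY = 8  # slide: up to eight rings per priority per processing element


@dataclass
class BufferDescriptor:
    task_id: int   # slide: every BD carries a TaskID for debug/tracking
    src_addr: int  # hypothetical field: input data location (M2/L2/M3/DDR)
    dst_addr: int  # hypothetical field: result location


@dataclass
class RingSet:
    """All BD rings of one priority level for one processing element."""
    rings: List[Deque[BufferDescriptor]] = field(
        default_factory=lambda: [deque() for _ in range(RINGS_PER_PRIORITY)])
    next_ring: int = 0  # round-robin pointer

    def pop_round_robin(self) -> Optional[BufferDescriptor]:
        count = len(self.rings)
        for step in range(count):
            idx = (self.next_ring + step) % count
            if self.rings[idx]:
                self.next_ring = (idx + 1) % count  # resume after the served ring
                return self.rings[idx].popleft()
        return None  # every ring at this priority is empty


@dataclass
class PeArbiter:
    high: RingSet = field(default_factory=RingSet)
    low: RingSet = field(default_factory=RingSet)

    def submit(self, ring: int, bd: BufferDescriptor, high_priority: bool) -> None:
        target = self.high if high_priority else self.low
        target.rings[ring].append(bd)

    def next_job(self) -> Optional[BufferDescriptor]:
        # Assumed policy: any pending high-priority BD is served before low priority.
        bd = self.high.pop_round_robin()
        return bd if bd is not None else self.low.pop_round_robin()


arbiter = PeArbiter()
arbiter.submit(0, BufferDescriptor(1, 0x1000, 0x2000), high_priority=False)
arbiter.submit(0, BufferDescriptor(2, 0x1100, 0x2100), high_priority=True)
arbiter.submit(1, BufferDescriptor(3, 0x1200, 0x2200), high_priority=True)
arbiter.submit(0, BufferDescriptor(4, 0x1300, 0x2300), high_priority=True)
order = [arbiter.next_job().task_id for _ in range(4)]
print(order)  # [2, 3, 4, 1]
```

Giving each master its own ring means cores never contend for a shared queue; the arbiter alone decides the service order.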



Ring Descriptors and Buffer Descriptor<br />

[Figure: example ring descriptors held in MAPLE-B. Masters shown: Core A (DFT jobs), Core B (Turbo jobs), Core C (DFT and FFT jobs) and an external host (Turbo jobs). Ring descriptors shown: Core B high-priority TVPE ring, external high-priority TVPE ring, Core B low-priority TVPE ring, Core A high-priority DFTPE ring, Core C low-priority DFTPE ring and Core C low-priority FFTPE ring. A second view groups the rings as TVPE high, TVPE low, DFTPE high, DFTPE low, FFTPE high and FFTPE low, up to 12 KB in total.]<br />

• Located inside MAPLE<br />

• Multiple priorities<br />

• Small handling fee for RISC processors<br />


FFTPE highlights<br />

[Figure: FFTPE block. Labels shown: SIF (64-bit, 450 MHz) to the PSIF arbitration and switching; Radix 2, Radix 4 and Radix 8 cells; I/O data buffer; twiddles memory; routing and config; SBIF.]<br />

► High-throughput, low-power FFT/iFFT transform processing element<br />

► Built from Radix2, Radix4 and Radix8 elements<br />

► Single 64-bit PSIF interface for control and data<br />

► Supports 128-, 256-, 512-, 1024- and 2048-point transforms<br />

► 32-bit (16I, 16Q) input and output data<br />

► Internal twiddles ROM memory<br />

► Advanced scaling methods including:<br />

• User-defined down scaling per stage of 0-4 bits<br />

• Adaptive scaling (block-floating emulation) with overall scaling option<br />

► Guard bands and DC carrier insertion for iFFT optimization<br />

► Job (BD) repeat option for reduced configuration and increased throughput with adjacent I/O data structures<br />
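User-defined down scaling per stage can be pictured as a right shift applied after every butterfly stage. The sketch below is a floating-point model under stated assumptions, not the FFTPE datapath: it uses a textbook radix-2 decimation-in-time FFT (the FFTPE also has radix-4 and radix-8 cells), and it models an s-bit shift as a multiplication by 2^-s. The names `fft_scaled` and `shifts` are illustrative.

```python
import cmath
from typing import List, Sequence


def fft_scaled(x: Sequence[complex], shifts: Sequence[int]) -> List[complex]:
    """Radix-2 DIT FFT that scales the data down by 2**-shifts[s] after stage s."""
    n = len(x)
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("length must be a power of two, at least 2")
    if len(shifts) != stages or any(not 0 <= s <= 4 for s in shifts):
        raise ValueError("need one shift of 0..4 bits per stage")

    # Bit-reversal permutation of the input.
    a = [x[int(format(i, f"0{stages}b")[::-1], 2)] for i in range(n)]

    for stage in range(stages):
        half = 1 << stage
        scale = 1.0 / (1 << shifts[stage])  # models a right shift by shifts[stage] bits
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = (u + v) * scale
                a[start + k + half] = (u - v) * scale
    return a


samples = [complex(i, -i) for i in range(8)]
scaled = fft_scaled(samples, shifts=[1, 1, 1])  # equals DFT(samples) / 8
```

The output equals the unscaled DFT divided by 2 raised to the sum of the shifts. Butterfly outputs can exceed the magnitude of their inputs, so a fixed-point engine trades headroom against precision through these shifts; adaptive (block-floating) scaling would pick them from the data rather than from the job configuration.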

DFTPE highlights<br />

[Figure: DFTPE block. Labels shown: SIF (64-bit, 450 MHz) to the PSIF arbitration and switching; Radix 2, Radix 3, Radix 4 and Radix 5 cells; I/O data buffer; twiddles memory; routing and config; SBIF.]<br />

► High-throughput, low-power FFT/iFFT and DFT/iDFT transform processing element<br />

► Built from Radix2, Radix3, Radix4 and Radix5 elements<br />

► Single 64-bit PSIF interface for control and data<br />

► Supports 128-, 256-, 512- and 1024-point FFT/iFFT transforms<br />

► Supports 3G-LTE standard DFT/iDFT transforms from 12 to 1200 points, plus 1536 points<br />

► 32-bit (16I, 16Q) input and output data<br />

► Internal twiddles ROM memory<br />

► Advanced scaling methods including:<br />

• User-defined down scaling per stage of 0-4 bits<br />

• Adaptive scaling (block-floating emulation) with overall scaling option<br />

► Guard bands and DC carrier insertion for iFFT optimization<br />

► Job (BD) repeat option for reduced configuration and increased throughput with adjacent I/O data structures<br />
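The radix 2, 3, 4 and 5 cells match the structure of the LTE uplink DFT sizes, which are 12 subcarriers times a resource-block count whose only prime factors are 2, 3 and 5. The sketch below enumerates those sizes for up to 100 resource blocks and splits each into radix stages. The stage order (4, then 2, 3, 5) is an illustrative assumption; the slides do not say how the DFTPE schedules its cells.

```python
from typing import List


def lte_dft_sizes(max_rb: int = 100) -> List[int]:
    """DFT lengths 12 * N_RB where N_RB = 2^a * 3^b * 5^c and N_RB <= max_rb."""
    sizes = []
    for n_rb in range(1, max_rb + 1):
        rest = n_rb
        for prime in (2, 3, 5):
            while rest % prime == 0:
                rest //= prime
        if rest == 1:  # no prime factor other than 2, 3, 5
            sizes.append(12 * n_rb)
    return sizes


def radix_stages(n: int) -> List[int]:
    """Split n into radix-4, radix-2, radix-3 and radix-5 stages (assumed order)."""
    stages = []
    for radix in (4, 2, 3, 5):
        while n % radix == 0:
            stages.append(radix)
            n //= radix
    if n != 1:
        raise ValueError("size has a prime factor other than 2, 3 or 5")
    return stages


sizes = lte_dft_sizes()
print(len(sizes), sizes[0], sizes[-1])  # 34 12 1200
print(radix_stages(1200))               # [4, 4, 3, 5, 5]
print(radix_stages(1536))               # [4, 4, 4, 4, 2, 3]
```

The extra 1536-point size named on the slide factors as 2^9 x 3, so it also decomposes into the same four cell types.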

TVPE highlights<br />

[Figure: TVPE block. Labels shown: two SIFs (64-bit, 450 MHz each) to the PSIF arbitration and switching; VRE; EXT MEM; EXTL; CTL; "CD, NII, HO MEM"; CDL; NIIL; HOL; DRE0 to DRE3.]<br />

► High-throughput, low-power Turbo or Viterbi decoding<br />

► Multi-standard, multi-algorithm support via:<br />

• Binary and duo-binary decoder<br />

• Tail-bit and zero-tail trellis termination support<br />

• MaxLogMap and HybridLinearLogMap<br />

• Multi-iteration Viterbi decoding WAVA*<br />

► Dual 64-bit PSIF interface for control and data<br />

► Radix4, NII-X architecture<br />

► 8-bit soft LLR inputs and soft/hard outputs<br />

► Rate de-matching support for LTE, WCDMA, WiMAX<br />

► Periodic de-puncturing for CDMA2K and Viterbi<br />

► Various input data structures to support the trade-off between DSP core pre-processing MIPS and Turbo decoding throughput<br />

► Support for APQ and CRC early stopping criteria<br />
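The MaxLogMap option named above refers to a standard simplification of the MAP decoder: the exact log-domain sum, known as the Jacobian logarithm or max*, is replaced by a plain maximum. The sketch below compares the two. It is generic textbook arithmetic, not TVPE code, and it does not model HybridLinearLogMap, whose definition is not given in the slides.

```python
import math


def max_star(a: float, b: float) -> float:
    """Exact Jacobian logarithm: log(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a: float, b: float) -> float:
    """Max-log-MAP approximation: drop the correction term."""
    return max(a, b)


# The approximation error is the dropped correction term, largest when a == b.
for a, b in [(0.0, 0.0), (1.0, 2.0), (-3.0, 4.0)]:
    error = max_star(a, b) - max_log(a, b)
    print(f"a={a:+.1f} b={b:+.1f} exact={max_star(a, b):.4f} error={error:.4f}")
```

The correction term never exceeds log 2 and shrinks quickly as the two metrics move apart, which is why dropping it costs little decoding performance while removing a table lookup from every add-compare-select step.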


MAPLE-B Performance and Standards Compliance<br />

►WiMAX systems (IEEE ® 802.16 Rev2), MAPLE-B on MSC8156:<br />

• Turbo decoding, with optional support for sub-block de-interleaving: > 195 Mbps (6 iterations)<br />

• Viterbi decoding, with optional support for periodic de-puncturing: > 100 Mbps (tail-biting multi-iteration)<br />

• FFT/IFFT, with optional support for guard bands insertion: > 350 Msps using 2 units (FFTPE, DFTPE)<br />

• CRC, insertion for DL and check for UL: > 10 Gbps, CRC16 (PDU)<br />

►3GLTE FDD/TDD systems (3GPP TS 36.212 FEC and CRC), MAPLE-B on MSC8156:<br />

• Turbo decoding, with optional support for sub-block de-interleaving: > 200 Mbps (6 iterations)<br />

• Viterbi decoding, with optional support for periodic de-puncturing: > 100 Mbps (tail-biting multi-iteration)<br />

• FFT/IFFT/DFT/IDFT, with optional support for guard bands insertion: > 280 Msps FFT using FFTPE, > 175 Msps DFT using DFTPE<br />

• CRC, insertion for downlink and check for uplink: > 10 Gbps, CRC24A, CRC24B<br />

►UMTS – WCDMA, HSPA+ (3GPP TS 25.212 (FDD) and 3GPP TS 25.222 (TDD) FEC and CRC), MAPLE-B on MSC8156:<br />

• Turbo decoding: > 165 Mbps (6 iterations)<br />

• Viterbi decoding, with optional support for periodic de-puncturing: > 115 Mbps (zero tail, K=9)<br />

• FFT/IFFT: > 350 Msps FFT using FFTPE and DFTPE<br />

• CRC, insertion for DL and check for UL: > 10 Gbps, CRC24<br />
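For reference, the LTE CRC24A and CRC24B checks that MAPLE-B accelerates can be written as a bit-serial software model. This is a minimal sketch for checking results, not a description of the CRC PE hardware. The two generator polynomials are the ones defined in 3GPP TS 36.212; the function names and the list-of-bits interface are illustrative.

```python
from typing import List, Sequence

# Generator polynomials from 3GPP TS 36.212, with the x^24 term implicit.
CRC24A_POLY = 0x864CFB  # transport block CRC
CRC24B_POLY = 0x800063  # code block CRC


def crc24(bits: Sequence[int], poly: int = CRC24A_POLY) -> int:
    """Bit-serial CRC over 0/1 values, first bit first, zero initial state."""
    reg = 0
    for bit in bits:
        feedback = ((reg >> 23) & 1) ^ (bit & 1)
        reg = (reg << 1) & 0xFFFFFF
        if feedback:
            reg ^= poly
    return reg


def attach_crc(bits: Sequence[int], poly: int = CRC24A_POLY) -> List[int]:
    """Append the 24 parity bits, most significant first (insertion step)."""
    crc = crc24(bits, poly)
    return list(bits) + [(crc >> i) & 1 for i in range(23, -1, -1)]


def check_crc(bits_with_crc: Sequence[int], poly: int = CRC24A_POLY) -> bool:
    """A block with a valid CRC attached leaves a zero remainder (check step)."""
    return crc24(bits_with_crc, poly) == 0


message = [1, 0, 1, 1, 0, 0, 1, 0] * 5
block = attach_crc(message)
corrupted = block.copy()
corrupted[3] ^= 1  # flip one bit
print(check_crc(block), check_crc(corrupted))  # True False
```

A bit-serial loop like this handles one bit per iteration, which makes clear why a hardware engine quoted at more than 10 Gbps is worth offloading to.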

3GLTE PDSCH/DLSCH Acceleration using MAPLE-B<br />

[Figure: example LTE downlink (DLSCH/PDSCH) and uplink (PUSCH/ULSCH) processing chains between L2 and the (IF) RF antennas. Block labels on the downlink side: CRC24a, CB segmentation, CRC24b, Turbo encoder, HARQ rate matching, scramble, QAM mapper, layer mapper, MIMO precoder, PRB mapper, DLRS generation, GI/IFFT, CP. Block labels on the uplink side: CPR, GR/FFT, RACH, CE, SNR, MMSE equalizer, IDFT, QAM de-mapper, descramble, de-interleave, control/CQI, HARQ rate de-matching, Turbo decoder, CRC24b, TB assembly, CRC24a. A legend distinguishes "MAPLE-B Offload HW" from "MAPLE-B HW assist"; which blocks fall in each category is not recoverable from the extracted text.]<br />


MAPLE-B – Enablement Technology for Multi-Standard Baseband<br />

►Advanced programming model<br />

• Buffer-descriptor-based job assignment<br />

• Multiple buffer descriptor rings for multicore systems<br />

• System optimization via:<br />

Multi-job assignment, advanced alignment<br />

Flexible interrupt assignment to any DSP core<br />

Embedded DMA with direct access to L2 cache or M2/M3/DDR<br />

Full offload of accelerator programming and control<br />

►High-capacity processing elements (coprocessors)<br />

• Optimized for low-latency, high-throughput system performance<br />

• Local-to-DSP FFT and DFT acceleration for:<br />

OFDMA/SC-FDMA processing<br />

Ranging/RACH acceleration<br />

Frequency-domain processing acceleration for HSPA<br />

• 6144-bit LTE code block: ~40 µs decoding latency<br />
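The buffer-descriptor job model on this slide can be pictured as a ring shared between a DSP core, which posts jobs, and the accelerator, which drains them in order. The sketch below illustrates only that idea. The field names (`src`, `dst`, `length`, `owner`), the ring size and the `BdRing` class are hypothetical and do not reflect the actual MAPLE-B descriptor layout or driver API.

```python
from dataclasses import dataclass
from typing import List, Optional

CORE, ACCEL = "core", "accelerator"  # hypothetical ownership states


@dataclass
class BufferDescriptor:
    """One job slot. Field names are illustrative, not the MAPLE-B layout."""
    src: int = 0        # input buffer address
    dst: int = 0        # output buffer address
    length: int = 0     # job size in bytes
    owner: str = CORE   # who may touch this slot right now


class BdRing:
    """Fixed-size ring: the core fills slots, the accelerator drains them in order."""

    def __init__(self, size: int) -> None:
        self.slots: List[BufferDescriptor] = [BufferDescriptor() for _ in range(size)]
        self.head = 0  # next slot the core will fill
        self.tail = 0  # next slot the accelerator will process

    def post_job(self, src: int, dst: int, length: int) -> bool:
        """Core side: hand one job to the accelerator; False if the ring is full."""
        bd = self.slots[self.head]
        if bd.owner != CORE:
            return False
        bd.src, bd.dst, bd.length = src, dst, length
        bd.owner = ACCEL  # the ownership flip stands in for a doorbell
        self.head = (self.head + 1) % len(self.slots)
        return True

    def complete_next(self) -> Optional[BufferDescriptor]:
        """Accelerator side: finish the oldest pending job and return its slot."""
        bd = self.slots[self.tail]
        if bd.owner != ACCEL:
            return None
        bd.owner = CORE
        self.tail = (self.tail + 1) % len(self.slots)
        return bd


# One ring per core (six cores assumed here) avoids cross-core locking on submit.
rings = {core_id: BdRing(size=4) for core_id in range(6)}
```

Giving each core its own ring is one plausible reading of "multiple buffer descriptor rings for multicore systems": a core can post jobs without coordinating with the other cores.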



SC3850 DSP Core and Subsystem<br />



StarCore ® Architecture Roadmap<br />

[Figure: roadmap of StarCore architecture generations V2, V3, V5, V6 and V7. Feature labels: up to 6-issue VLIW architecture, VLES, SIMD, enhanced control code support, memory protection, prediction, MMU support, additional ASI, enhanced video, dynamic branch prediction, additional SIMD instructions, dual MAC, 1 GHz and beyond (90nm). Product labels: SC1000 products…, SC140e products…, SC3400 products…, SC3850 products…, MSC8101/3, MSC8122/26/12/13, wireless subscriber, MXC (2.5G, 3G, 3.5G), 8144/E/EC, 815x.]<br />


StarCore ® Feature Evolution<br />

Shared features (SC140 through SC3850):<br />

• 6 issue, statically scheduled VLES model: 4 DALU + 2 AGU<br />

• 128-bit instruction fetch, 2x64 data ports (3 memory accesses per cycle)<br />

• Backward binary compatibility between all family members (16-bit basic inst. set)<br />

Feature | SC140 (V2) | SC140e (V3) | SC3400 (V5) | SC3850 (V6)<br />
Pipeline stages (max freq @SOI) | 5 (600 MHz @90G) | 5 (250 MHz @90LP) | 12 (1 GHz @90SOI) | 12 (1 GHz @45SOI)<br />
Instructions | Baseline | Minor additions | Video, SIMD2 | Control ISA, Dual MPY, Cache instructions<br />
Precise exceptions | No | No | Yes | Yes<br />
Privilege levels | No | Yes | Yes | Yes<br />
Micro-arch. features | 1 VLES speculation | 1 VLES speculation | BTB, 4 VLES spec., 1 COF deep | BTB, 4 VLES spec., nested COF<br />
Platform | M1 + L1 Icache | MMU, L1 I/D cache | L1, MMU, M2 | L1, MMU, L2/M2<br />


SC3850 DSP Core Key Architectural Features<br />

►Statically scheduled VLIW<br />

• VLES model – Variable Length Execution Set<br />

• 6-issue: 4 DALU + 2 AGU + loop dispatched per cycle<br />

► Very high numerical throughput<br />

• Two 16x16 multipliers per Data ALU, eight total<br />

• Support for extended precision and complex multiplication<br />

• Four zero-overhead hardware loops<br />

• Application-specific instructions: FFT, complex algebra and more<br />

• 2x64-bit load/stores per cycle; multivariable access to/from multiple registers with pack/unpack<br />

• Compact instructions perform multiple intrinsic functions (e.g. MAC, complex multiply, scale/saturate/round as part of the store)<br />

►Very good support for control code<br />

• Dynamic branch prediction (BTB), speculative execution<br />

• Fully predicated instruction set<br />

► Good OS support<br />

• Precise exceptions for MMU support, including during hardware loops<br />

• Dual stack pointer management in hardware<br />
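As a quick check on the "eight total" multipliers claim, the peak multiply-accumulate rate follows directly from the multiplier count and the clock. The sketch below assumes every multiplier is busy on every cycle, which is an ideal upper bound. The helper name `peak_gmacs` is illustrative.

```python
MULTIPLIERS_PER_ALU = 2   # "Two 16x16 multipliers per Data ALU"
ALUS_PER_CORE = 4         # four Data ALUs per core


def peak_gmacs(clock_ghz: float, cores: int = 1) -> float:
    """Peak 16x16 MAC rate in GMAC/s, assuming every multiplier is busy every cycle."""
    return MULTIPLIERS_PER_ALU * ALUS_PER_CORE * clock_ghz * cores


print(peak_gmacs(1.0))           # one core at 1 GHz
print(peak_gmacs(1.0, cores=6))  # six cores at 1 GHz
```

At 1 GHz this gives 8 GMAC/s per core and 48 GMAC/s for six cores, consistent with the "48 GMACs" figure the deck quotes for the six-core device.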


StarCore ® DSP Core Architecture<br />

►The StarCore core consists of the following main units:<br />

• Data arithmetic logic unit (DALU) that contains four instances of an arithmetic logic unit (ALU) and a data register file<br />

• Address generation unit (AGU) that contains two address arithmetic units (AAU) and an address register file<br />

• Program control unit (PCU)<br />

[Figure: core block diagram. The Memory Sub-System connects to the core over XP_ADDR (32), XP_DATA (128), XA_ADDR (32), XA_DATA (64), XB_ADDR (32) and XB_DATA (64). Blocks shown: PCU (BTB, Dispatcher, REG; COF, Loop, INT), OCE (Debug), Instr. Bus, AGU (Address Registers, ALU0, ALU1), WRQ, DALU (Data Registers and four ALUs, each with MACna, MACnb and Logicn for n = 0..3), RSU – Resource Stall Unit.]<br />


Dual Multiply ISA<br />

Single MAC operation (SC140/SC3400)<br />

mac Da,Db,Dn<br />

Dn + (Da.H * Db.H) -> Dn<br />

Operands: 16-bit high portions of Da and Db; accumulator Dn: 40-bit<br />

Dual MAC – SIMD2 MAC (SC3850)<br />

mac2 Da,Db,Dn<br />

Dn.WH + (Da.H * Db.H) -> Dn.WH<br />

Dn.WL + (Da.L * Db.L) -> Dn.WL<br />

Operands: 16-bit high and low portions of Da and Db; accumulators Dn.WH and Dn.WL: 20-bit each<br />

Dual MAC – double throughput MAC (SC3850)<br />

dmac Da,Db,Dn<br />

Dn + (Da.H * Db.H) + (Da.L * Db.L) -> Dn<br />

Operands: 16-bit high and low portions of Da and Db; accumulator Dn: 40-bit<br />
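The three multiply-accumulate forms on this slide can be compared with a small behavioral model. This is a sketch of the arithmetic exactly as written on the slide, using plain Python integers. It deliberately does not model accumulator widths, fractional scaling, rounding or saturation, and the function names simply mirror the mnemonics.

```python
from typing import Tuple


def split16(reg: int) -> Tuple[int, int]:
    """Split a 32-bit operand into signed 16-bit (high, low) portions."""
    def s16(v: int) -> int:
        v &= 0xFFFF
        return v - 0x10000 if v & 0x8000 else v
    return s16(reg >> 16), s16(reg)


def mac(da: int, db: int, dn: int) -> int:
    """mac: Dn + (Da.H * Db.H) -> Dn"""
    return dn + split16(da)[0] * split16(db)[0]


def mac2(da: int, db: int, dn_wh: int, dn_wl: int) -> Tuple[int, int]:
    """mac2: two independent accumulations, one per half of Dn."""
    (ah, al), (bh, bl) = split16(da), split16(db)
    return dn_wh + ah * bh, dn_wl + al * bl


def dmac(da: int, db: int, dn: int) -> int:
    """dmac: Dn + (Da.H * Db.H) + (Da.L * Db.L) -> Dn"""
    (ah, al), (bh, bl) = split16(da), split16(db)
    return dn + ah * bh + al * bl


da, db = 0x00030002, 0x00050007   # Da.H=3, Da.L=2, Db.H=5, Db.L=7
print(mac(da, db, 10))            # uses only the high portions
print(mac2(da, db, 1, 1))         # one result per half
print(dmac(da, db, 10))           # both products summed into one accumulator
```

The model makes the trade-off visible: `mac2` keeps two separate running sums (SIMD style), while `dmac` folds both products into one sum, which is the shape a dot product needs.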



SC3850 DSP Core Data Processing Throughput<br />

►DALU calculations are based on 40-bit registers<br />

►The two multipliers of each ALU can be used in various ways:<br />

• SIMD2 or dot-product multiplication<br />

• Complex multiplication<br />

• Extended precision multiplication (16x32, 32x32)<br />

Operation | Precision | Operations per cycle<br />
Real Multiply | 16x16 | 8<br />
Real Multiply | 16x32 | 4<br />
Real Multiply | 32x32 | 2<br />
Complex Multiply | 16x16 | 2<br />
Complex Multiply | 16x32 | 1<br />

Kernel | SC3850<br />
Real block FIR 16x16 | NT/8<br />
Complex FIR 16x16 | NT/2<br />
Dot Product 16x16 | N/4<br />

N: samples<br />

T: taps<br />
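The kernel table can be turned into a quick estimator. The sketch below reads the table entries as ideal cycle counts, which is an assumption since the slide does not state the unit. The results ignore loop setup, memory stalls and edge effects, and the function names are illustrative.

```python
def real_fir_cycles(n: int, t: int) -> float:
    """Real block FIR, 16x16: N*T/8 (eight real MACs per cycle)."""
    return n * t / 8


def complex_fir_cycles(n: int, t: int) -> float:
    """Complex FIR, 16x16: N*T/2 (two complex multiplies per cycle)."""
    return n * t / 2


def dot_product_cycles(n: int) -> float:
    """Dot product, 16x16: N/4."""
    return n / 4


n, t = 256, 32  # example block: 256 samples, 32 taps
print(real_fir_cycles(n, t), complex_fir_cycles(n, t), dot_product_cycles(n))
```

The FIR entries line up with the operations-per-cycle table: 8 real 16x16 multiplies per cycle gives NT/8, and 2 complex 16x16 multiplies per cycle gives NT/2.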



SC3850 DSP Sub-System Features – Caches<br />

► Caches optimized for best performance while reducing time to market (TTM)<br />

► L1 caches<br />

• Instruction and data caches: 32 KB each, 8-way<br />

Data cache supports write-back allocate and write-through policies<br />

• Advanced automatic pre-fetching:<br />

Line pre-fetch with critical word first and next line pre-fetch<br />

• Software-controlled pre-fetching with cache control instructions<br />

► L2/M2 memory system<br />

• 512 KB, configurable as L2 cache or M2 SRAM in 64 KB banks<br />

• M2 SRAM accessible by DMA<br />

• L2 cache: 8-way, unified program and data<br />

• Programmable cache way partitioning according to address ranges<br />

• Low latency to the core (10-12 cycles)<br />

• Software-triggered, DMA-like pre-fetch channels operate in the background<br />

• DMA based “stashing” to DDRz<br />
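The L2/M2 split above is a simple banked partition: 512 KB total, assigned in 64 KB steps. The helper below sketches the bookkeeping only. The function name and the returned dictionary keys are hypothetical, and how the split is actually programmed on the device is not covered by the slide.

```python
TOTAL_KB = 512  # combined L2/M2 memory per the slide
BANK_KB = 64    # configuration granularity per the slide


def split_l2_m2(l2_cache_kb: int) -> dict:
    """Return the L2-cache / M2-SRAM split for a requested L2 cache size in KB."""
    if l2_cache_kb < 0 or l2_cache_kb > TOTAL_KB or l2_cache_kb % BANK_KB:
        raise ValueError(
            f"L2 size must be a multiple of {BANK_KB} KB between 0 and {TOTAL_KB} KB"
        )
    m2_sram_kb = TOTAL_KB - l2_cache_kb
    return {
        "l2_cache_kb": l2_cache_kb,
        "m2_sram_kb": m2_sram_kb,
        "l2_banks": l2_cache_kb // BANK_KB,
        "m2_banks": m2_sram_kb // BANK_KB,
    }


print(split_l2_m2(256))  # half cache, half DMA-accessible SRAM
```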



SC3850 DSP Sub-System Optimization Mechanisms<br />

► L2 cache software pre-fetch (SWPF), L1 DFETCH and PFETCH<br />

[Figure: “SW Pipeline” timeline with a fetch row and an execution row. While Task1(code1,data1) executes, the fetch row shows an L2 SWPF of code2 and/or data2 and a PFETCH (code2) and/or DFETCH (data2). While Task2(code2,data2) executes, the same pattern fetches code3 and/or data3, and Task3(code3,data3) follows. Legend: background fetch into L2 caches; inline fetch into L1 caches. Note on the figure: “In reality: smaller and more frequent”.]<br />
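The software-pipeline idea in the figure is that fetching for the next task overlaps execution of the current one. The sketch below only generates the order of steps for a task list. The exact timing within each task is an assumption (the background L2 pre-fetch is issued at the start of a task and the inline L1 fetch toward its end), and the function name is illustrative.

```python
from typing import List


def sw_pipeline(tasks: List[str]) -> List[str]:
    """Return the ordered steps for running `tasks` with next-task fetches overlapped."""
    steps: List[str] = []
    for i, task in enumerate(tasks):
        nxt = tasks[i + 1] if i + 1 < len(tasks) else None
        if nxt:
            # Background fetch into L2 runs while the current task executes.
            steps.append(f"background: L2 SWPF of code/data for {nxt} (during {task})")
        steps.append(f"execute: {task}")
        if nxt:
            # Inline fetch pulls the next task's working set from L2 into L1.
            steps.append(f"inline: PFETCH/DFETCH into L1 for {nxt}")
    return steps


for step in sw_pipeline(["Task1", "Task2", "Task3"]):
    print(step)
```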



Cache vs. DMA Model in SC3850 DSP Subsystem<br />

[Figure: four software models arranged along an “Effort” axis, from the DMA SW model to the cache SW model.]<br />

►DMA SW model (100% M2)<br />

• All in M2<br />

• Highest performance<br />

• High effort<br />

• Generates higher bus load<br />

• Expert mode – higher TTM<br />

►Mixed model (L2 is partly M2)<br />

• Critical code/data in M2<br />

• Consider using L2 cache partitioning<br />

• High performance<br />

• Moderate-high effort<br />

►Scheduled cache SW model (100% L2 + SWPF)<br />

• All in DDR/M3<br />

• Use SWPF<br />

• Use L2 cache partitioning<br />

• High performance<br />

• Moderate effort<br />

►Cache SW model (100% L2)<br />

• All in DDR/M3<br />

• Good performance<br />

• Low effort<br />


SC3850 DSP Sub-System Features – Benefits of the MMU<br />

► Memory protection, translation and precise exceptions<br />

► Simpler, abstract software model - not SoC specific<br />

► Good support for multicore devices<br />

• Code written once, unaware of the core it will actually run on<br />

• Specific memory allocated per channel instance, on a specific core<br />

► Easier debug, faster time to market<br />

• MMU errors quickly catch a task that accesses memory out of bounds<br />

• Virtual addressing allows simpler code re-use<br />

► Better MTBF (Mean Time Between Failures)<br />

• Channels are isolated from each other and from system code<br />

• System code and privileged registers protected in supervisor level<br />

• An errant task will not bring down the whole system<br />

• Precise exceptions are serviced before the error executes, allowing recovery in some cases<br />



SC3850 DSP Sub-System Features – Debug and Profile<br />

►Debug:<br />

• Rich breakpoint capabilities<br />

• Cache aware debug<br />

• PC trace with task information<br />

• Remote debug capability<br />

►Profile:<br />

• Performance optimization using detailed core stall information<br />

• Measuring RTOS and system overhead<br />

• Profiling at a function level<br />

• Constraint violation monitors<br />

[Figure: core stall information, with labels “Hold due to WRQ or WTB”, “Hold due to WRQ”, “Retry Mechanism” and “Hold due to Dcache”.]<br />


CodeWarrior Development Studio is a highly integrated toolchain providing the most comprehensive support for Freescale DSPs built on StarCore ® technology.<br />

►Complete build and debug environment united in Eclipse IDE<br />

►Robust platform for development<br />

►Performance optimized StarCore DSP compiler<br />

►Multicore capabilities in every component<br />

• SmartDSP OS, IDE, SA, debugger, simulator<br />



CodeWarrior Development Studio for StarCore ® v10.0<br />

A complete development environment under Eclipse<br />

► Eclipse IDE<br />

• Configuration Wizards<br />

• Plug-in architecture<br />

• Third party community<br />

► StarCore Build Tools<br />

• v23 performance C/C++ compiler<br />

• New linker<br />

Redesigned for usability<br />

Available for beta testing<br />

Old linker still included as default<br />

– New linker will become default in beta release<br />

► Simulation<br />

• Functional and cycle accurate<br />

► SmartDSP OS<br />

• Enhanced performance and networking<br />

• High-speed data I/O via SmartDSP HEAT<br />

► StarCore Debugger<br />

• Multicore and multi-DSP<br />

• MSC8144 and MSC8156 targets<br />

► Trace and Profile<br />

• Trace data offload via Ethernet using SmartDSP HEAT technology<br />


Summary<br />

►SC3850 cores and sub-systems are optimized for baseband processing, using an advanced core architecture and a high-performance, flexible multi-level cache system<br />

►MAPLE-B provides an innovative hardware acceleration platform for baseband processing<br />

►SC3850 DSP cores coupled with MAPLE-B acceleration in the MSC8156 DSP provide a unique combination of processing power and flexibility for current and future multi-standard base station designs<br />



Q&A<br />

►Thank you for attending this presentation. We’ll now take a few moments for the audience’s questions and then we’ll begin the question and answer session.<br />
