Overview of SC3850 and MAPLE-B Hardware Accelerator - Freescale
Overview of SC3850 and MAPLE-B Hardware Accelerator - Freescale
Overview of SC3850 and MAPLE-B Hardware Accelerator - Freescale
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
July, 2009<br />
<strong>MAPLE</strong> <strong>Hardware</strong> <strong>Accelerator</strong> <strong>and</strong><br />
<strong>SC3850</strong> DSP Core<br />
Enabling flexible multi-st<strong>and</strong>ard base-station designs<br />
Ron Bercovich<br />
DSP IP Manager<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009.<br />
Itay Peled<br />
Project Leader, DSP Platform Architecture<br />
TM
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 2<br />
Agenda<br />
►MSC8156 multicore digital signal processor (DSP) product using the<br />
<strong>SC3850</strong> DSP cores <strong>and</strong> <strong>MAPLE</strong>-B accelerators<br />
►<strong>MAPLE</strong>-B accelerators<br />
• Programmable accelerators concept<br />
• Programming model<br />
• Accelerated functions <strong>and</strong> st<strong>and</strong>ards compliance<br />
► <strong>SC3850</strong> DSP built on StarCore ® technology<br />
• Architecture overview<br />
• Performance<br />
• L1 <strong>and</strong> L2 cache sub-system<br />
►Summary<br />
TM
• Six <strong>SC3850</strong> Cores Subsystems (up to 6 GHz/48 GMACs)<br />
each with:<br />
• <strong>SC3850</strong> DSP core at up to 1 GHz (8 GMACs 16b or 8b)<br />
• 512 KB unified L2 cache / M2 memory<br />
• 32 KB I-cache, 32 KB D-cache, WBB, WTB, MMU, PIC<br />
• Internal/External Memories/Caches<br />
• 1056 KB M3 shared memory (SRAM)<br />
• Two DDR 2/3 64-bit SDRAM interfaces at up to 800 MHz<br />
• CLASS – Chip-Level Arbitration <strong>and</strong> Switching Fabric<br />
• Non-Blocking, fully pipelined, low latency<br />
• Full fabric 12 masters to eight slaves, up to 512 Gbps<br />
throughput<br />
• <strong>MAPLE</strong>-B – Baseb<strong>and</strong> <strong>Accelerator</strong><br />
• Turbo/Viterbi decoder up to 200/115 Mbps<br />
supporting: 3G-LTE, 802.16, 3G, CDMA2K st<strong>and</strong>ards<br />
• FFT <strong>and</strong> DFT accelerators up to 280 <strong>and</strong> 175 Msps<br />
• Multi-st<strong>and</strong>ard CRC check <strong>and</strong> insertion<br />
• Security Engine (Talitos 3.1)<br />
• Data <strong>and</strong> code protection (AES, SHA, Kasumi, SNOW3G)<br />
• High Speed Interconnects<br />
• Dual 4x/1x Serial RapidIO ® at 1.25/2.5/3.125 Gbaud<br />
• PCI Express ® 4x/1x<br />
• Dual RISC QUICCEngine Technology Supporting<br />
• Dual SGMII/RGMII Gigabit Ethernet ports<br />
• Eth. L1 Protocols, Talitos control <strong>and</strong> Serial RapidIO <strong>of</strong>fload<br />
• TDM Highway<br />
• 1024 ch., 400Mbps, divided into four ports <strong>of</strong> 256<br />
• DMA Engine 16 bi-directional channels w/ external req/ack<br />
• Eight <strong>Hardware</strong> Semaphores<br />
• Other Peripheral Interfaces<br />
• SPI, UART, I 2 C, 32 GPIO, 16 Timers, 96 KB boot ROM,<br />
JTAG/SAP, 8 WDT<br />
• Technology<br />
• 45 nm SOI, 1V core, 2.5, 1.8/1.5V I/O<br />
• FCBPGA (29x29) 1mm pitch, RoHS<br />
SerDes x4<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 3<br />
MSC8156E 45 nm Six-Core DSP<br />
<strong>SC3850</strong> core<br />
32KB L1<br />
I-Cache<br />
<strong>SC3850</strong> core<br />
32KB L1<br />
I-Cache<br />
On-Chip Network<br />
2x SRIO 4x/1x,<br />
1x PCIe 4x/1x<br />
32KB L1<br />
D-Cache<br />
512KB Unified M2/L2<br />
<strong>SC3850</strong> core<br />
32KB L1<br />
I-Cache<br />
32KB L1<br />
D-Cache<br />
512KB Unified M2/L2<br />
32KB L1<br />
D-Cache<br />
512KB Unified M2/L2<br />
6<br />
cores<br />
CLASS – Non-Blocking Switch Fabric<br />
SerDes x4<br />
2x Gigabit<br />
Ethernet, SPI<br />
<strong>SC3850</strong> core<br />
32KB L1<br />
I-Cache<br />
<strong>SC3850</strong> core<br />
32KB L1<br />
I-Cache<br />
32KB L1<br />
D-Cache<br />
512KB Unified M2/L2<br />
<strong>SC3850</strong> core<br />
32KB L1<br />
I-Cache<br />
32KB L1<br />
D-Cache<br />
512KB Unified M2/L2<br />
32KB L1<br />
D-Cache<br />
512KB Unified M2/L2<br />
TDM<br />
Highway<br />
4 ports<br />
Now Sampling<br />
DDR 2/3<br />
Memory<br />
Controller<br />
Security<br />
Processing<br />
Engine<br />
Shared<br />
Memory<br />
1056 KB<br />
DMA Engine<br />
DDR 2/3<br />
Memory<br />
Controller<br />
I²C, UART, GPIOs<br />
H/W Semaphores<br />
<strong>MAPLE</strong>-B<br />
Baseb<strong>and</strong><br />
<strong>Accelerator</strong><br />
TM
Multi-<strong>Accelerator</strong>-Platform for Baseb<strong>and</strong><br />
<strong>MAPLE</strong>-B<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 4<br />
TM
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 5<br />
<strong>MAPLE</strong>-B <strong>Accelerator</strong> <strong>Overview</strong><br />
►S<strong>of</strong>tware friendly buffer descriptor based h<strong>and</strong>shake <strong>and</strong> task assignment<br />
with minimal overhead on DSP cores for control<br />
►Highly flexible <strong>and</strong> programmable Turbo <strong>and</strong> Viterbi decoder supporting<br />
various configurable decoding parameters<br />
• High throughput Turbo decoding for low latency <strong>and</strong> advanced antenna systems<br />
or<br />
• Low latency multi-st<strong>and</strong>ard Viterbi decoding for data/control channels<br />
• Multi-st<strong>and</strong>ard capable: UMTS, CDMA2K, WiMAX <strong>and</strong> Long Term Evolution<br />
(LTE)<br />
• Flexible rate de-matching schemes for multiple st<strong>and</strong>ards, accelerating HARQ<br />
functionality<br />
► Flexible <strong>and</strong> advanced FFT/DFT acceleration:<br />
• FFT/iFFT for sizes 128, 256, 512, 1024, 2048 points<br />
• DFT/iDFT for LTE sizes<br />
► High speed CRC calculation/check accelerator for:<br />
• LTE code <strong>and</strong> transport block in UL <strong>and</strong> DL<br />
• WiMAX PHY Burst CRC in UL <strong>and</strong> DL<br />
TM
BD’s write/read <strong>and</strong><br />
debug<br />
by DSP core/host<br />
64b<br />
450 MHz<br />
PSIF<br />
MAG2DRAM<br />
DRAM<br />
DATA<br />
SRAM<br />
16kB<br />
DATA<br />
SRAM<br />
16kB<br />
Data write/read<br />
by <strong>MAPLE</strong><br />
SIF<br />
VRE<br />
2x 64b<br />
450 MHz<br />
System<br />
DMA Engine<br />
EXT MEM<br />
EXTL<br />
PSIF : Programmable System Interface<br />
TVPE : Turbo/Viterbi Processing Engine<br />
FFTPE : FFT Processing Engine<br />
DFTPE : DFT Processing Engine<br />
TVPE<br />
Local DMA/<br />
CRC PE x2<br />
CD , NII, HO MEM<br />
Radix 2<br />
Cells<br />
Radix 4<br />
Cells<br />
FFTPE<br />
I/O Data<br />
Buffer<br />
Twiddles<br />
Memory<br />
Radix 8<br />
Cells<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 6<br />
CTL<br />
CDL<br />
IRAM<br />
16kB<br />
Arbitration <strong>and</strong> switching<br />
SIF<br />
NIIL<br />
HOL<br />
DRE0 DRE1 DRE2 DRE3<br />
<strong>MAPLE</strong>-B Block Diagram<br />
RISC 0<br />
Core<br />
SIF<br />
RISC 1<br />
Core<br />
Routing <strong>and</strong> Config<br />
SBIF<br />
IRAM<br />
16kB<br />
SIF<br />
Interrupts<br />
Radix 2<br />
Cells<br />
Radix 3<br />
Cells<br />
PIC<br />
DFTPE<br />
I/O Data<br />
Buffer<br />
Twiddles<br />
Memory<br />
Radix 4<br />
Cells<br />
Radix 5<br />
Cells<br />
PSIF config<br />
CE<br />
Slave<br />
Routing <strong>and</strong> Config<br />
SBIF<br />
TM
BD’s write/read <strong>and</strong><br />
debug<br />
by DSP core/host<br />
64b<br />
450 MHz<br />
PSIF<br />
MAG2DRAM<br />
DRAM<br />
DATA<br />
SRAM<br />
16kB<br />
DATA<br />
SRAM<br />
16kB<br />
Data write/read<br />
by <strong>MAPLE</strong><br />
SIF<br />
VRE<br />
2x 64b<br />
450 MHz<br />
PSIF : Programmable System Interface<br />
TVPE : Turbo/Viterbi Processing Engine<br />
FFTPE : FFT Processing Engine<br />
DFTPE : DFT Processing Engine<br />
Programmable System Interface Based on<br />
RISC Engines :<br />
System Local DMA/ IRAM RISC 0<br />
DMA Engine CRC PE x2 16kB<br />
• Flexibility <strong>and</strong> st<strong>and</strong>ards adaption<br />
Core<br />
• Buffer descriptors parsing <strong>and</strong> system h<strong>and</strong>shake<br />
• DMA capabilities, <strong>and</strong> high system BW support<br />
Arbitration <strong>and</strong> switching<br />
• Low level task control <strong>and</strong> split<br />
• CRC acceleration<br />
EXT MEM<br />
EXTL<br />
TVPE<br />
CD , NII, HO MEM<br />
Radix 2<br />
Cells<br />
Radix 4<br />
Cells<br />
FFTPE<br />
I/O Data<br />
Buffer<br />
Twiddles<br />
Memory<br />
Radix 8<br />
Cells<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 7<br />
CTL<br />
CDL<br />
SIF<br />
NIIL<br />
HOL<br />
DRE0 DRE1 DRE2 DRE3<br />
<strong>MAPLE</strong>-B Block Diagram<br />
SIF<br />
RISC 1<br />
Core<br />
Routing <strong>and</strong> Config<br />
SBIF<br />
IRAM<br />
16kB<br />
SIF<br />
Interrupts<br />
Radix 2<br />
Cells<br />
Radix 3<br />
Cells<br />
PIC<br />
DFTPE<br />
I/O Data<br />
Buffer<br />
Twiddles<br />
Memory<br />
Radix 4<br />
Cells<br />
Radix 5<br />
Cells<br />
PSIF config<br />
CE<br />
Slave<br />
Routing <strong>and</strong> Config<br />
SBIF<br />
TM
BD’s write/read <strong>and</strong><br />
debug<br />
by DSP core/host<br />
64b<br />
450 MHz<br />
PSIF<br />
MAG2DRAM<br />
DRAM<br />
DATA<br />
SRAM<br />
16kB<br />
DATA<br />
SRAM<br />
16kB<br />
Data write/read<br />
by <strong>MAPLE</strong><br />
SIF<br />
2x 64b<br />
450 MHz<br />
PSIF : Programmable System Interface<br />
TVPE : Turbo/Viterbi Processing Engine<br />
FFTPE : FFT Processing Engine<br />
DFTPE : DFT Processing Engine<br />
Programmable System Interface Based on<br />
RISC Engines :<br />
System Local DMA/ IRAM RISC 0<br />
DMA Engine CRC PE x2 16kB<br />
• Flexibility <strong>and</strong> st<strong>and</strong>ards adaption<br />
Core<br />
• Buffer descriptors parsing <strong>and</strong> system h<strong>and</strong>shake<br />
• DMA capabilities, <strong>and</strong> high system BW support<br />
Arbitration <strong>and</strong> switching<br />
• Low level task control <strong>and</strong> split<br />
• CRC acceleration<br />
TVPE<br />
Turbo-Viterbi<br />
Processing Element :<br />
EXT MEM<br />
CD , NII, HO MEM<br />
• SIMD parallelism<br />
• Novel heuristics <strong>and</strong> Radix 4<br />
NIIL<br />
architecture<br />
EXTL<br />
CDL HOL<br />
• >10x the throughput <strong>of</strong> existent<br />
industry/competitor solutions<br />
VRE•<br />
Multi-st<strong>and</strong>ard DRE0 DRE1 capable: DRE2 DRE3<br />
• binary/duo,<br />
• tail-bit/trellis CTL termination,<br />
• configurable interleaver<br />
Radix 2<br />
Cells<br />
Radix 4<br />
Cells<br />
FFTPE<br />
I/O Data<br />
Buffer<br />
Twiddles<br />
Memory<br />
Radix 8<br />
Cells<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 8<br />
SIF<br />
<strong>MAPLE</strong>-B Block Diagram<br />
SIF<br />
RISC 1<br />
Core<br />
Routing <strong>and</strong> Config<br />
SBIF<br />
IRAM<br />
16kB<br />
SIF<br />
Interrupts<br />
Radix 2<br />
Cells<br />
Radix 3<br />
Cells<br />
PIC<br />
DFTPE<br />
I/O Data<br />
Buffer<br />
Twiddles<br />
Memory<br />
Radix 4<br />
Cells<br />
Radix 5<br />
Cells<br />
PSIF config<br />
CE<br />
Slave<br />
Routing <strong>and</strong> Config<br />
SBIF<br />
TM
BD’s write/read <strong>and</strong><br />
debug<br />
by DSP core/host<br />
64b<br />
450 MHz<br />
PSIF<br />
MAG2DRAM<br />
DRAM<br />
DATA<br />
SRAM<br />
16kB<br />
DATA<br />
SRAM<br />
16kB<br />
Data write/read<br />
by <strong>MAPLE</strong><br />
SIF<br />
2x 64b<br />
450 MHz<br />
PSIF : Programmable System Interface<br />
TVPE : Turbo/Viterbi Processing Engine<br />
FFTPE : FFT Processing Engine<br />
DFTPE : DFT Processing Engine<br />
Programmable System Interface Based on<br />
RISC Engines :<br />
System Local DMA/ IRAM RISC 0<br />
DMA Engine CRC PE x2 16kB<br />
• Flexibility <strong>and</strong> st<strong>and</strong>ards adaption<br />
Core<br />
• Buffer descriptors parsing <strong>and</strong> system h<strong>and</strong>shake<br />
• DMA capabilities, <strong>and</strong> high system BW support<br />
Arbitration <strong>and</strong> switching<br />
• Low level task control <strong>and</strong> split<br />
• CRC acceleration<br />
TVPE<br />
Turbo-Viterbi<br />
Processing Element :<br />
EXT MEM<br />
CD , NII, HO MEM<br />
• SIMD parallelism<br />
• Novel heuristics <strong>and</strong> Radix 4<br />
NIIL<br />
architecture<br />
EXTL<br />
CDL HOL<br />
• >10x the throughput <strong>of</strong> existent<br />
industry/competitor solutions<br />
VRE•<br />
Multi-st<strong>and</strong>ard DRE0 DRE1 capable: DRE2 DRE3<br />
• binary/duo,<br />
• tail-bit/trellis CTL termination,<br />
• configurable interleaver<br />
Radix 2<br />
Cells<br />
Radix 4<br />
Cells<br />
FFTPE<br />
I/O Data<br />
Buffer<br />
Twiddles<br />
Memory<br />
Radix 8<br />
Cells<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 9<br />
SIF<br />
<strong>MAPLE</strong>-B Block Diagram<br />
SIF<br />
RISC 1<br />
Core<br />
Routing <strong>and</strong> Config<br />
IRAM<br />
16kB<br />
SIF<br />
Interrupts<br />
PIC<br />
DFTPE<br />
FFT/IFFT <strong>and</strong> DFT/iDFT<br />
Processing Elements:<br />
PSIF config<br />
CE<br />
Slave<br />
I/O Data<br />
Buffer<br />
• High throughput engines<br />
• Multi-Radix Radix 2 Radix 4<br />
Cells Cells<br />
implementation<br />
Radix 3 Radix 5<br />
• Novel precision Cellsh<strong>and</strong>ling<br />
Cells<br />
techniques<br />
• Power, area, performance<br />
Twiddles<br />
SBIF optimized vs. s<strong>of</strong>tware SBIF<br />
Memory<br />
implementations<br />
Routing <strong>and</strong> Config<br />
TM
BD’s write/read <strong>and</strong><br />
debug<br />
by DSP core/host<br />
64b<br />
450 MHz<br />
PSIF<br />
MAG2DRAM<br />
DRAM<br />
DATA<br />
SRAM<br />
16kB<br />
DATA<br />
SRAM<br />
16kB<br />
Data write/read<br />
by <strong>MAPLE</strong><br />
SIF<br />
VRE<br />
2x 64b<br />
450 MHz<br />
System<br />
DMA Engine<br />
EXT MEM<br />
EXTL<br />
PSIF : Programmable System Interface<br />
TVPE : Turbo/Viterbi Processing Engine<br />
FFTPE : FFT Processing Engine<br />
DFTPE : DFT Processing Engine<br />
TVPE<br />
Local DMA/<br />
CRC PE x2<br />
CD , NII, HO MEM<br />
Radix 2<br />
Cells<br />
Radix 4<br />
Cells<br />
FFTPE<br />
I/O Data<br />
Buffer<br />
Twiddles<br />
Memory<br />
Radix 8<br />
Cells<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 10<br />
CTL<br />
CDL<br />
IRAM<br />
16kB<br />
Arbitration <strong>and</strong> switching<br />
SIF<br />
NIIL<br />
HOL<br />
DRE0 DRE1 DRE2 DRE3<br />
<strong>MAPLE</strong>-B Block Diagram<br />
RISC 0<br />
Core<br />
SIF<br />
RISC 1<br />
Core<br />
Routing <strong>and</strong> Config<br />
SBIF<br />
IRAM<br />
16kB<br />
SIF<br />
Interrupts<br />
Radix 2<br />
Cells<br />
Radix 3<br />
Cells<br />
PIC<br />
DFTPE<br />
I/O Data<br />
Buffer<br />
Twiddles<br />
Memory<br />
Radix 4<br />
Cells<br />
Radix 5<br />
Cells<br />
PSIF config<br />
CE<br />
Slave<br />
Routing <strong>and</strong> Config<br />
SBIF<br />
TM
PSIF<br />
MAG2DRAM<br />
DRAM<br />
DATA<br />
SRAM<br />
16kB<br />
DATA<br />
SRAM<br />
16kB<br />
64b<br />
450MHz<br />
2x 64b<br />
450MHz<br />
System<br />
DMA Engine<br />
Local DMA/<br />
CRC PE x2<br />
IRAM<br />
16kB<br />
Arbitration <strong>and</strong> switching<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 11<br />
RISC 0<br />
Core<br />
RISC 1<br />
Core<br />
IRAM<br />
16kB<br />
PIC<br />
CE<br />
Slave<br />
PSIF overview<br />
• RISC based programmable system interface<br />
• <strong>Hardware</strong> scheduler<br />
• Firmware based buffer descriptors parsing <strong>and</strong> arbitration<br />
• DMA <strong>and</strong> DMA control for input/output data via two master interfaces<br />
• Direct access for BD’s placement by DSP cores or other hosts via fast slave interface<br />
• Local DMA for CRC acceleration <strong>and</strong> future extensions<br />
• Programmable interrupt controller<br />
• St<strong>and</strong>ard SRAM* interface to PE’s (TVPE, DFTPE, FFTPE)<br />
• Low level control <strong>and</strong> configuration <strong>of</strong> PE’s<br />
• Emulate system behavior <strong>of</strong> “Yet Another Slave DSP Core”<br />
TM
<strong>MAPLE</strong>-B Programming Model <strong>Overview</strong><br />
►Buffer Descriptor (BD) based programming model:<br />
• Up to eight high-priority BD rings <strong>and</strong> eight low-priority BD rings per each<br />
processing element for multiple master support – multicore awareness<br />
• <strong>MAPLE</strong>-B round robin with priority arbitration between jobs<br />
• 12 KB <strong>MAPLE</strong>-B internal memory dedicated for BD rings in internal memory<br />
• TaskID for every job in BD for debug/tracking purposes<br />
► Minimal overhead for DSP core<br />
• <strong>MAPLE</strong>-B reads input data using its embedded DMA from any system memory<br />
location: M2/L2/M3/DDR<br />
• <strong>MAPLE</strong>-B writes results to any system memory location: M2/L2/M3/DDR<br />
• Interrupts <strong>and</strong>/or BD polling comm<strong>and</strong> done indication to DSP cores<br />
• Supports direct Serial RapidIO® door-bell generation for job completion<br />
indication to external host sharing or controlling certain <strong>MAPLE</strong> BD rings<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 12<br />
TM
Core B High Priority TVPE Ring<br />
Descriptor<br />
External High Priority TVPE Ring Desc.<br />
Core B Low Priority TVPE Ring Descriptor<br />
Core A High Priority DFTPE Ring Desc.<br />
Core C Low Priority DFTPE Ring Desc.<br />
Ring Descriptors <strong>and</strong> Buffer Descriptor<br />
Core C Low Priority FFTPE Ring Desc.<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 13<br />
External<br />
TURBO<br />
Core C<br />
DFT, FFT<br />
<strong>MAPLE</strong>-B<br />
CoreB<br />
TURBO<br />
CoreA<br />
DFT<br />
•Located inside <strong>MAPLE</strong><br />
•Multiple priorities<br />
•Small h<strong>and</strong>ling fee for<br />
RISC processors<br />
TM
TVPE High<br />
TVPE Low<br />
DFTPE High<br />
DFTPE Low<br />
FFTPE High<br />
FFTPE Low<br />
Ring Descriptors <strong>and</strong> Buffer Descriptor<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 14<br />
External<br />
TURBO<br />
Core C<br />
DFT, FFT<br />
Up to 12 KB Total<br />
<strong>MAPLE</strong>-B<br />
CoreB<br />
TURBO<br />
CoreA<br />
DFT<br />
TM
Arbitration <strong>and</strong><br />
switching<br />
SIF<br />
64b<br />
450MHz<br />
Radix 2<br />
Cells<br />
Radix 4<br />
Cells<br />
I/O Data<br />
Buffer<br />
Twiddles<br />
Memory<br />
FFTPE<br />
Radix 8<br />
Cells<br />
Routing <strong>and</strong> Config<br />
SBIF<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 15<br />
FFTPE highlights<br />
► High throughput, low power FFT/iFFT transform<br />
processing element<br />
► Build from Radix2, Radix4 <strong>and</strong> Radix8 elements<br />
► Single, 64-bit PSIF interface for control <strong>and</strong> data<br />
► Support 128, 256, 512, 1024 <strong>and</strong> 2048 points<br />
transforms<br />
► 32-bit (16I, 16Q) input <strong>and</strong> output data<br />
► Internal twiddles ROM memory<br />
► Advanced scaling methods including:<br />
• User defined down scaling per stage <strong>of</strong> 0-4 bits<br />
• Adaptive scaling (block-floating emulation) with overall scaling<br />
option<br />
► Guard b<strong>and</strong>s <strong>and</strong> DC carrier insertion for iFFT<br />
optimization<br />
► Job (BD) repeat option for reduced configuration <strong>and</strong><br />
increased throughput with adjacent I/O data structures<br />
TM
Arbitration <strong>and</strong><br />
switching<br />
SIF<br />
64b<br />
450MHz<br />
Radix 2<br />
Cells<br />
Radix 4<br />
Cells<br />
I/O Data<br />
Buffer<br />
Twiddles<br />
Memory<br />
DFTPE<br />
Radix 3<br />
Cells<br />
Radix 5<br />
Cells<br />
Routing <strong>and</strong> Config<br />
SBIF<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 16<br />
DFTPE highlights<br />
► High throughput, low power FFT/iFFT, DFT/iDFT<br />
transform processing element<br />
► Build from Radix2, Radix3, Radix4 <strong>and</strong> Radix5<br />
elements<br />
► Single, 64-bit PSIF interface for control <strong>and</strong> data<br />
► Support 128, 256, 512, 1024 points FFT/iFFT<br />
transforms<br />
► Support 3GLTE st<strong>and</strong>ard DFT/iDFT transforms from<br />
12 to 1200 <strong>and</strong> 1536 points<br />
► 32-bit (16I, 16Q) input <strong>and</strong> output data<br />
► Internal twiddles ROM memory<br />
► Advanced scaling methods including:<br />
• User defined down scaling per stage <strong>of</strong> 0-4 bits<br />
• Adaptive scaling (block-floating emulation) with overall<br />
scaling option<br />
► Guard b<strong>and</strong>s <strong>and</strong> DC carrier insertion for iFFT<br />
optimization<br />
► Job (BD) repeat option for reduced configuration <strong>and</strong><br />
increased throughput with adjacent I/O data structures<br />
TM
Arbitration <strong>and</strong><br />
switching<br />
SIF<br />
VRE<br />
64b<br />
450 MHz<br />
EXT MEM<br />
EXTL<br />
TVPE<br />
CTL<br />
CD , NII, HO MEM<br />
CDL<br />
64b<br />
450 MHz<br />
SIF<br />
NIIL<br />
HOL<br />
DRE0 DRE1 DRE2 DRE3<br />
► High throughput, low power Turbo or Viterbi decoding<br />
► Multi-st<strong>and</strong>ard, multi-algorithm support via:<br />
• Binary <strong>and</strong> duo binary decoder<br />
• Tail-bit <strong>and</strong> zero tail trellis termination support<br />
• MaxLogMap <strong>and</strong> HybridLinearLogMap<br />
• Multi-iteration Viterbi decoding WAVA*<br />
► Dual, 64 bit PSIF interface for control <strong>and</strong> data<br />
► Radix4, NII-X architecture<br />
► 8-bit s<strong>of</strong>t LLR inputs <strong>and</strong> s<strong>of</strong>t/hard outputs<br />
► Rate-de-matching support for LTE, WCDMA, WiMAX<br />
► Periodic de-puncturing for CDMA2K <strong>and</strong> Viterbi<br />
► Various Input data structures to support trade-<strong>of</strong>f<br />
between DSP core pre-processing MIPS <strong>and</strong> Turbo<br />
decoding throughput<br />
► Support for APQ <strong>and</strong> CRC early stopping criterias<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 17<br />
TVPE highlights<br />
TM
<strong>MAPLE</strong>-B Performance <strong>and</strong> St<strong>and</strong>ards compliance<br />
WiMAX Systems <strong>MAPLE</strong>-B (MSC8156)<br />
Turbo decoding<br />
Optional support for sub-block de-interleaving<br />
Viterbi decoding<br />
Optional support for periodic de-puncturing<br />
FFT/IFFT<br />
Optional support for guard b<strong>and</strong>s insertion<br />
> 195 Mbps (6 iterations)<br />
> 100 Mbps (tail-biting multi-iteration)<br />
> 350 Msps using 2 units (FFTPE, DFTPE)<br />
CRC, insertion for DL <strong>and</strong> check for UL > 10 Gbps , CRC16 (PDU)<br />
3GLTE FDD/TDD Systems <strong>MAPLE</strong>-B (MSC8156)<br />
Turbo decoding<br />
Optional support for sub-block de-interleaving<br />
Viterbi decoding<br />
Optional support for periodic de-puncturing<br />
FFT/IFFT/DFT/IDFT<br />
Optional support for guard b<strong>and</strong>s insertion<br />
> 200 Mbps (6 iterations)<br />
> 100 Mbps (tail-biting multi-iteration)<br />
> 280 Msps FFT using FFTPE<br />
> 175 Msps DFT using DFTPE<br />
CRC, insertion for downlink <strong>and</strong> check for uplink > 10 Gbps , CRC24A, CRC24B<br />
UMTS – WCDMA, HSPA+ <strong>MAPLE</strong>-B (MSC8156)<br />
Turbo decoding > 165 Mbps (6 iterations)<br />
Viterbi decoding<br />
Optional support for periodic de-puncturing<br />
> 115 Mbps (zero tail, K=9)<br />
FFT/IFFT > 350 Msps FFT using FFTPE <strong>and</strong> DFTPE<br />
CRC, insertion for DL <strong>and</strong> check for UL > 10 Gbps , CRC24<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 18<br />
IEEE ® 802.16 Rev2<br />
3GPP TS 36.212 FEC <strong>and</strong> CRC<br />
3GPP TS 25.212 (FDD) FEC <strong>and</strong> CRC<br />
3GPP TS 25.222 (TDD) FEC <strong>and</strong> CRC<br />
TM
L2<br />
Scramble<br />
Control<br />
CQI<br />
De<br />
INTLV<br />
HARQ<br />
Rate De-<br />
Matching<br />
3GLTE PDSCH/DLSCH Acceleration using <strong>MAPLE</strong>-B<br />
CRC24a<br />
QAM<br />
Mapper<br />
De<br />
scramble<br />
Turbo<br />
Decoder<br />
PUSCH<br />
CB<br />
Segment.<br />
Layer<br />
Mapper<br />
CRC24b<br />
PDSCH<br />
DLSCH<br />
Example<br />
QAM<br />
De-Mapper<br />
CRC24b<br />
ULSCH<br />
MIMO<br />
Precoder<br />
IDFT<br />
TB<br />
assembly<br />
Turbo<br />
Encoder<br />
PRB<br />
Mapper<br />
MMSE<br />
Equalizer<br />
CRC24a<br />
HARQ<br />
Rate<br />
Matching<br />
DLRS<br />
Gen<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 19<br />
L2<br />
CE<br />
SNR<br />
GI, IFFT<br />
CP<br />
GR, FFT<br />
CPR<br />
RACH<br />
4<br />
(IF) RF<br />
Antenna<br />
(IF) RF<br />
Antenna<br />
<strong>MAPLE</strong>-B Offload HW<br />
<strong>MAPLE</strong>-B HW assist<br />
TM
<strong>MAPLE</strong>-B – Enablement Technology for Multi-St<strong>and</strong>ard Baseb<strong>and</strong><br />
►Advanced programming model<br />
• Buffer descriptors based job assignment<br />
• Multiple buffer descriptor rings for multicore system<br />
• System optimization via:<br />
Multi job assignment, advanced alignment<br />
Flexible interrupt assignment to any DSP core<br />
Embedded DMA with direct access to L2 cache or M2/M3/DDR<br />
Full <strong>of</strong>f-load <strong>of</strong> accelerators programming <strong>and</strong> control<br />
►High capacity processing elements (Coprocessors)<br />
• Optimized for low latency, high throughput system performance<br />
• Local to DSP FFT <strong>and</strong> DFT acceleration for:<br />
OFDMA/SC-FDMA processing<br />
Ranging/RACH acceleration<br />
Frequency domain processing acceleration for HSPA<br />
• 6144-bit LTE code block: ~40 usec decoding latency<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 20<br />
TM
<strong>SC3850</strong> DSP Core <strong>and</strong> Subsystem<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 21<br />
TM
• Memory protection<br />
• Prediction<br />
V2<br />
• MMU support<br />
• Additional ASI<br />
• Enhanced video<br />
• Dynamic branch<br />
prediction<br />
• Additional SIMD<br />
instructions<br />
• Up to 6-Issue VLIW Architecture<br />
• VLES<br />
• SIMD<br />
• Enhanced control<br />
code support<br />
• Dual MAC<br />
V3<br />
MSC8101/3, MSC8122/26/12/13, Wireless subscriber<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 22<br />
StarCore ® Architecture Roadmap<br />
V5<br />
V6<br />
8144/E/EC<br />
1 GHz <strong>and</strong><br />
beyond<br />
(90nm)<br />
MXC (2.5G, 3G, 3.5G)<br />
V7<br />
815x<br />
products…<br />
<strong>SC3850</strong> products…<br />
SC3400 products…<br />
SC140e products…<br />
SC1000 products…<br />
TM
Shared features<br />
Pipeline Stages<br />
(max freq @SOI)<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 23<br />
StarCore ® Feature Evolution<br />
SC140 (V2) SC140e (V3) SC3400 (V5) <strong>SC3850</strong> (V6)<br />
6 issue, statically scheduled VLES model: 4 DALU + 2 AGU<br />
128-bit instruction fetch, 2x64 data ports (3 memory accesses per cycle)<br />
Backward binary compatibility between all family members (16-bit basic inst. set)<br />
5<br />
(600 MHz @90G)<br />
5<br />
(250 MHz @90LP)<br />
12<br />
(1 Ghz @90SOI)<br />
12<br />
(1 Ghz @45SOI)<br />
Instructions Baseline Minor additions Video, SIMD2 Control ISA,<br />
Dual MPY<br />
Cache instructions<br />
Precise<br />
exceptions<br />
No No Yes Yes<br />
Privilege levels No Yes Yes Yes<br />
Micro-arch.<br />
Features<br />
1 VLES speculation 1 VLES speculation BTB, 4 VLES spec.,<br />
1 COF deep<br />
BTB, 4 VLES spec.,<br />
nested COF<br />
Platform M1 + L1 Icache MMU, L1 I/D cache L1, MMU, M2 L1, MMU, L2/M2<br />
TM
<strong>SC3850</strong> DSP Core Key Architectural Features<br />
►Statically scheduled VLIW<br />
• VLES model – Variable Length Execution Set<br />
• 6-issue: 4 DALU + 2 AGU + loop dispatched per cycle<br />
► Very high numerical throughput<br />
• Two 16x16 multipliers per Data ALU, eight total<br />
• Support for extended precision <strong>and</strong> complex multiplication<br />
• Four zero-overhead hardware loops<br />
• Application-specific instructions: FFT, complex algebra <strong>and</strong> more<br />
• 2x64-bit load/stores per cycle; Multivariable access to/from multiple registers<br />
with pack/unpack<br />
• Compact instructions perform multiple intrinsic functions (e.g. MAC, complex<br />
multiply, scale/saturate/round as part <strong>of</strong> the store)<br />
►Very good support for control code<br />
• Dynamic branch prediction (BTB), speculative execution<br />
• Fully predicated instruction set<br />
► Good OS support<br />
• Precise exceptions for MMU support, including during hardware loops<br />
• Dual stack pointer management in hardware<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 24<br />
TM
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 25<br />
StarCore ® DSP Core Architecture<br />
►The StarCore core consists <strong>of</strong> the following main units:<br />
• Data arithmetic logic unit (DALU) that contains four instances <strong>of</strong> an<br />
arithmetic logic unit (ALU) <strong>and</strong> a data register file<br />
• Address generation unit (AGU) that contains two address arithmetic<br />
units (AAU) <strong>and</strong> an address register file<br />
• Program control unit (PCU)<br />
XP_ADDR<br />
BTB<br />
OCE<br />
32<br />
COF, Loop, INT<br />
(Debug)<br />
XP_DATA<br />
128<br />
Dispatcher<br />
PCU<br />
REG<br />
Instr. Bus<br />
XB_ADDR<br />
Memory Sub-System<br />
32<br />
XA_ADDR<br />
Address<br />
Registers<br />
32<br />
ALU0 ALU1<br />
XB_DATA<br />
64<br />
AGU DALU<br />
MAC0a<br />
MAC0b<br />
Logic0<br />
WRQ<br />
XA_DATA<br />
Data Registers<br />
MAC1a<br />
MAC1b<br />
Logic1<br />
RSU – Resource Stall Unit<br />
64<br />
MAC2a<br />
MAC2b<br />
Logic2<br />
MAC3a<br />
MAC3b<br />
Logic3<br />
TM
Dn<br />
Single MAC operation (SC140/SC3400)<br />
mac Da,Db,Dn<br />
Dn + (Da.H * Db.H) -> Dn<br />
Da<br />
Db<br />
High Portion Low Portion<br />
16-bit<br />
16-bit<br />
40-bit<br />
Dual MAC – SIMD2 MAC (<strong>SC3850</strong>)<br />
mac2 Da,Db,Dn<br />
Dn.WH + (Da.H * Db.H) -> Dn.WH<br />
Dn.WL + (Da.L * Db.L) -> Dn.WL<br />
Dn<br />
Da<br />
Db<br />
High Portion Low Portion<br />
16-bit<br />
16-bit<br />
16-bit<br />
16-bit<br />
20-bit 20-bit<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 26<br />
Dual Multiply ISA<br />
Dual MAC – double throughput MAC (<strong>SC3850</strong>)<br />
dmac Da,Db,Dn<br />
Dn + (Da.H * Db.H)+ (Da.L * Db.L -> Dn<br />
Dn<br />
Da<br />
Db<br />
High Portion Low Portion<br />
16-bit<br />
16-bit<br />
16-bit<br />
40-bit<br />
16-bit<br />
TM
<strong>SC3850</strong> DSP Core Data Processing Throughput<br />
►DALU calculations are based on 40-bit registers<br />
►The two multipliers <strong>of</strong> each ALU can be used in various ways:<br />
• SIMD2 or dot-product multiplication<br />
• Complex multiplication<br />
• Extended precision multiplication (16x32, 32x32)<br />
Operation Precision Operations per cycle<br />
Real Multiply<br />
Complex Multiply<br />
16x16 8<br />
16x32 4<br />
32x32 2<br />
16x16 2<br />
16x32 1<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 27<br />
Kernel <strong>SC3850</strong><br />
Real block FIR 16x16<br />
X%<br />
NT/8<br />
Complex FIR 16x16 NT/2<br />
Dot Product 16x16 N/4<br />
N: samples<br />
T: Taps<br />
TM
<strong>SC3850</strong> DSP Sub-System Features – Caches<br />
► Caches optimized to give best performance reducing TTM<br />
► L1 caches<br />
• Instructions <strong>and</strong> data caches both: 32 KB, 8 way<br />
Data cache supports write back allocate <strong>and</strong> write through policies<br />
• Advanced automatic pre-fetching:<br />
Line pre-fetch with critical word first <strong>and</strong> next line pre-fetch<br />
• S<strong>of</strong>tware-controlled pre-fetching with cache control instructions<br />
► L2/M2 memory system<br />
• 512 KB, configurable as L2 cache or M2 SRAM in 64 KB banks<br />
• M2 SRAM accessible by DMA<br />
• L2 cache: 8-ways, unified program<br />
<strong>and</strong> data<br />
• Programmable cache way partitioning<br />
according to address ranges<br />
• Low latency to the core (10-12 cycles)<br />
• S<strong>of</strong>tware-triggered DMA like pre-fetch<br />
channels operate in the background<br />
• DMA based “stashing” to DDRz<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 28<br />
TM
► L2 cache s<strong>of</strong>tware pre-fetch (SWPF), L1 DFETCH <strong>and</strong> PFETCH<br />
L2 SWPF <strong>of</strong><br />
code2 <strong>and</strong>/or<br />
data2<br />
<strong>SC3850</strong> DSP Sub-System Optimization Mechanisms<br />
PFETCH<br />
(code2)<br />
<strong>and</strong>/or<br />
DFETCH<br />
(data2)<br />
Task1(code1,data1)<br />
Execution<br />
L2 SWPF <strong>of</strong><br />
code3 <strong>and</strong>/or<br />
data3<br />
PFETCH<br />
(code3)<br />
<strong>and</strong>/or<br />
DFETCH<br />
(data3)<br />
Task2(code2,data2)<br />
Execution<br />
time<br />
Fetch<br />
“SW Pipeline”<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 29<br />
Task3(code3,data3)<br />
Execution<br />
Legend:<br />
Background fetch<br />
into L2 caches<br />
Inline fetch into L1<br />
caches<br />
In reality: Smaller<br />
<strong>and</strong> more frequent<br />
TM
DMA SW Model<br />
Cache vs. DMA Model in <strong>SC3850</strong> DSP Subsystem<br />
Mixed Model<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 30<br />
Cache SW model<br />
100% M2 L2 is partly M2 100% L2 + SWPF 100% L2<br />
• All in M2<br />
• Highest performance<br />
• High effort<br />
• Generate higher bus load<br />
• Expert mode – higher TTM<br />
Effort<br />
• Critical code/data in M2<br />
• Consider using L2 cache partitioning<br />
• High performance<br />
• Moderate-high effort<br />
Scheduled Cache<br />
SW model<br />
• All in DDR/M3<br />
• Use SWPF<br />
• Use L2 Cache partitioning<br />
• High performance<br />
• Moderate effort<br />
• All in DDR/M3<br />
• Good performance<br />
• Low effort<br />
TM
<strong>SC3850</strong> DSP Sub-System Features – Benefits <strong>of</strong> the MMU<br />
► Memory protection, translation <strong>and</strong> precise exceptions<br />
► Simpler, abstract s<strong>of</strong>tware model - not SoC specific<br />
► Good support for multicore devices<br />
• Code written once, unaware <strong>of</strong> the core it will actually run on<br />
• Specific memory allocated per channel instance, on a specific core<br />
► Easier debug, faster time to market<br />
• MMU errors quickly catch when a task accesses out <strong>of</strong> bounds<br />
• Virtual addressing allows simpler code re-use<br />
► Better MTBF (Mean Time Between Errors)<br />
Channels are isolated from each other<br />
<strong>and</strong> from system code<br />
• System code <strong>and</strong> privileged registers<br />
protected in supervisor level<br />
• An errant task will not bring down the<br />
whole system<br />
• Precise exceptions serviced before the<br />
error executes, allowing recovery in<br />
some cases<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 31<br />
TM
►Debug:<br />
<strong>SC3850</strong> DSP Sub-System Features – Debug <strong>and</strong> Pr<strong>of</strong>ile<br />
• Rich brakepoint capabilities<br />
• Cache aware debug<br />
• PC trace with task information<br />
• Remote debug capability<br />
►Pr<strong>of</strong>ile:<br />
• Performance optimization using<br />
detailed core stall information<br />
• Measuring RTOS <strong>and</strong> system<br />
overhead<br />
• Pr<strong>of</strong>iling at a function level<br />
• Constraint violation monitors<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 32<br />
Hold due to WRQ or<br />
WTB<br />
Hold due to WRQ<br />
Retry Mechanism<br />
Hold due to Dcache<br />
TM
CodeWarrior Developer Studio is a highly integrated<br />
toolchain providing the most comprehensive support <strong>of</strong><br />
<strong>Freescale</strong> DSPs built on StarCore ® technology.<br />
►Complete build <strong>and</strong> debug environment united in Eclipse IDE<br />
►Robust platform for development<br />
►Performance optimized StarCore DSP compiler<br />
►Multicore capabilities in every component<br />
• SmartDSP OS, IDE, SA, debugger, simulator<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 33<br />
TM
CodeWarrior Development Studio for StarCore ® v10.0<br />
A complete development environment under Eclipse<br />
► Eclipse IDE<br />
• Configuration Wizards<br />
• Plug-in architecture<br />
• Third party community<br />
► StarCore Build Tools<br />
• v23 performance C/C++ compiler<br />
• New linker<br />
Redesigned for usability<br />
Available for beta testing<br />
Old linker still included as default<br />
– New linker will become default in beta release<br />
► Simulation<br />
• Functional <strong>and</strong> cycle accurate<br />
► SmartDSP OS<br />
• Enhanced performance <strong>and</strong> networking<br />
• High speed data io via SmartDSP HEAT<br />
► StarCore Debugger<br />
• Multicore <strong>and</strong> multi-DSP<br />
• MSC8144 <strong>and</strong> MSC8156 targets<br />
► Trace <strong>and</strong> Pr<strong>of</strong>ile<br />
• Trace data <strong>of</strong>fload via Ethernet using<br />
SmartDSP HEAT technology<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 34<br />
TM
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 35<br />
Summary<br />
►<strong>SC3850</strong> cores <strong>and</strong> sub-systems are optimized for baseb<strong>and</strong><br />
processing using advanced core architecture <strong>and</strong> high performance<br />
<strong>and</strong> flexible multi-level cache system<br />
►<strong>MAPLE</strong>-B provides innovative hardware acceleration platform for<br />
baseb<strong>and</strong> processing<br />
►<strong>SC3850</strong> DSP cores coupled with <strong>MAPLE</strong>-B acceleration in<br />
MSC8156 DSP processor provide unique combination <strong>of</strong> processing<br />
power <strong>and</strong> flexibility for current <strong>and</strong> future multi-st<strong>and</strong>ard base<br />
station designs<br />
TM
►Thank you for attending this presentation. We’ll now take a few<br />
moments for the audience’s questions <strong>and</strong> then we’ll begin the<br />
question <strong>and</strong> answer session.<br />
<strong>Freescale</strong> <strong>and</strong> the <strong>Freescale</strong> logo are trademarks <strong>of</strong> <strong>Freescale</strong> Semiconductor, Inc. All other product or<br />
service names are the property <strong>of</strong> their respective owners. © <strong>Freescale</strong> Semiconductor, Inc. 2009. 36<br />
Q&A<br />
TM