U. Glaeser


FIGURE 39.47 ManArray architectural elements.

Coprocessor Platform” describes how the ManArray architecture fulfills SoC application requirements, with a focus on the implementation, compiler, and tools. “Performance Evaluation” presents performance results, and the last subsection concludes the chapter section.

The ManArray Thread Coprocessor Architecture

In numerous application environments there is a need to significantly augment the signal processing capabilities of a MIPS, ARM, or other host processor. In addition, many emerging applications, such as wireless LAN (e.g., 802.11a) for battery-powered Internet devices, require low power consumption at very high performance levels. The BOPS SoC cores provide streamlined coprocessor attachment to MIPS, ARM, or other hosts for this purpose. Through selectable parallelism, the ManArray SoC cores achieve high performance at low clock rates, which minimizes power requirements. The compiler or programmer can select from packed data, indirect VLIW, PE array SIMD, and multiple threaded forms of parallelism to provide the best product solution. Further, BOPS offers a comprehensive top-down design methodology for delivering the SoC solutions.
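Of the forms of parallelism listed above, packed data is the easiest to illustrate in isolation. The following is a minimal sketch in Python, assuming four 8-bit lanes packed into one 32-bit word; the lane layout and the names `pack`, `unpack`, and `packed_add` are illustrative assumptions and do not describe the ManArray instruction set.

```python
# Generic packed-data (subword) addition: four 8-bit lanes in one 32-bit word.
# The lane layout and all names here are assumptions made for illustration.

LANES = 4
LANE_BITS = 8
HIGH = 0x80808080  # top bit of every lane
LOW = 0x7F7F7F7F   # lower seven bits of every lane


def pack(values):
    """Pack four integers (each taken modulo 256) into one 32-bit word."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & 0xFF for i in range(LANES)]


def packed_add(a, b):
    """Add all four lanes with one word-wide addition, wrapping modulo 256.

    The top bit of each lane is masked off before the add so that no carry
    can cross into the neighbouring lane; it is then restored with XOR.
    """
    low_sum = (a & LOW) + (b & LOW)
    return (low_sum ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF


if __name__ == "__main__":
    a = pack([250, 1, 127, 16])
    b = pack([10, 2, 1, 240])
    print(unpack(packed_add(a, b)))
```

A single word-wide operation here does the work of four narrow ones, which is the general reason packed-data operation can raise throughput without raising the clock rate.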

The ManArray processor is an array processor that uses a sequence processor (SP) as the array controller and an array of distributed indirect VLIW (iVLIW) processing elements (PEs) (see Fig. 39.47). By varying the number of PEs on a core, an embedded scalable design is achieved, with each core using a single architecture. This embedded scalability makes it possible to develop multiple products that provide a linear increase in performance and maintain the same programming model merely by adding array processor elements as needed by the application. As the processing capability is increased, the memory-to-PE bandwidth is increased, and the system DMA bandwidth may be increased as well. Embedded scalability drastically reduces development costs for future products because it allows a single BOPS software development kit (SDK) to support a wide range of products.
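The relationship between PE count and performance described above can be sketched with a toy model. This is an idealized illustration only, assuming one instruction is broadcast per cycle and ignoring memory, DMA, and communication costs; the classes `PE` and `SP`, the three-operation instruction set, and the function `scale_array` are hypothetical and are not part of the BOPS SDK.

```python
from dataclasses import dataclass


@dataclass
class PE:
    """Hypothetical processing element with a private local data memory."""
    local: list
    acc: int = 0

    def execute(self, op, operand):
        if op == "load":
            self.acc = self.local[operand]
        elif op == "mul":
            self.acc *= operand
        elif op == "store":
            self.local[operand] = self.acc
        else:
            raise ValueError(f"unknown op: {op}")


class SP:
    """Hypothetical array controller: broadcasts each instruction to all PEs."""

    def __init__(self, pes):
        self.pes = pes

    def run(self, program):
        cycles = 0
        for op, operand in program:
            for pe in self.pes:  # conceptually simultaneous (SIMD)
                pe.execute(op, operand)
            cycles += 1          # assumption: one broadcast per cycle
        return cycles


def scale_array(data, factor, num_pes):
    """Multiply every element by factor; len(data) must divide evenly by num_pes."""
    if len(data) % num_pes != 0:
        raise ValueError("data length must be a multiple of num_pes")
    chunk = len(data) // num_pes
    pes = [PE(local=data[i * chunk:(i + 1) * chunk]) for i in range(num_pes)]
    program = []
    for addr in range(chunk):
        program += [("load", addr), ("mul", factor), ("store", addr)]
    cycles = SP(pes).run(program)
    return [x for pe in pes for x in pe.local], cycles


if __name__ == "__main__":
    data = list(range(8))
    for n in (1, 2, 4):
        result, cycles = scale_array(data, 3, n)
        print(n, cycles, result)
```

In this model the per-element instruction sequence is identical for every PE count; only the data partitioning changes, and the cycle count falls in proportion to the number of PEs. Real speedup depends on the bandwidth considerations the text mentions.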

In addition to the embedded scalability, ManArray cores are configurable in the number and type of cores included on a chip, instruction subsetting for application optimization, the sizes of each SP’s instruction memory, the distributed iVLIW memories, the PE/SP data memories, and the I/O buffers, selectable clock speed, choice of on-chip peripherals, and DMA bus bandwidth. The ManArray cores provide a lower cost, more optimized signal processing solution than reconfigurable processors designed using FPGA technology [5]. Including multiple BOPS cores on an SoC product provides optimized scalable multiprocessing. These multiple ManArray cores can be organized to provide data pipeline processing between SP/PE-array cores and the parallelization of sub-application tasks (thread parallelism) with a centralized host-based control, to be described later in this chapter section.

Generally speaking, the ManArray processor combines PEs in clusters that also contain an SP, uniquely merged into the PE array, and a cluster switch (Fig. 39.47). The SP provides program control, contains the instruction and data address generation units, and dispatches instructions to the processor array. In this manner, the ManArray processor is designed for scalability with a single architecture definition and a common tool set. The processor and supporting tools are designed to meet the needs of an SoC platform by allowing a designer to balance an application’s sequential control requirements with the application’s inherent data parallelism. This is accomplished by having a scalable architecture that begins

© 2002 by CRC Press LLC

[Figure 39.47 block labels: SP; iVLIW PE0, PE1, ..., PEn; cluster switch (CS); scalable DMA and host I/O; memory; host processor (MIPS, ARM, X86).]
