U. Glaeser


FIGURE 37.11 DES speedup using TIE.

very expensive. Thus, small blocks can attain a larger speedup than large blocks (where key changes are less frequent). The modified Xtensa can encrypt and decrypt data at a rate of 377 MB/s. The hardware cost of the TIE instructions is roughly 4,500 NAND2-equivalent gates (measured in a 0.25-µm process technology). The reduced storage requirements of the application offset this hardware cost. In addition, the new TIE instructions did not increase the cycle time of the machine. DES is only one of the applications that can benefit from specialized hardware.
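One way to read the block-size trend is as a fixed key-change cost amortized over the bytes in a block. The Python sketch below illustrates that reading only. Every cycle count in it is a hypothetical value chosen for illustration, not a measurement from the text, and the assumption that the TIE instructions cut the key-change cost by a larger factor than the per-byte cost is ours.

```python
# All cycle counts below are hypothetical, chosen only to illustrate the trend.
SW_KEY_SETUP = 6000.0  # assumed software cycles per key change
SW_PER_BYTE = 100.0    # assumed software cycles per byte of data
HW_KEY_SETUP = 20.0    # assumed cycles per key change with TIE instructions
HW_PER_BYTE = 2.0      # assumed cycles per byte with TIE instructions


def des_speedup(block_bytes: int) -> float:
    """Speedup for one key change followed by one block of block_bytes bytes."""
    software = SW_KEY_SETUP + SW_PER_BYTE * block_bytes
    hardware = HW_KEY_SETUP + HW_PER_BYTE * block_bytes
    return software / hardware


for size in (1024, 64, 8):
    print(f"{size:5d} bytes: {des_speedup(size):.1f}x")
```

Under these assumptions the speedup for very large blocks approaches the per-byte ratio (SW_PER_BYTE / HW_PER_BYTE), while small blocks are dominated by the larger key-change ratio, so the speedup rises as the block size shrinks.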

Consumer Multimedia

The EEMBC consumer benchmarks contain a representative sample of multimedia applications of interest today. A baseline configuration of Xtensa contains many features suitable for these applications. At 200 MHz operation, Xtensa delivers more than 11 times the performance of the reference processor (ST Microelectronics ST20C2 at 50 MHz). Performance is measured as the geometric mean of the relative number of iterations per second for each algorithm compared to the reference processor. When we added instructions for image filtering and color-space conversion (RGB to YIQ and RGB to CMYK), the average performance increased by 17X, to 193 times faster than the reference. An AMD K6-III+ at 550 MHz, for comparison, is 34.2 times faster than the reference processor. The base configuration was optimized for 200 MHz operation in a 0.18-µm technology. The processor was configured with 16 KB two-way set-associative caches, 256 KB local data RAM, a 16-entry store buffer, and a 32-bit multiplier. The total area of the processor was 57,600 NAND2-equivalent gates. The optimized TIE code cost an additional 64,100 NAND2-equivalent gates.
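The scoring method described above, a geometric mean of per-algorithm throughput ratios, can be sketched in Python as follows. The algorithm names and iteration rates are hypothetical placeholders, not EEMBC data. Only the two gate counts at the end are taken from the text.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed in log space."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical iterations per second for each algorithm (illustrative only).
reference = {"filter": 10.0, "rgb_to_yiq": 20.0, "jpeg": 5.0}
candidate = {"filter": 40.0, "rgb_to_yiq": 320.0, "jpeg": 40.0}

# Relative throughput per algorithm, then one aggregate score.
ratios = [candidate[name] / reference[name] for name in reference]
score = geometric_mean(ratios)
print(f"relative performance: {score:.1f}x")

# Gate counts quoted in the text: base processor plus the added TIE logic.
base_gates = 57_600
tie_gates = 64_100
total_gates = base_gates + tie_gates
print(f"total area: {total_gates:,} NAND2-equivalent gates")
```

Unlike an arithmetic mean, the geometric mean keeps a single algorithm with a very large ratio from dominating the aggregate score.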

DSP Telecommunications

The EEMBC “Telemark” benchmark suite includes many kernels representative of DSP applications. The performance of a base Xtensa processor in this suite is comparable to that of other 32-bit microprocessors (2.3 times faster than the reference). Performance was again measured as the geometric mean of the relative number of iterations per second for each algorithm compared to the reference processor (IDT 32334, MIPS32 architecture at 100 MHz). With the addition of a fixed-point vector coprocessor and a few more specialized instructions, the performance of Xtensa increases by 37X, for a speedup of 85.7X compared to the reference processor. The AMD K6-III+ at 550 MHz has a speedup of 8.7X compared to the reference, while a TI DSP (TMS320C6203) running hand-optimized code at 300 MHz has a speedup of 68.5X compared to the reference processor. The base Xtensa configuration was also optimized for 200 MHz operation in 0.18-µm technology, with 16 KB two-way set-associative caches and a 16-entry write buffer. The vector coprocessor and new TIE instructions add 180,000 NAND2-equivalent gates.
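Because the processors above run at different clock rates, it can help to normalize each quoted speedup by clock frequency. The Python sketch below does that using only the figures given in the text. The per-clock metric itself is our illustration, not something the source reports, and the extended Xtensa is assumed to run at the same 200 MHz as the base configuration.

```python
REFERENCE_MHZ = 100  # IDT 32334 reference clock, from the text

# (speedup vs. reference, clock in MHz), as quoted in the text.
results = {
    "Xtensa base": (2.3, 200),
    # Assumption: the extended configuration also runs at 200 MHz.
    "Xtensa + vector coprocessor and TIE": (85.7, 200),
    "AMD K6-III+": (8.7, 550),
    "TI TMS320C6203": (68.5, 300),
}


def speedup_per_clock(speedup: float, mhz: float) -> float:
    """Speedup divided by the clock ratio relative to the reference processor."""
    return speedup / (mhz / REFERENCE_MHZ)


per_clock = {name: speedup_per_clock(s, mhz) for name, (s, mhz) in results.items()}
for name, value in sorted(per_clock.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:38s} {value:6.2f}")

# Ratio of the extended to the base Xtensa score, which should be close to the quoted 37X.
extension_gain = results["Xtensa + vector coprocessor and TIE"][0] / results["Xtensa base"][0]
print(f"gain from extensions: {extension_gain:.1f}x")
```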

© 2002 by CRC Press LLC

[Figure 37.11 plot, "DES Performance": speedup (y-axis, 0 to 80) versus block size in bytes (x-axis: 1024, 64, 8), with plotted values 43, 50, and 72.]
