30.07.2013 Views

Dual-pipeline heterogeneous ASIP design - School of Computer ...

Dual-pipeline heterogeneous ASIP design - School of Computer ...

Dual-pipeline heterogeneous ASIP design - School of Computer ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Figure 9: 2-pipes .vs. 1-pipe<br />

ness <strong>of</strong> the created two-pipe processors, and our <strong>design</strong> a p<br />

proach, we also implemented those functions with single<br />

pipe <strong>ASIP</strong>s produced by <strong>ASIP</strong>Meister. The verification sys-<br />

tem for sin le pipe and two-pi e processors are shown in<br />

Figure 10. %,e that the sirnurated results are for the in-<br />

struction set after Phase I. All applications used the same<br />

sample data set for both single and parallel implementations.<br />

Figure 10: Synthesis/Simulation System<br />

the moeram in'terms <strong>of</strong> clock cvcles ICC,). We also o kin<br />

microns from AMI.<br />

The simulation results are shown in Table 2. The first col-<br />

umn gives the name <strong>of</strong> the application, the second and third<br />

the area cost <strong>of</strong> single and arallel implementations respec-<br />

tively, the fourth and the &th columns give the number <strong>of</strong><br />

clock cycles for execution <strong>of</strong> the application, and the sixth<br />

and the seventh gives the switching activity when the appli-<br />

cation was executed. Eight and ninth columns give the clock<br />

period for the processors <strong>design</strong>ed (results obtained from<br />

Synplify ASIC). Tenth, eleventh and the twelfth columns<br />

give the percentage increase in area, percentage speed im-<br />

provements, and percentage switching activity reductions.<br />

These three columns (10,ll and 12) are plotted in Figure 9.<br />

As can be seen from the table, there is up to 36% im rove-<br />

ment in speed. The speed improvement is at the cost orsome<br />

extra chip area. The two-instruction sets in each <strong>design</strong><br />

were created with very little overlap, i.e., few instructions<br />

were in both pipes. The clock eriod reductions came from<br />

the decreased controller compyexity. The extra area cost<br />

mainly comes from the second controller, additional func-<br />

tional blocks and data buses for parallel processin which<br />

is common to all dual-pi e <strong>design</strong>s. Therefore, tfe extra<br />

area cost is almost same t% all the <strong>design</strong>, which is around<br />

16%<br />

Switching activity reductions are obtained by reduced switch-<br />

ing in program counter and simplified functional circuitry.<br />

For exam le, two parallel add instructions can have less<br />

number <strong>of</strong>switches than the two instructions executed in<br />

a single pipe, since addition is implemented with an adder<br />

instead <strong>of</strong> an ALU in second pipe.<br />

17<br />

5. CONCLUSIONS<br />

In this paper we have described a system which expands<br />

the <strong>design</strong> space <strong>of</strong> <strong>ASIP</strong>s, by increasing the number <strong>of</strong><br />

<strong>pipeline</strong>s. We allocate load/store operations to one <strong>of</strong> the<br />

pipes, and in general ALU operation to the other pipe. If<br />

there are additional ALU operations which can be paral-<br />

lelized, then they are spread over the next pipe. We see<br />

speed improvements <strong>of</strong> up to 36% and switching activity<br />

reductions <strong>of</strong> up to 11%. The additional area costs approx-<br />

imately 16%.<br />

6. REFERENCES<br />

Arctangent processor. ARC International.<br />

(http://www.arc.com).<br />

Asip-meister. (http://www.~d~-m~i~t~~.~~g/asi~m~i~t~~/)<br />

Xtensa processor. Tensilica Inc. (http://www.tensilica.com)<br />

N. Binh. M. Imai. and Y. Takeuchi. A Derformance<br />

maximization dgbrithm to <strong>design</strong> =ips- under the constm.int <strong>of</strong><br />

chip area including ram and rom sine. In ASP-DAC, 1998.<br />

Nguyen Ngoc Binh, Masaharu Imai, Akichika Shiomi, and<br />

Nobuyuki Hikichi. A hardware/s<strong>of</strong>tware partitioning algorithm<br />

for desi "in ipelined asips with least gate counts. In Pmc. <strong>of</strong><br />

the 33r2 DdE pages 527-532. ACM Press, 1996.<br />

161 P. Brisk, A. Kaplan, R. Kastner, and M. Suriafiadeh.<br />

Instruction eneration and re ularit extraction for<br />

reconfigurade processorb. in &ASEX 2002.<br />

171 Hoon Chai, Ju Hwan Yi. Jong-Yeol Lee, In-Cheol Park, and<br />

Chong-Min Kyun Ex biting intellectual pro erties in asi<br />

desi ns for embehed fz s<strong>of</strong>tware. In Pmcee&~gs <strong>of</strong> the hth<br />

DA& pages 939-944. A M press, 1999.<br />

181 J.G. Cousin, M. Denoual, D. Saille, and 0. Sentieys. Fast asip<br />

synthesis and power estimation for ds ap lication. IEEE Sips<br />

2000 : Deszon and Imolementotmn. gct& 2000.<br />

191 Kayhan K &ai. An asip <strong>design</strong> methodology for embedded<br />

systems. In Pmceedzngs <strong>of</strong> the seventh internotionol workshop<br />

on Hardwove/so,fLware co<strong>design</strong>, pages 17-21. ACM Press,<br />

1999.<br />

Tilman Glokler and Heinrich Meyi. Power reduction for sips:<br />

A case study.<br />

M. Jacome, G. de Veciana, and C. Akturan. Resource<br />

constrained dataflow retimin heuristics for vliw asips. In<br />

Proceedings <strong>of</strong> the seventh BODES, pages 12-16. ACM Press,<br />

1uou<br />

Margarida F. Jacome, Gustavo de Veciana, and Viktor<br />

Lapinskii. Exploring performance trade<strong>of</strong>fs for clustered vliw<br />

asi s. In Proceedings <strong>of</strong> the 2000 ICCAD. pages 504-510.<br />

IEEE press. 2000.<br />

hl. K. Jain, L. Wehmeyer, S. Steinke, P. Marwedel, and<br />

M. Bal-akrishnan. Evaluating register file size in asip <strong>design</strong>. In<br />

CODES, 2001.<br />

R. Kastner, S. Ogrenci-Memik, E. Bozorgzadeh, and<br />

M. Sar-rafiadeh. Instruction eneration for hybrid<br />

reconfigurable systems. In IC!CAO, 2001.<br />

V. Kathail, shail Aditya, R. Schreiber, B. R. Rau, D. C.<br />

Cron-quist. and M. Sivaraman. Pico: Automatically <strong>design</strong>ing<br />

CUStom computers. in computer, 2002.<br />

Akira Kitajima, Makiko Itoh, Jun Sato, Akichika Shiomi,<br />

Yoshinori Takeuchi, and Masaharu Imai. Effectiveness <strong>of</strong> the<br />

asip <strong>design</strong> system eas iii in <strong>design</strong> <strong>of</strong> i dined rocessors. In<br />

Pmc. <strong>of</strong> the 2001 hP-DAC, pages 64&4. A& Press, 2001.<br />

S. Kobayashi, H. Mita, Y. Takeuchi, and M. Irnai. Design space<br />

exploration for dsp a lications using the asip development<br />

system peas-iii. I" ABPP, 2002.<br />

J. Lee, K. Choi, and N. Dutt. Efficient instruction encoding for<br />

automatic instmction set desifn <strong>of</strong> configurable asips. In<br />

ICCAD. znnz.<br />

~~ ~~~~<br />

A. Pyttel, A. Sedlmeier, and C. Veith. Pscp: a scalable parallel<br />

asip architecture for reactive systems. In Proceedings <strong>of</strong> DATE,<br />

.I oases 370-376. IEEE Comouter Societv. ", 1998.<br />

~~~<br />

James E. Smith. Decoupled access/execute computer<br />

architectures. ACM Trans. Comput. Syst., 2(4):289-308, 1984<br />

F. Sun, S. Ravi, A. Raghunathan, and N. Jha. Synthesis <strong>of</strong><br />

custom processors based on extenbible platforms. In ICCAD,<br />

nnnn<br />

J.-H. Yang, B.-W. Kim, et al. Metacore: an application specific<br />

dsp development system. In DAC, 1998.<br />

4. Zhao. B. Mesman, and T. Basten. Practical instruction set<br />

desi n and corn iler retargetability using statio resource<br />

mo& ~n DA~E, 2002.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!