30.07.2013 Views

Dual-pipeline heterogeneous ASIP design - School of Computer ...

Dual-pipeline heterogeneous ASIP design - School of Computer ...

Dual-pipeline heterogeneous ASIP design - School of Computer ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Figure 9: 2-pipes .vs. 1-pipe<br />

ness <strong>of</strong> the created two-pipe processors, and our <strong>design</strong> a p<br />

proach, we also implemented those functions with single<br />

pipe <strong>ASIP</strong>s produced by <strong>ASIP</strong>Meister. The verification sys-<br />

tem for sin le pipe and two-pi e processors are shown in<br />

Figure 10. %,e that the sirnurated results are for the in-<br />

struction set after Phase I. All applications used the same<br />

sample data set for both single and parallel implementations.<br />

Figure 10: Synthesis/Simulation System<br />

the moeram in'terms <strong>of</strong> clock cvcles ICC,). We also o kin<br />

microns from AMI.<br />

The simulation results are shown in Table 2. The first col-<br />

umn gives the name <strong>of</strong> the application, the second and third<br />

the area cost <strong>of</strong> single and arallel implementations respec-<br />

tively, the fourth and the &th columns give the number <strong>of</strong><br />

clock cycles for execution <strong>of</strong> the application, and the sixth<br />

and the seventh gives the switching activity when the appli-<br />

cation was executed. Eight and ninth columns give the clock<br />

period for the processors <strong>design</strong>ed (results obtained from<br />

Synplify ASIC). Tenth, eleventh and the twelfth columns<br />

give the percentage increase in area, percentage speed im-<br />

provements, and percentage switching activity reductions.<br />

These three columns (10,ll and 12) are plotted in Figure 9.<br />

As can be seen from the table, there is up to 36% im rove-<br />

ment in speed. The speed improvement is at the cost orsome<br />

extra chip area. The two-instruction sets in each <strong>design</strong><br />

were created with very little overlap, i.e., few instructions<br />

were in both pipes. The clock eriod reductions came from<br />

the decreased controller compyexity. The extra area cost<br />

mainly comes from the second controller, additional func-<br />

tional blocks and data buses for parallel processin which<br />

is common to all dual-pi e <strong>design</strong>s. Therefore, tfe extra<br />

area cost is almost same t% all the <strong>design</strong>, which is around<br />

16%<br />

Switching activity reductions are obtained by reduced switch-<br />

ing in program counter and simplified functional circuitry.<br />

For exam le, two parallel add instructions can have less<br />

number <strong>of</strong>switches than the two instructions executed in<br />

a single pipe, since addition is implemented with an adder<br />

instead <strong>of</strong> an ALU in second pipe.<br />

17<br />

5. CONCLUSIONS<br />

In this paper we have described a system which expands<br />

the <strong>design</strong> space <strong>of</strong> <strong>ASIP</strong>s, by increasing the number <strong>of</strong><br />

<strong>pipeline</strong>s. We allocate load/store operations to one <strong>of</strong> the<br />

pipes, and in general ALU operation to the other pipe. If<br />

there are additional ALU operations which can be paral-<br />

lelized, then they are spread over the next pipe. We see<br />

speed improvements <strong>of</strong> up to 36% and switching activity<br />

reductions <strong>of</strong> up to 11%. The additional area costs approx-<br />

imately 16%.<br />

6. REFERENCES<br />

Arctangent processor. ARC International.<br />

(http://www.arc.com).<br />

Asip-meister. (http://www.~d~-m~i~t~~.~~g/asi~m~i~t~~/)<br />

Xtensa processor. Tensilica Inc. (http://www.tensilica.com)<br />

N. Binh. M. Imai. and Y. Takeuchi. A Derformance<br />

maximization dgbrithm to <strong>design</strong> =ips- under the constm.int <strong>of</strong><br />

chip area including ram and rom sine. In ASP-DAC, 1998.<br />

Nguyen Ngoc Binh, Masaharu Imai, Akichika Shiomi, and<br />

Nobuyuki Hikichi. A hardware/s<strong>of</strong>tware partitioning algorithm<br />

for desi "in ipelined asips with least gate counts. In Pmc. <strong>of</strong><br />

the 33r2 DdE pages 527-532. ACM Press, 1996.<br />

161 P. Brisk, A. Kaplan, R. Kastner, and M. Suriafiadeh.<br />

Instruction eneration and re ularit extraction for<br />

reconfigurade processorb. in &ASEX 2002.<br />

171 Hoon Chai, Ju Hwan Yi. Jong-Yeol Lee, In-Cheol Park, and<br />

Chong-Min Kyun Ex biting intellectual pro erties in asi<br />

desi ns for embehed fz s<strong>of</strong>tware. In Pmcee&~gs <strong>of</strong> the hth<br />

DA& pages 939-944. A M press, 1999.<br />

181 J.G. Cousin, M. Denoual, D. Saille, and 0. Sentieys. Fast asip<br />

synthesis and power estimation for ds ap lication. IEEE Sips<br />

2000 : Deszon and Imolementotmn. gct& 2000.<br />

191 Kayhan K &ai. An asip <strong>design</strong> methodology for embedded<br />

systems. In Pmceedzngs <strong>of</strong> the seventh internotionol workshop<br />

on Hardwove/so,fLware co<strong>design</strong>, pages 17-21. ACM Press,<br />

1999.<br />

Tilman Glokler and Heinrich Meyi. Power reduction for sips:<br />

A case study.<br />

M. Jacome, G. de Veciana, and C. Akturan. Resource<br />

constrained dataflow retimin heuristics for vliw asips. In<br />

Proceedings <strong>of</strong> the seventh BODES, pages 12-16. ACM Press,<br />

1uou<br />

Margarida F. Jacome, Gustavo de Veciana, and Viktor<br />

Lapinskii. Exploring performance trade<strong>of</strong>fs for clustered vliw<br />

asi s. In Proceedings <strong>of</strong> the 2000 ICCAD. pages 504-510.<br />

IEEE press. 2000.<br />

hl. K. Jain, L. Wehmeyer, S. Steinke, P. Marwedel, and<br />

M. Bal-akrishnan. Evaluating register file size in asip <strong>design</strong>. In<br />

CODES, 2001.<br />

R. Kastner, S. Ogrenci-Memik, E. Bozorgzadeh, and<br />

M. Sar-rafiadeh. Instruction eneration for hybrid<br />

reconfigurable systems. In IC!CAO, 2001.<br />

V. Kathail, shail Aditya, R. Schreiber, B. R. Rau, D. C.<br />

Cron-quist. and M. Sivaraman. Pico: Automatically <strong>design</strong>ing<br />

CUStom computers. in computer, 2002.<br />

Akira Kitajima, Makiko Itoh, Jun Sato, Akichika Shiomi,<br />

Yoshinori Takeuchi, and Masaharu Imai. Effectiveness <strong>of</strong> the<br />

asip <strong>design</strong> system eas iii in <strong>design</strong> <strong>of</strong> i dined rocessors. In<br />

Pmc. <strong>of</strong> the 2001 hP-DAC, pages 64&4. A& Press, 2001.<br />

S. Kobayashi, H. Mita, Y. Takeuchi, and M. Irnai. Design space<br />

exploration for dsp a lications using the asip development<br />

system peas-iii. I" ABPP, 2002.<br />

J. Lee, K. Choi, and N. Dutt. Efficient instruction encoding for<br />

automatic instmction set desifn <strong>of</strong> configurable asips. In<br />

ICCAD. znnz.<br />

~~ ~~~~<br />

A. Pyttel, A. Sedlmeier, and C. Veith. Pscp: a scalable parallel<br />

asip architecture for reactive systems. In Proceedings <strong>of</strong> DATE,<br />

.I oases 370-376. IEEE Comouter Societv. ", 1998.<br />

~~~<br />

James E. Smith. Decoupled access/execute computer<br />

architectures. ACM Trans. Comput. Syst., 2(4):289-308, 1984<br />

F. Sun, S. Ravi, A. Raghunathan, and N. Jha. Synthesis <strong>of</strong><br />

custom processors based on extenbible platforms. In ICCAD,<br />

nnnn<br />

J.-H. Yang, B.-W. Kim, et al. Metacore: an application specific<br />

dsp development system. In DAC, 1998.<br />

4. Zhao. B. Mesman, and T. Basten. Practical instruction set<br />

desi n and corn iler retargetability using statio resource<br />

mo& ~n DA~E, 2002.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!