Dual-pipeline heterogeneous ASIP design - School of Computer ...
Dual-pipeline heterogeneous ASIP design - School of Computer ...
Dual-pipeline heterogeneous ASIP design - School of Computer ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Figure 9: 2-pipes .vs. 1-pipe<br />
ness <strong>of</strong> the created two-pipe processors, and our <strong>design</strong> a p<br />
proach, we also implemented those functions with single<br />
pipe <strong>ASIP</strong>s produced by <strong>ASIP</strong>Meister. The verification sys-<br />
tem for sin le pipe and two-pi e processors are shown in<br />
Figure 10. %,e that the sirnurated results are for the in-<br />
struction set after Phase I. All applications used the same<br />
sample data set for both single and parallel implementations.<br />
Figure 10: Synthesis/Simulation System<br />
the moeram in'terms <strong>of</strong> clock cvcles ICC,). We also o kin<br />
microns from AMI.<br />
The simulation results are shown in Table 2. The first col-<br />
umn gives the name <strong>of</strong> the application, the second and third<br />
the area cost <strong>of</strong> single and arallel implementations respec-<br />
tively, the fourth and the &th columns give the number <strong>of</strong><br />
clock cycles for execution <strong>of</strong> the application, and the sixth<br />
and the seventh gives the switching activity when the appli-<br />
cation was executed. Eight and ninth columns give the clock<br />
period for the processors <strong>design</strong>ed (results obtained from<br />
Synplify ASIC). Tenth, eleventh and the twelfth columns<br />
give the percentage increase in area, percentage speed im-<br />
provements, and percentage switching activity reductions.<br />
These three columns (10,ll and 12) are plotted in Figure 9.<br />
As can be seen from the table, there is up to 36% im rove-<br />
ment in speed. The speed improvement is at the cost orsome<br />
extra chip area. The two-instruction sets in each <strong>design</strong><br />
were created with very little overlap, i.e., few instructions<br />
were in both pipes. The clock eriod reductions came from<br />
the decreased controller compyexity. The extra area cost<br />
mainly comes from the second controller, additional func-<br />
tional blocks and data buses for parallel processin which<br />
is common to all dual-pi e <strong>design</strong>s. Therefore, tfe extra<br />
area cost is almost same t% all the <strong>design</strong>, which is around<br />
16%<br />
Switching activity reductions are obtained by reduced switch-<br />
ing in program counter and simplified functional circuitry.<br />
For exam le, two parallel add instructions can have less<br />
number <strong>of</strong>switches than the two instructions executed in<br />
a single pipe, since addition is implemented with an adder<br />
instead <strong>of</strong> an ALU in second pipe.<br />
17<br />
5. CONCLUSIONS<br />
In this paper we have described a system which expands<br />
the <strong>design</strong> space <strong>of</strong> <strong>ASIP</strong>s, by increasing the number <strong>of</strong><br />
<strong>pipeline</strong>s. We allocate load/store operations to one <strong>of</strong> the<br />
pipes, and in general ALU operation to the other pipe. If<br />
there are additional ALU operations which can be paral-<br />
lelized, then they are spread over the next pipe. We see<br />
speed improvements <strong>of</strong> up to 36% and switching activity<br />
reductions <strong>of</strong> up to 11%. The additional area costs approx-<br />
imately 16%.<br />
6. REFERENCES<br />
Arctangent processor. ARC International.<br />
(http://www.arc.com).<br />
Asip-meister. (http://www.~d~-m~i~t~~.~~g/asi~m~i~t~~/)<br />
Xtensa processor. Tensilica Inc. (http://www.tensilica.com)<br />
N. Binh. M. Imai. and Y. Takeuchi. A Derformance<br />
maximization dgbrithm to <strong>design</strong> =ips- under the constm.int <strong>of</strong><br />
chip area including ram and rom sine. In ASP-DAC, 1998.<br />
Nguyen Ngoc Binh, Masaharu Imai, Akichika Shiomi, and<br />
Nobuyuki Hikichi. A hardware/s<strong>of</strong>tware partitioning algorithm<br />
for desi "in ipelined asips with least gate counts. In Pmc. <strong>of</strong><br />
the 33r2 DdE pages 527-532. ACM Press, 1996.<br />
161 P. Brisk, A. Kaplan, R. Kastner, and M. Suriafiadeh.<br />
Instruction eneration and re ularit extraction for<br />
reconfigurade processorb. in &ASEX 2002.<br />
171 Hoon Chai, Ju Hwan Yi. Jong-Yeol Lee, In-Cheol Park, and<br />
Chong-Min Kyun Ex biting intellectual pro erties in asi<br />
desi ns for embehed fz s<strong>of</strong>tware. In Pmcee&~gs <strong>of</strong> the hth<br />
DA& pages 939-944. A M press, 1999.<br />
181 J.G. Cousin, M. Denoual, D. Saille, and 0. Sentieys. Fast asip<br />
synthesis and power estimation for ds ap lication. IEEE Sips<br />
2000 : Deszon and Imolementotmn. gct& 2000.<br />
191 Kayhan K &ai. An asip <strong>design</strong> methodology for embedded<br />
systems. In Pmceedzngs <strong>of</strong> the seventh internotionol workshop<br />
on Hardwove/so,fLware co<strong>design</strong>, pages 17-21. ACM Press,<br />
1999.<br />
Tilman Glokler and Heinrich Meyi. Power reduction for sips:<br />
A case study.<br />
M. Jacome, G. de Veciana, and C. Akturan. Resource<br />
constrained dataflow retimin heuristics for vliw asips. In<br />
Proceedings <strong>of</strong> the seventh BODES, pages 12-16. ACM Press,<br />
1uou<br />
Margarida F. Jacome, Gustavo de Veciana, and Viktor<br />
Lapinskii. Exploring performance trade<strong>of</strong>fs for clustered vliw<br />
asi s. In Proceedings <strong>of</strong> the 2000 ICCAD. pages 504-510.<br />
IEEE press. 2000.<br />
hl. K. Jain, L. Wehmeyer, S. Steinke, P. Marwedel, and<br />
M. Bal-akrishnan. Evaluating register file size in asip <strong>design</strong>. In<br />
CODES, 2001.<br />
R. Kastner, S. Ogrenci-Memik, E. Bozorgzadeh, and<br />
M. Sar-rafiadeh. Instruction eneration for hybrid<br />
reconfigurable systems. In IC!CAO, 2001.<br />
V. Kathail, shail Aditya, R. Schreiber, B. R. Rau, D. C.<br />
Cron-quist. and M. Sivaraman. Pico: Automatically <strong>design</strong>ing<br />
CUStom computers. in computer, 2002.<br />
Akira Kitajima, Makiko Itoh, Jun Sato, Akichika Shiomi,<br />
Yoshinori Takeuchi, and Masaharu Imai. Effectiveness <strong>of</strong> the<br />
asip <strong>design</strong> system eas iii in <strong>design</strong> <strong>of</strong> i dined rocessors. In<br />
Pmc. <strong>of</strong> the 2001 hP-DAC, pages 64&4. A& Press, 2001.<br />
S. Kobayashi, H. Mita, Y. Takeuchi, and M. Irnai. Design space<br />
exploration for dsp a lications using the asip development<br />
system peas-iii. I" ABPP, 2002.<br />
J. Lee, K. Choi, and N. Dutt. Efficient instruction encoding for<br />
automatic instmction set desifn <strong>of</strong> configurable asips. In<br />
ICCAD. znnz.<br />
~~ ~~~~<br />
A. Pyttel, A. Sedlmeier, and C. Veith. Pscp: a scalable parallel<br />
asip architecture for reactive systems. In Proceedings <strong>of</strong> DATE,<br />
.I oases 370-376. IEEE Comouter Societv. ", 1998.<br />
~~~<br />
James E. Smith. Decoupled access/execute computer<br />
architectures. ACM Trans. Comput. Syst., 2(4):289-308, 1984<br />
F. Sun, S. Ravi, A. Raghunathan, and N. Jha. Synthesis <strong>of</strong><br />
custom processors based on extenbible platforms. In ICCAD,<br />
nnnn<br />
J.-H. Yang, B.-W. Kim, et al. Metacore: an application specific<br />
dsp development system. In DAC, 1998.<br />
4. Zhao. B. Mesman, and T. Basten. Practical instruction set<br />
desi n and corn iler retargetability using statio resource<br />
mo& ~n DA~E, 2002.