
Architecture of Computing Systems (Lecture Notes in Computer ...


A Method for Accurate High-Level Performance Evaluation

Fig. 4. Trace primitives for accessing a hardware accelerator

abstracted model of the desired HW accelerator, which on request will simulate certain functionality.

Functional repartitioning can be performed by substituting certain parts of the trace with new patterns for accessing the accelerator component. For this purpose, we have introduced the new trace primitives shown in Fig. 4. A WRITE_HW primitive represents a transfer of input data to the accelerator denoted by dev_id. Upon reception of the data, the hardware accelerator simulates internal processing by waiting for a user-configured amount of time. On execution of a READ_HW primitive, the CPU continuously polls the accelerator until the processed data becomes available, and continues executing the subsequent trace primitives after the successful read operation. The amount of data to be written to or read from the accelerator is specified by the n_bytes parameter.
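The substitution step can be illustrated with a minimal Python sketch. It assumes a trace held as a list of primitive objects; the class `Compute` (standing in for ordinary CPU-side primitives), the classes `WriteHW`/`ReadHW`, and the function `repartition` are hypothetical names introduced here and are not the framework's actual trace format. A contiguous trace segment that implements the offloaded function in software is replaced by one WRITE_HW/READ_HW access pair.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Compute:
    """Hypothetical stand-in for ordinary CPU trace primitives (cycles of local execution)."""
    cycles: int


@dataclass(frozen=True)
class WriteHW:
    """WRITE_HW: transfer n_bytes of input data to accelerator dev_id."""
    dev_id: int
    n_bytes: int


@dataclass(frozen=True)
class ReadHW:
    """READ_HW: poll accelerator dev_id, then read n_bytes of processed data."""
    dev_id: int
    n_bytes: int


Primitive = Union[Compute, WriteHW, ReadHW]


def repartition(trace: List[Primitive], start: int, end: int,
                dev_id: int, n_in: int, n_out: int) -> List[Primitive]:
    """Return a new trace in which trace[start:end], the software version of the
    offloaded function, is replaced by one WRITE_HW/READ_HW access pair."""
    if not 0 <= start <= end <= len(trace):
        raise ValueError("invalid trace segment")
    access: List[Primitive] = [WriteHW(dev_id, n_in), ReadHW(dev_id, n_out)]
    return trace[:start] + access + trace[end:]


# Offload the two middle compute blocks to accelerator 0 (all values hypothetical).
sw_trace = [Compute(100), Compute(5000), Compute(3000), Compute(200)]
hw_trace = repartition(sw_trace, 1, 3, dev_id=0, n_in=256, n_out=64)
print(hw_trace)
```

Returning a new list rather than editing in place keeps the original software trace available, so both partitionings can be replayed and compared.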

Although the functional data itself is not transferred to the accelerator, the designer can still investigate the impact of the associated communication latencies on the application's performance. In our framework, the abstracted accelerator model is locked by the requesting CPU for the duration of data processing. Therefore, in MPSoC architectures, other CPUs that access the component at the same time are stalled. Note that the designer can specify more complex access patterns, depending on the type of hardware accelerator.
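The locking behavior can be approximated by a simple first-come-first-served timing model. The sketch below is a hypothetical illustration, not the framework's implementation: it assumes a fixed bus throughput in bytes per cycle, one request per CPU, and an accelerator that stays locked from the start of a CPU's WRITE_HW until its READ_HW completes. It ignores the bus traffic generated by polling and contention from other bus masters. Names such as `Request` and `simulate_locked_accelerator` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    cpu: str
    issue_time: int  # cycle at which the CPU executes WRITE_HW
    n_in: int        # bytes sent by WRITE_HW
    n_out: int       # bytes fetched by READ_HW


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division: cycles needed to move a bytes at b bytes/cycle."""
    return -(-a // b)


def simulate_locked_accelerator(requests: List[Request], proc_latency: int,
                                bytes_per_cycle: int) -> Dict[str, Dict[str, int]]:
    """Serve requests in issue order (assumes one request per CPU). The accelerator
    is locked by one CPU from the start of its WRITE_HW until its READ_HW
    completes; other CPUs stall in the meantime."""
    free_at = 0  # cycle at which the current lock holder releases the accelerator
    results: Dict[str, Dict[str, int]] = {}
    for req in sorted(requests, key=lambda r: r.issue_time):
        start = max(req.issue_time, free_at)                   # stall while locked
        write_done = start + ceil_div(req.n_in, bytes_per_cycle)
        ready = write_done + proc_latency                      # user-configured processing time
        done = ready + ceil_div(req.n_out, bytes_per_cycle)    # READ_HW polls until ready, then reads
        free_at = done                                         # lock released
        results[req.cpu] = {"stall": start - req.issue_time, "done": done}
    return results


reqs = [Request("cpu0", 0, 256, 64), Request("cpu1", 100, 256, 64)]
print(simulate_locked_accelerator(reqs, proc_latency=1000, bytes_per_cycle=4))
```

With these hypothetical numbers, `cpu1` issues its request at cycle 100 but must wait until `cpu0` releases the accelerator at cycle 1080, so it is stalled for 980 cycles.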

The processing latency of the HW accelerator can be annotated from the data specifications if the peripheral's implementation is already available. Alternatively, the proposed methodology can be used to set the requirements for an accelerator that does not yet exist. In both cases, the designer is assisted in finding the optimal trade-off between the accelerated execution of the function and the additional load on the on-chip communication infrastructure.
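The trade-off can be sketched as break-even arithmetic. Assuming a contention-free bus with fixed throughput B bytes per cycle, one offloaded call costs (n_in + n_out)/B + t_proc cycles, so offloading pays off when this is below the software execution time t_sw. Inverting the inequality gives the largest processing latency a not-yet-existing accelerator may have. The function names and all values below are hypothetical; the real framework obtains these latencies from cycle-accurate simulation rather than from a closed formula.

```python
def offload_gain(t_sw: float, n_in: int, n_out: int,
                 proc_latency: float, bytes_per_cycle: float) -> float:
    """Cycles saved by offloading one call (negative means offloading is slower).

    Assumes a contention-free bus with fixed throughput; polling traffic and
    accesses by other CPUs are ignored.
    """
    t_comm = (n_in + n_out) / bytes_per_cycle  # WRITE_HW + READ_HW transfer time
    return t_sw - (t_comm + proc_latency)


def max_accelerator_latency(t_sw: float, n_in: int, n_out: int,
                            bytes_per_cycle: float) -> float:
    """Largest processing latency at which the accelerator still breaks even."""
    t_comm = (n_in + n_out) / bytes_per_cycle
    return t_sw - t_comm


# Hypothetical: 8000-cycle software function, 256 bytes in, 64 bytes out, 4 bytes/cycle.
print(offload_gain(8000, 256, 64, proc_latency=1000, bytes_per_cycle=4))  # 6920.0
print(max_accelerator_latency(8000, 256, 64, bytes_per_cycle=4))          # 7920.0
```

The `t_comm` term is also the bus occupancy that each call adds, which is the other side of the trade-off: larger transfers shrink the gain and load the interconnect at the same time.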

4 Experimental Results

In this section, we estimate the accuracy of our trace generation method and demonstrate the proposed workflow for evaluating alternative MPSoC architectures using traces.

In the following experiments, cycle-accurate simulations were performed in the COMeT tool developed by VaST Systems [14]. The tool provides a library of cycle-accurate CPU models that are widely used in industry, as well as models of on-chip buses, memories and other components. Thus, the designer can create a complete model of a system-on-chip that is capable of executing the target binary code.
