01.12.2012 Views

Architecture of Computing Systems (Lecture Notes in Computer ...


A Method for Accurate High-Level Performance Evaluation

the impact of shared resources on the performance of the application. In addition, the simulator can profile the application traces, allowing the designer to identify possible options for HW/SW functional repartitioning. The repartitioning can be performed by a simple trace modification and analyzed in the simulator in an iterative manner. The following sections provide further details on each step of the proposed workflow.

3.1 Generation of Traces

In order to estimate the execution time of the target software, cycle-accurate CPU simulators typically contain models of micro-architectural components, e.g., branch predictors or instruction pipelines. Using a CPU simulator, the designer can observe the contents of the CPU's internal registers and the memory addresses used in load and store instructions. For the generation of traces, the order of the instructions and their timing information are of the greatest importance.

In a cycle-accurate simulation, the executed instructions can be categorized into two types:

– Processing instructions not resulting in a request on the bus;

– Communication instructions that perform load and store operations.

Since none of the processing instructions initiates memory accesses, a group of consecutive processing instructions is translated to a single DELAY trace primitive (Fig. 3). The trace generator calculates the overall execution time of this group and stores it as a parameter of the primitive. Correspondingly, the communication instructions are translated to READ and WRITE trace primitives. The destination target of each transaction and the target memory address are stored as parameters of the primitive as well.
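The translation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' trace generator: the log record layout (`LogEntry` with `kind`, `cycles`, `target`, `address`), the `Primitive` type and the function name `generate_trace` are all hypothetical. Only the DELAY, READ and WRITE primitives and their parameters come from the text. The sketch also assumes that the latency of a load or store is left to the high-level simulator, so it is not recorded in the trace.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class LogEntry:
    """One executed instruction from a hypothetical simulator log."""
    kind: str                      # "proc", "load" or "store" (assumed encoding)
    cycles: int                    # execution time reported by the CPU simulator
    target: Optional[str] = None   # bus target, for load/store only
    address: Optional[int] = None  # memory address, for load/store only


@dataclass(frozen=True)
class Primitive:
    """A trace primitive: DELAY, READ or WRITE."""
    op: str
    cycles: int = 0
    target: Optional[str] = None
    address: Optional[int] = None


def generate_trace(log: Iterable[LogEntry]) -> List[Primitive]:
    """Merge runs of processing instructions into DELAYs; map loads/stores to READ/WRITE."""
    trace: List[Primitive] = []
    pending = 0  # accumulated cycles of the current run of processing instructions
    for entry in log:
        if entry.kind == "proc":
            pending += entry.cycles
            continue
        if entry.kind == "load":
            op = "READ"
        elif entry.kind == "store":
            op = "WRITE"
        else:
            raise ValueError(f"unknown instruction kind: {entry.kind!r}")
        # A communication instruction ends the run: flush it as one DELAY.
        if pending:
            trace.append(Primitive("DELAY", cycles=pending))
            pending = 0
        # Assumption: the load/store latency itself is not stored in the trace.
        trace.append(Primitive(op, target=entry.target, address=entry.address))
    if pending:  # trailing processing instructions
        trace.append(Primitive("DELAY", cycles=pending))
    return trace


example_log = [
    LogEntry("proc", 1),
    LogEntry("proc", 2),
    LogEntry("load", 1, target="mem0", address=0x1000),
    LogEntry("proc", 3),
    LogEntry("store", 1, target="mem0", address=0x1004),
]
for primitive in generate_trace(example_log):
    print(primitive)
```

In this example the first two processing instructions collapse into one DELAY whose parameter is the sum of their cycle counts, which is the grouping behaviour the paragraph describes.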

In order to enable profiling of the application trace, trace primitives are divided into groups. Each group represents an execution of the instructions within a particular subroutine of the target SW code. For each group, the trace generator identifies the subroutine's name using the symbol table of the target code

Fig. 3. Generation of traces using a log file from a cycle-accurate CPU simulation
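The grouping of trace primitives by subroutine can be sketched in the same spirit. The text only states that subroutine names are resolved through the symbol table of the target code. Everything else here is an assumption made for illustration: the symbol table is modelled as a sorted list of (start address, name) pairs, each primitive is assumed to carry the program counter of the instruction that produced it, and the names `SYMBOLS`, `subroutine_of` and `group_by_subroutine` are hypothetical.

```python
import bisect
from itertools import groupby
from typing import List, Sequence, Tuple

# Hypothetical symbol table: (start address, subroutine name), sorted by address.
SYMBOLS: List[Tuple[int, str]] = [
    (0x0000, "main"),
    (0x0100, "fir_filter"),
    (0x0200, "copy_block"),
]
_STARTS = [start for start, _ in SYMBOLS]


def subroutine_of(pc: int) -> str:
    """Return the subroutine whose start address is the largest one not above pc."""
    i = bisect.bisect_right(_STARTS, pc) - 1
    if i < 0:
        raise ValueError(f"pc {pc:#x} lies below the first symbol")
    return SYMBOLS[i][1]


def group_by_subroutine(
    records: Sequence[Tuple[int, str]],
) -> List[Tuple[str, List[str]]]:
    """Split (pc, primitive) records into runs of consecutive primitives per subroutine."""
    return [
        (name, [primitive for _, primitive in run])
        for name, run in groupby(records, key=lambda r: subroutine_of(r[0]))
    ]


records = [
    (0x0004, "DELAY 12"),
    (0x0104, "READ mem0 0x1000"),
    (0x0110, "DELAY 5"),
    (0x0008, "WRITE mem0 0x2000"),
]
for name, primitives in group_by_subroutine(records):
    print(name, primitives)
```

Because only consecutive primitives are merged, a subroutine that is entered twice yields two groups, which matches the reading that each group represents one execution of a subroutine.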
