01.12.2012 Views

Architecture of Computing Systems (Lecture Notes in Computer ...


204 R. Plyaskin and A. Herkersdorf

and program counter values obtained during the cycle-accurate simulation. The
start and end positions of the groups are denoted using START F and STOP F
primitives.

In the final step of the trace generation, each primitive type is encoded using
a unique identifier. Afterwards, the primitives are stored in a file in binary
form, which reduces the size of the generated trace.
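The encoding step can be illustrated with a minimal sketch. The identifier values, the record layout (a one-byte type identifier followed by a 32-bit operand) and the helper names `encode_trace` and `decode_trace` are assumptions made for illustration only; the text does not specify the actual file format.

```python
import struct

# Hypothetical identifiers: the text only states that each primitive type
# receives a unique identifier, not which values are used.
PRIMITIVE_IDS = {"DELAY": 0, "READ": 1, "WRITE": 2, "START_F": 3, "STOP_F": 4}
ID_TO_NAME = {pid: name for name, pid in PRIMITIVE_IDS.items()}

# Assumed record layout: 1-byte identifier + 4-byte unsigned operand
# (cycle count, memory address or function index), little-endian, unpadded.
RECORD = struct.Struct("<BI")


def encode_trace(primitives):
    """Pack (name, operand) pairs into a compact byte string."""
    return b"".join(RECORD.pack(PRIMITIVE_IDS[name], operand)
                    for name, operand in primitives)


def decode_trace(blob):
    """Inverse of encode_trace: recover the (name, operand) pairs."""
    return [(ID_TO_NAME[pid], operand)
            for pid, operand in RECORD.iter_unpack(blob)]


trace = [("START_F", 7), ("DELAY", 120), ("READ", 0x80001000), ("STOP_F", 7)]
blob = encode_trace(trace)
```

Under this assumed layout every primitive occupies five bytes, independent of the magnitude of its operand.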

3.2 Trace Simulator<br />

During the trace simulation, which is similar to [15], the black-box CPU models
execute the primitives of the trace files. The simulation time is advanced according
to the processing latencies given in the delay primitives. The absolute SystemC
time interval for which the CPU model waits is calculated as the annotated number
of cycles multiplied by the clock period of that CPU component, i.e. divided by
its clock frequency.
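Converting an annotated cycle count into a wait interval amounts to multiplying it by the clock period, the reciprocal of the clock frequency. The following sketch shows the arithmetic; the function name `wait_time_ns` and the choice of nanoseconds and megahertz as units are assumptions, not taken from the simulator.

```python
def wait_time_ns(cycles, clock_mhz):
    """Return the wait interval in nanoseconds for a delay primitive.

    cycles    -- number of cycles annotated to the delay primitive
    clock_mhz -- clock frequency of the CPU component in MHz (assumed unit)
    """
    period_ns = 1000.0 / clock_mhz   # clock period = 1 / frequency
    return cycles * period_ns


slow_cpu = wait_time_ns(cycles=120, clock_mhz=100)   # 10 ns period
fast_cpu = wait_time_ns(cycles=120, clock_mhz=200)   # 5 ns period
```

The same delay primitive therefore yields a shorter wait on a faster CPU component: 1200 ns at 100 MHz versus 600 ns at 200 MHz.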

On execution of read or write primitives, the CPU performs a request to the
cache component. Given the memory address tagged to the primitive, the cache
model indicates either a hit or a miss for the current transaction. In case of
a cache miss, the CPU issues a blocking TLM transaction on the shared arbitrated
bus. The cache model used in the trace simulator is configurable: the user can
set the cache associativity and choose among various replacement policies.
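A tag-only cache model of this kind can be sketched as follows. The class name `SetAssociativeCache`, its parameters and the choice of LRU as the replacement policy are assumptions; the text only states that associativity and replacement policy are configurable.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tag-only cache model: reports hit or miss, stores no data."""

    def __init__(self, num_sets, ways, line_bytes):
        self.num_sets = num_sets
        self.ways = ways                  # associativity
        self.line_bytes = line_bytes
        # One OrderedDict per set; key order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        cache_set = self.sets[index]
        if tag in cache_set:
            cache_set.move_to_end(tag)      # mark as most recently used
            return True
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)   # evict the least recently used tag
        cache_set[tag] = None
        return False


cache = SetAssociativeCache(num_sets=2, ways=2, line_bytes=32)
results = [cache.access(a) for a in (0x00, 0x04, 0x40, 0x80, 0x00)]
```

In this example the second access hits because it falls into the line loaded by the first one. The last access misses again because two other lines mapping to the same set have evicted that line in the meantime.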

In contrast to computational latencies, which are statically defined by the
delay primitives, communication latencies are obtained dynamically at simulation
runtime, depending on how shared resources, e.g. the on-chip interconnect, are
utilized by other CPUs. Note that neither read nor write primitives are
annotated with the data to be transferred. Since the functionality of the
memory module is abstracted to access latencies only, the actual data written
to or read from the memory is not required.
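How a communication latency can depend on the activity of other CPUs is shown in the sketch below. It assumes a single shared bus with first-come, first-served arbitration and a fixed memory access latency. The class `SharedBus` and its method `transport` are hypothetical names, and requests are assumed to arrive in order of their request time. No payload is modelled, in line with the data-free primitives described above.

```python
class SharedBus:
    """Latency-only model of a shared bus with FCFS arbitration (assumed)."""

    def __init__(self, access_latency_ns):
        self.access_latency_ns = access_latency_ns
        self.free_at_ns = 0   # time at which the bus becomes idle again
        self.busy_ns = 0      # accumulated time the bus was occupied

    def transport(self, request_time_ns):
        """Serve one blocking request; return the latency seen by the CPU."""
        start = max(request_time_ns, self.free_at_ns)  # wait if bus is busy
        self.free_at_ns = start + self.access_latency_ns
        self.busy_ns += self.access_latency_ns
        return self.free_at_ns - request_time_ns


bus = SharedBus(access_latency_ns=50)
latency_cpu0 = bus.transport(request_time_ns=100)   # bus is idle
latency_cpu1 = bus.transport(request_time_ns=120)   # bus still serves CPU 0
```

The second request sees 80 ns instead of 50 ns because it has to wait until the first transfer completes, so the latency is determined at runtime and not by the trace.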

On execution of START F primitives, the trace simulator starts accumulating
the execution time of the annotated subroutine until a STOP F primitive is
reached. This information is used later to generate the profiling results. In
addition to the profiling capabilities, the trace simulator is able to measure the
utilization of each SoC component. To do so, the simulator accumulates
the busy time of the respective component. At the end of the simulation, the
utilization is calculated as the ratio of the overall busy time to the total
simulated time.
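The two measurements can be sketched as follows. The tuple-based trace representation, the identifiers `START_F` and `STOP_F` (written with an underscore so they can serve as names), and the functions `profile` and `utilization` are assumptions for illustration and do not reflect the simulator's actual interfaces.

```python
def profile(trace, period_ns):
    """Accumulate the time spent between START_F and STOP_F per subroutine.

    trace     -- list of (primitive, operand) pairs (assumed representation)
    period_ns -- clock period used to convert delay cycles into time
    Returns (per-subroutine totals in ns, total simulated time in ns).
    """
    now = 0
    open_at = {}   # subroutine -> time of its pending START_F
    totals = {}    # subroutine -> accumulated execution time
    for name, operand in trace:
        if name == "DELAY":
            now += operand * period_ns
        elif name == "START_F":
            open_at[operand] = now
        elif name == "STOP_F":
            elapsed = now - open_at.pop(operand)
            totals[operand] = totals.get(operand, 0) + elapsed
    return totals, now


def utilization(busy_ns, total_ns):
    """Ratio of a component's busy time to the total simulated time."""
    return busy_ns / total_ns if total_ns else 0.0


trace = [("START_F", "f"), ("DELAY", 100), ("STOP_F", "f"),
         ("DELAY", 50),
         ("START_F", "f"), ("DELAY", 25), ("STOP_F", "f")]
totals, total_ns = profile(trace, period_ns=10)
bus_utilization = utilization(busy_ns=350, total_ns=total_ns)
```

Here the subroutine `f` is entered twice and accumulates 1250 ns out of 1750 ns of simulated time. A component that was busy for 350 ns in the same run would have a utilization of 0.2.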

3.3 Trace Modification<br />

An abstracted representation of the target application using traces allows rapid
changes to the workload generated by a CPU. Using a simple text processing tool,
the application trace can be modified in an arbitrary way, thus enabling faster
design exploration cycles. In particular, this could be useful when the designer
wants to repartition the functionality between CPUs and HW accelerators and
evaluate the resulting impact on the application performance. Since the trace
simulation does not require transfers of functional data, the user can create an
