03.08.2013 Views

Decorated Operations for QorIQ P3/P4/P5 Processors - Freescale ...




the need for the final operation to be executed in the CPC, that is, “Fire and Forget.” They are therefore<br />

executed in one cycle or fewer (see footnote 1). The performance is high compared with lock-based approaches<br />

(see Section 3, “Using Decorations in Software and Performance Results”).<br />

3 Using Decorations in Software and Performance Results<br />

Decorated operations are typically implemented with macros to simplify usage; alternatively, they could<br />

overload basic add/subtract functions in an applicable programming language, such as C++. In the following<br />

benchmark case, the operation is implemented bare-metal directly on the P4080 without an underlying<br />

operating system. Seven of the eight cores run bare-metal, whereas the last core runs Linux<br />

to simplify the boot process. However, the operating system configuration and the level at which the decorated<br />

operations are implemented are not important, because they execute at the same privilege level and<br />

have the same characteristics for the core as normal load/store operations. Tests are run in both<br />

single-core and multicore configurations.<br />

Data accessed by a decorated operation must be set up in three steps. First, a pointer to the data must<br />

be defined, as follows:<br />

volatile int32_t *decorated_counter = NULL;<br />

In the program code, allocate the data and set the value to a default state, in this case zero, as follows:<br />

decorated_counter = (int32_t *) stats_memalign(CACHE_LINE_SIZE, sizeof(int32_t));<br />

*decorated_counter = 0;<br />

Finally, the code makes use of the data by executing a decorated operation, as follows:<br />

decorated_notify_inc_32(decorated_counter);<br />
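The three steps above can be put together in one place. The following is a minimal sketch that builds on an ordinary host, not the benchmark code itself. It rests on several assumptions: `stats_memalign` is replaced by a hypothetical stand-in built on C11 `aligned_alloc`, `CACHE_LINE_SIZE` is assumed to be 64 bytes, and `decorated_notify_inc_32` is modeled with a relaxed atomic add (a GCC/Clang builtin) in place of the real decorated instruction. The helper `run_counter_demo` is introduced only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache-line size in bytes; check the value for the actual target. */
#define CACHE_LINE_SIZE 64

/* Hypothetical stand-in for stats_memalign(): C11 aligned_alloc() needs a
 * size that is a multiple of the alignment, so round the request up. */
static void *stats_memalign(size_t alignment, size_t size)
{
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
}

/* Portable stand-in for the decorated increment (assumption). On the target
 * the macro would issue the decorated operation instead; here a relaxed
 * atomic add models the same visible result on the counter. */
static inline void decorated_notify_inc_32(volatile int32_t *counter)
{
    __atomic_fetch_add(counter, 1, __ATOMIC_RELAXED);
}

/* Step 1: pointer to the shared data. */
volatile int32_t *decorated_counter = NULL;

/* Runs the three steps and returns the final counter value,
 * or -1 if the allocation fails. */
int32_t run_counter_demo(int updates)
{
    /* Step 2: allocate the data on its own cache line, set the default state. */
    decorated_counter = (int32_t *) stats_memalign(CACHE_LINE_SIZE,
                                                   sizeof(int32_t));
    if (decorated_counter == NULL)
        return -1;
    *decorated_counter = 0;

    /* Step 3: update the data with the (stand-in) decorated operation. */
    for (int i = 0; i < updates; i++)
        decorated_notify_inc_32(decorated_counter);

    int32_t result = *decorated_counter;
    free((void *) decorated_counter);
    decorated_counter = NULL;
    return result;
}
```

Giving the counter a full cache line keeps unrelated data from sharing the line with it. The stand-in only reproduces the result of the increment, not its single-cycle cost.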

The typical use case for decorated operations is a data structure that is updated relatively seldom,<br />

roughly less than once every hundred cycles. In this case, an update is executed in a single cycle, which<br />

is the same as it is for private data. A lock-based update takes roughly 35 cycles in the<br />

ideal single-core case. These figures were measured by reading the clock cycle timer, running the test,<br />

reading the cycle timer again, and then subtracting the measured overhead of reading the timers. The overhead<br />

is a stable 4 clock cycles:<br />

atb_start = mfspr(SPR_ATBL); //start timer<br />

decorated_notify_inc_32(decorated_counter);<br />

atb_stop = mfspr(SPR_ATBL); //stop timer<br />
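The overhead correction described above is plain arithmetic on the two timer readings. The sketch below shows one way to write it, assuming the timer value is held as a 32-bit unsigned quantity; the helper name `net_cycles` and the constant `TIMER_READ_OVERHEAD` are hypothetical and do not come from the benchmark code.

```c
#include <stdint.h>

/* Measured cost of reading the timers, as stated in the text (4 cycles). */
#define TIMER_READ_OVERHEAD 4u

/* Net cycles between two reads of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so a single wraparound between
 * the two reads still yields the correct difference. The result is
 * clamped at zero in case the raw difference is below the overhead. */
static uint32_t net_cycles(uint32_t start, uint32_t stop, uint32_t overhead)
{
    uint32_t raw = stop - start;
    return raw > overhead ? raw - overhead : 0u;
}
```

With the variables from the listing above, the corrected figure would be `net_cycles(atb_start, atb_stop, TIMER_READ_OVERHEAD)`. For example, a raw difference of 5 ticks corresponds to a single-cycle update once the 4-cycle overhead is subtracted.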

Because locks use an SoC-wide atomic function, they are affected by other locks. For example, when one<br />

core runs the code above and the other cores wait at a different lock, the cycle count increases from<br />

roughly 35 cycles to about 200 cycles. When all cores operate on the same lock, the cycle count<br />

increases further. A synthetic use case that is not typically found in real applications, but is of general interest<br />

because of the extensive load it puts on the system, is to run a long loop of updates. This also allows for<br />

1. The e500mc core is superscalar and can load and retire up to two instructions per cycle under certain conditions.<br />

Decorated Operations for QorIQ P3/P4/P5 Processors, Rev. A<br />

6 Freescale Confidential Proprietary Freescale Semiconductor<br />

Preliminary—Subject to Change Without Notice
