03.08.2013 Views

Decorated Operations for QorIQ P3/P4/P5 Processors - Freescale ...

Statistics acceleration with the use of decorated operations solves the initial problem without the need for locks. This drastically increases performance and lowers the software complexity, because all protection for issues related to locks can be removed.

1 Introduction

The industry is rapidly migrating to multicore solutions, largely because easy single-core performance improvements from stronger or faster cores are coming to an end. Further improvements to code execution (instructions per cycle) require drastically more complex logic. Increasing the frequency is difficult, because power consumption increases to the power of two relative to the frequency.¹ Furthermore, higher frequency gives little additional performance because of the core/memory speed difference [1].
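The relation P = CV²F given in footnote 1 can be tried out with a few numbers. The sketch below is only an illustration: the capacitance, voltage, and frequency values are hypothetical and do not describe any particular processor, and the function name `dynamic_power` is introduced here for the example.

```python
def dynamic_power(capacitance_f: float, voltage_v: float, frequency_hz: float) -> float:
    """Dynamic switching power P = C * V^2 * F, in watts."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical operating points, chosen only to make the arithmetic easy.
BASE = dict(capacitance_f=1e-9, voltage_v=1.0, frequency_hz=1.0e9)
FASTER_SAME_VOLTAGE = dict(capacitance_f=1e-9, voltage_v=1.0, frequency_hz=1.5e9)
FASTER_HIGHER_VOLTAGE = dict(capacitance_f=1e-9, voltage_v=1.2, frequency_hz=1.5e9)

if __name__ == "__main__":
    base = dynamic_power(**BASE)
    for label, point in (
        ("base", BASE),
        ("1.5x frequency, same voltage", FASTER_SAME_VOLTAGE),
        ("1.5x frequency, 1.2x voltage", FASTER_HIGHER_VOLTAGE),
    ):
        power = dynamic_power(**point)
        print(f"{label}: {power:.2f} W ({power / base:.2f}x base)")
```

With these assumed values, frequency alone enters the relation linearly, while the voltage increase needed to reach the higher frequency enters squared: a 1.5x frequency step combined with a 1.2x voltage step costs 1.5 × 1.2² = 2.16 times the base power.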

Multicore solutions offer theoretically higher performance with low aggregate power consumption, but it is crucial that the hardware and software are designed to allow for efficient scaling. The most important hardware aspects are buses and memory interfaces; in this area, the concept of switch fabrics is replacing traditional buses. For example, Freescale’s P4080 communication processor, equipped with eight e500mc Power Architecture® cores, solves this problem by utilizing the CoreNet coherency fabric with nearly 1 Tbps of internal memory bandwidth and dual DDR3 interfaces. The other aspect, software design, is difficult to solve in a general way that allows efficient scaling. Amdahl’s law [2] describes the application speed-up relative to the number of cores and how well parallelized the software is. As shown in Figure 1, software that is largely sequential can never make efficient use of highly parallel architectures. Therefore, it is critical to provide means to remove sequential sections.

Figure 1 shows Amdahl’s law of scaling over multiple cores for different degrees of parallelized code. S marks the portion of sequential code.

Figure 1. Amdahl’s Law
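Amdahl’s law is commonly written as speed-up = 1 / (S + (1 − S) / N), where S is the sequential portion of the code and N is the number of cores. The following minimal sketch evaluates that standard formula for a few values of S; the function name `amdahl_speedup` and the chosen values of S and N are illustrative and are not taken from the figure.

```python
def amdahl_speedup(sequential_fraction: float, cores: int) -> float:
    """Speed-up predicted by Amdahl's law for sequential fraction S on N cores."""
    if not 0.0 <= sequential_fraction <= 1.0:
        raise ValueError("sequential_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - sequential_fraction
    # The sequential part runs at full cost; only the parallel part is divided by N.
    return 1.0 / (sequential_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative sequential fractions and core counts.
    for s in (0.5, 0.1, 0.01):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(s, n):.2f}x" for n in (1, 2, 4, 8)
        )
        print(f"S = {s:.2f} -> {row}")
```

For code that is half sequential (S = 0.5), eight cores yield a speed-up of only about 1.78, and no number of cores can exceed 1 / S = 2. This is why removing sequential sections matters more than adding cores.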

1. The relation is P = CV²F, but higher frequency requires higher voltage and leakier processes.

Decorated Operations for QorIQ P3/P4/P5 Processors, Rev. A

Freescale Semiconductor. Freescale Confidential Proprietary. Preliminary: Subject to Change Without Notice
