Decorated Operations for QorIQ P3/P4/P5 Processors - Freescale ...
Statistics acceleration with decorated operations solves the initial problem without the need for<br />
locks. This drastically increases performance and lowers software complexity, because all protection<br />
against lock-related issues can be removed.<br />
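As a rough illustration of lock-free statistics, the sketch below updates shared counters without a lock.<br />
It uses portable C11 atomics executed by the core, which is only an analogy and not the decorated<br />
operation mechanism itself. The names port_stats, port_stats_add, port_stats_frames, and<br />
port_stats_bytes are hypothetical and do not come from any Freescale API.<br />

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-port statistics block shared by all cores. */
typedef struct {
    atomic_uint_fast64_t frames;
    atomic_uint_fast64_t bytes;
} port_stats;

/*
 * Lock-free update: each counter is changed by a single atomic
 * read-modify-write, so no mutex or spinlock is taken.
 * Relaxed ordering is assumed to be enough for pure statistics.
 */
static inline void port_stats_add(port_stats *st, uint64_t frame_len)
{
    atomic_fetch_add_explicit(&st->frames, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&st->bytes, frame_len, memory_order_relaxed);
}

/* Read the current frame count. */
static inline uint64_t port_stats_frames(port_stats *st)
{
    return atomic_load_explicit(&st->frames, memory_order_relaxed);
}

/* Read the current byte count. */
static inline uint64_t port_stats_bytes(port_stats *st)
{
    return atomic_load_explicit(&st->bytes, memory_order_relaxed);
}
```

Because no lock is held, the usual safeguards around lock handling are not needed in the update path.<br />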
1 Introduction<br />
The industry is rapidly migrating to multicore solutions, largely because easy<br />
single-core performance improvements from stronger or faster cores are coming to an end. Further<br />
improvements to code execution (instructions per cycle) require drastically more complex logic.<br />
Increasing the frequency is difficult, because power consumption increases with roughly the square of<br />
the frequency (see footnote 1). Furthermore, higher frequency gives little additional performance due to the core/memory<br />
speed difference [1].<br />
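The relation in footnote 1 can be evaluated for a change in operating point. The sketch below is a<br />
minimal illustration that assumes the switched capacitance C stays constant and ignores leakage. The<br />
function name dynamic_power_ratio and its parameters are hypothetical.<br />

```c
#include <assert.h>

/*
 * Ratio of new to old dynamic power for P = C * V^2 * F,
 * assuming the capacitance C is unchanged and leakage is ignored.
 *   v_ratio: new supply voltage divided by old supply voltage
 *   f_ratio: new clock frequency divided by old clock frequency
 */
double dynamic_power_ratio(double v_ratio, double f_ratio)
{
    assert(v_ratio > 0.0 && f_ratio > 0.0);
    return v_ratio * v_ratio * f_ratio;
}
```

For example, raising the frequency by 20% while also raising the voltage by 10% gives<br />
dynamic_power_ratio(1.1, 1.2), or about 1.45 times the original dynamic power.<br />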
Multicore solutions offer theoretically higher performance at low aggregate power consumption, but<br />
it is crucial that the hardware and software are designed to allow for efficient scaling. The most important<br />
hardware aspects are buses and memory interfaces; in this area, the concept of switch fabrics is replacing<br />
traditional buses. For example, Freescale’s P4080 communication processor, equipped with eight e500mc<br />
Power Architecture® cores, addresses this by using the CoreNet coherency fabric with nearly 1<br />
Tbps of internal memory bandwidth and dual DDR3 interfaces. The other aspect, software design, is<br />
difficult to solve in a general way that allows efficient scaling. Amdahl’s law [2] describes the application<br />
speed-up as a function of the number of cores and of how well the software is parallelized. As shown in Figure 1,<br />
software that is largely sequential can never make efficient use of highly parallel architectures. Therefore,<br />
it is critical to provide means to remove sequential sections.<br />
Figure 1 shows Amdahl’s law of scaling over multiple cores for different degrees of parallelized code. S<br />
marks the portion of sequential code.<br />
Figure 1. Amdahl’s Law<br />
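The curves of Amdahl’s law can be reproduced numerically. The sketch below uses the standard form of<br />
the law, speed-up = 1 / (S + (1 - S) / N), where S is the sequential portion and N the number of cores.<br />
It assumes the parallel portion scales perfectly. The function names are chosen for illustration only.<br />

```c
#include <assert.h>

/*
 * Amdahl's law: theoretical speed-up on n cores when a fraction s
 * (0 <= s <= 1) of the work is sequential and the rest scales perfectly.
 */
double amdahl_speedup(double s, unsigned n)
{
    assert(s >= 0.0 && s <= 1.0);
    assert(n >= 1);
    return 1.0 / (s + (1.0 - s) / (double)n);
}

/* Upper bound on the speed-up as the core count grows without limit. */
double amdahl_limit(double s)
{
    assert(s > 0.0 && s <= 1.0);
    return 1.0 / s;
}
```

With S = 0.10, eight cores give amdahl_speedup(0.10, 8), about 4.7, and no number of cores can exceed<br />
amdahl_limit(0.10), a speed-up of 10. This is why removing sequential sections matters more than adding cores.<br />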
1. The relation is P = CV²F, but a higher frequency requires a higher voltage and leakier processes.<br />
Decorated Operations for QorIQ P3/P4/P5 Processors, Rev. A<br />
2 Freescale Confidential Proprietary Freescale Semiconductor<br />
Preliminary—Subject to Change Without Notice