Decorated Operations for QorIQ P3/P4/P5 Processors - Freescale ...
Freescale Semiconductor<br />
Application Note<br />
Decorated Operations for QorIQ<br />
P3/P4/P5 Processors<br />
by Networking and Multimedia Group<br />
Freescale Semiconductor, Inc.<br />
Austin, TX<br />
This application note presents the concept of “statistics<br />
acceleration” implemented with the usage of decorated<br />
operations.<br />
The important tasks of gathering statistics and logging<br />
ongoing activity within an embedded system often consume<br />
a substantial number of cycles. This can be seen in<br />
applications such as computer vision and vehicle control, as<br />
well as network and telecom infrastructure. In the latter case,<br />
individual flows of data must be tracked, and this<br />
information is then used to find errors and tune the network<br />
for optimal performance. Although this functionality is<br />
important, it is essential that statistics handling take as<br />
little time as possible.<br />
With the current multicore trend, time-efficient statistics<br />
handling is becoming more difficult. Data structures holding<br />
statistics or other key parameters are shared between the<br />
cores, so locks must be put around them to prevent race<br />
conditions. Locks can cause well-known problems, such as<br />
dead-locks, live-locks, and priority inversion, and handling<br />
these introduces yet another layer of complexity.<br />
© 2010 Freescale Semiconductor, Inc. All rights reserved.<br />
Freescale Confidential Proprietary<br />
Preliminary—Subject to Change Without Notice<br />
Document Number: AN4181<br />
Rev. A, 07/2010<br />
Contents<br />
1. Introduction<br />
2. Silicon Implementation of Decorated Operations<br />
3. Using Decorations in Software and Performance Results<br />
4. Implementation Details<br />
5. Sample Application<br />
6. Decorated Macro Functions<br />
7. Summary<br />
8. References<br />
9. Revision History<br />
Statistics acceleration with the use of decorated operations solves the initial problem without the need for<br />
locks. This drastically increases performance and lowers the software complexity, because all protection<br />
against lock-related issues can be removed.<br />
1 Introduction<br />
Currently, the industry is rapidly migrating to multicore solutions, largely because easy<br />
single-core performance improvements from stronger or faster cores are coming to an end. Further<br />
improvements to code execution (instructions per cycle) require drastically more complex logic.<br />
Increasing the frequency is difficult, because power consumption increases with the square of<br />
the frequency¹. Furthermore, higher frequency gives little additional performance due to the core/memory<br />
speed difference [1].<br />
Multicore solutions give a theoretically higher performance with low aggregate power consumption, but<br />
it is crucial that the hardware and software are designed to allow for efficient scaling. The most important<br />
hardware aspects are buses and memory interfaces; in this area, the concept of switch fabrics is replacing<br />
traditional buses. For example, Freescale’s P4080 communication processor, equipped with eight e500mc<br />
Power Architecture® cores, solves this problem by utilizing the CoreNet coherency fabric with nearly 1<br />
Tbps of internal memory bandwidth and dual DDR3 interfaces. The other aspect, software design, is<br />
difficult to solve on a general basis to allow efficient scaling. Amdahl’s law [2] describes the application<br />
speed-up relative to the number of cores and how well parallelized the software is. As shown in Figure 1,<br />
software that is largely sequential can never make efficient use of highly parallel architectures. Therefore,<br />
it is critical to provide means to remove sequential sections.<br />
Figure 1 shows Amdahl’s law of scaling over multiple cores for different degrees of parallelized code. S<br />
marks the portion of sequential code.<br />
Figure 1. Amdahl’s Law<br />
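As a rough illustration of the curves in Figure 1, Amdahl's law can be written as speed-up = 1 / (S + (1 - S) / N), where S is the sequential fraction and N the number of cores. The helper below is a minimal sketch of that formula; the name `amdahl_speedup` is ours and does not come from any Freescale library.

```c
#include <assert.h>

/* Amdahl's law: speed-up on n cores when a fraction s of the work
 * (0 <= s <= 1) is sequential and the rest parallelizes perfectly.
 * Hypothetical helper, for illustration only. */
double amdahl_speedup(double s, unsigned n)
{
    assert(s >= 0.0 && s <= 1.0 && n > 0);
    return 1.0 / (s + (1.0 - s) / (double)n);
}
```

With S = 0.1, eight cores give a speed-up of about 4.7, and no number of cores can exceed 1 / S = 10.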
1. Relation is P = CV²F, but higher frequency requires higher voltage and leakier processes.<br />
Single-core applications typically work by reading data from a data structure, computing a result, and then<br />
writing that back to the data structure. When this application is parallelized, the same computation is done,<br />
but it is now important to protect the data structure so that it is not incorrectly updated due to the<br />
concurrency.<br />
Take bank software as an example. Mr. Foo’s account currently holds $1000 and is now accessed by two<br />
processes at nearly the same time. One deposits $500 and the other withdraws $20. Both processes read the<br />
current balance, add or remove money, and finally write the result back to the account. If both read<br />
operations take place before either write, then the final result is that of the last write operation, and<br />
the other operation is lost. Mr. Foo will be left with either $1500 or only $980 in<br />
his account, instead of the correct $1480.<br />
Figure 2 shows Mr. Foo’s bank account. The final value of the account is non-deterministic.<br />
Figure 2. Mr. Foo’s Bank Account<br />
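The interleaving in Figure 2 can be replayed deterministically in plain C. The sketch below is illustrative only: `racy_final_balance` and its `deposit_last` parameter are hypothetical names, and the two processes are simulated one after the other inside a single function rather than run concurrently.

```c
#include <assert.h>
#include <stdint.h>

/* Replays the problematic interleaving: both processes read the balance
 * before either one writes. deposit_last selects which write lands last. */
int64_t racy_final_balance(int64_t balance, int deposit_last)
{
    assert(balance >= 20);             /* keep the withdrawal valid */

    int64_t read_a = balance;          /* process A reads the balance */
    int64_t read_b = balance;          /* process B reads the same value */
    int64_t write_a = read_a + 500;    /* A deposits $500 */
    int64_t write_b = read_b - 20;     /* B withdraws $20 */

    if (deposit_last) {
        balance = write_b;
        balance = write_a;             /* A overwrites B: withdrawal lost */
    } else {
        balance = write_a;
        balance = write_b;             /* B overwrites A: deposit lost */
    }
    return balance;
}
```

Starting from $1000, the function returns $1500 or $980 depending on which write lands last, never the correct $1480.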
This type of issue is called a race condition, and its traditional solution is to introduce a lock on the data<br />
structure. However, locks are sequential by nature and lower the degree of parallelism. A lock is typically<br />
implemented as follows:<br />
Test/Set to get lock for structure<br />
If (lock was already used by other) then try above again<br />
Read and update structure<br />
Release lock<br />
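A test-and-set lock of this kind can be sketched with C11 atomics: acquire, retry while the lock is held, update the structure, then release. This is a generic host-side illustration, not the LWE `spin_lock` implementation; `locked_counter_t`, `LOCKED_COUNTER_INIT`, and `locked_counter_add` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    atomic_flag held;   /* set while some core owns the lock */
    int64_t     value;  /* the protected structure */
} locked_counter_t;

/* Static initializer: lock free, counter set to v. */
#define LOCKED_COUNTER_INIT(v) { ATOMIC_FLAG_INIT, (v) }

void locked_counter_add(locked_counter_t *c, int64_t delta)
{
    assert(c != NULL);

    /* Test/set to get the lock; retry while another owner holds it. */
    while (atomic_flag_test_and_set_explicit(&c->held, memory_order_acquire))
        ;  /* spin */

    c->value += delta;  /* read and update the structure */

    /* Release the lock only after the update is complete. */
    atomic_flag_clear_explicit(&c->held, memory_order_release);
}
```

Every caller serializes on the flag, which is exactly the sequential section that the rest of this note tries to avoid.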
Locks also introduce additional software complexity, because priority inversion [3] and dead-lock<br />
situations [4] must be handled if they occur. This in turn can lead to live-locks [5]. To conclude, the<br />
traditional method of handling synchronization by using locks is not robust because it does not allow for<br />
efficient scaling. It is also very costly in terms of cycles. In a benchmark running bare-board on the P4080<br />
and utilizing Freescale’s light-weight executive (LWE) library, a shared variable protected by locks took<br />
nearly 25 times as long to update as a private variable¹ due to the lock overhead alone. In the<br />
1. Declared with “volatile” to ensure that no unfair compiler optimizations were used, and that a full read-update-write cycle was<br />
executed<br />
case where seven cores tried to update the same shared variable multiple times and hence also had to wait<br />
for the lock to be freed, the update took on average 95 times longer than the private variable. See Section 3,<br />
“Using Decorations in Software and Performance Results,” for further details on the benchmark.<br />
Freescale’s implementation of decorated operations goes back to the initial, most basic problem, namely,<br />
how to share data in a multicore environment. Freescale’s decorated operations solve this problem by<br />
means that do not introduce sequential code and have a low software overhead and complexity. This is<br />
done by using other parts of the SoC besides the cores to perform relative operations on data, such as<br />
“increase x by 10.” By moving the operation out from the core into a central location, it is possible to<br />
guarantee atomic operations and a simpler inter-core order of execution.<br />
We will discuss how decorated operations are implemented in Freescale’s high-end QorIQ processor<br />
family with the P3, P4, and P5 devices and how software can use these operations<br />
to reach performance numbers equal to those of operations on private variables. Use-cases and examples<br />
are mainly taken from the P4080 but are generally applicable to all the P3, P4, and P5 devices.<br />
2 Silicon Implementation of Decorated Operations<br />
Decorated operations, or decorated storage, as it is also commonly named, is a set of core instructions<br />
added to the Power Architecture instruction set [6]. Each instruction “decorates” a common load/store<br />
instruction with a computation and an attribute. For example, stbx (Store Byte Indexed) now also<br />
has a decorated version: stbdx (Store Byte Decorated Indexed). The same applies to the half-word, word,<br />
double-word, and double-float versions of the store and load instructions (that is, stbdx, sthdx,<br />
stwdx, stddx¹, and stfddx, and lbdx, lhdx, lwdx, lddx², and lfddx). An additional dsn (Notify) instruction<br />
has also been added. It does not have any corresponding load/store version, but is interpreted as a nop<br />
(No Operation) by the core and carries a decoration.<br />
This decoration does not have any direct meaning to the core itself, but depending on the SoC<br />
implementation, it is interpreted by other parts of the device. In the case of Freescale’s QorIQ processor<br />
family, these decorations are interpreted by the CPC, which carries out the operations together with the<br />
CoreNet DDR queue and DDR controller (see Figure 3). These act similarly to transactional memory [7]<br />
to perform operations on a global scale outside the cores. Unlike transactional memory, there is no need to<br />
handle rollbacks, because CoreNet buffer transactions are required and ensure the correct order of<br />
execution. The decorations for load instructions include clear, set, decrement, and increment of data. Store<br />
instructions include accumulate (the value may be negative), combined increment and accumulate, maximum<br />
threshold, and minimum threshold. The notify instruction can carry increment as well as clear operations.<br />
Versions are available for signed and unsigned data, and for 32- and 64-bit word lengths.<br />
1. Declared with “volatile” to ensure that no unfair compiler optimizations were used, and that a full read-update-write cycle was<br />
executed.<br />
2. Not implemented on P3/P4, but available on P5.<br />
Figure 3 shows how the CoreNet platform cache and DDR controller interface. Decorated operations are<br />
implemented in the core as instructions, and the decorations are interpreted by the CoreNet platform<br />
cache (CPC), which carries out the operations together with the CoreNet DDR queue and DDR controller.<br />
Figure 3. <strong>Decorated</strong> <strong>Operations</strong>, CoreNet Cache, and DDR<br />
A decorated operation carries four parameters: type of access (such as load, store, or notify), data address<br />
to operate on, data to use, and the decorated value that defines the operation.<br />
As an example, consider the following decorated operation: add 10 (integer) to the variable bar of<br />
type long (64-bit) at address A (double-word aligned). The access type is Store Word<br />
Decorated (stwdx), the data address is set to A + 4 to force right justification of the store data within the<br />
accumulator, and the decoration type is set to Accumulate 64-bit. The following C code corresponds to this<br />
operation:<br />
decorated_store_64_acc_64(&bar,10);<br />
Going back to Mr. Foo’s bank account, this type of relative change is the perfect match for decorated<br />
operations. The two processes that work on the bank account do not need to use any locks but can simply<br />
execute the following, respective, instructions:<br />
decorated_store_64_acc_64(&account_foo, -20);<br />
decorated_store_64_acc_64(&account_foo, 500);<br />
Note that the order of execution is not important; the change is relative to the current value. This works<br />
well with statistics and data logging, such as keeping track of how much data and packets a specific user<br />
has sent in a network, the distance a car has travelled, progress measurement, and so on. A specific change<br />
such as updating the MAC address in an ARP table, or changing Mr Foo’s account to be owned by<br />
someone else, does not work well. Such abrupt changes require a larger level of synchronization between<br />
the processes to ensure that there are no pending transactions.<br />
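The order-independence noted above can be checked on a host machine by letting a C11 atomic add stand in for the accumulate decoration. This is only a model under that assumption: `model_store_acc_64` is a hypothetical name, and on the device the addition is performed by the CPC rather than by an atomic instruction in the core.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* HOST-SIDE MODEL ONLY (assumption): a C11 atomic add stands in for the
 * "accumulate 64-bit" decoration. The update is relative to whatever the
 * accumulator currently holds, so no lock and no prior read are needed. */
void model_store_acc_64(_Atomic int64_t *accumulator, int64_t delta)
{
    assert(accumulator != NULL);
    atomic_fetch_add_explicit(accumulator, delta, memory_order_relaxed);
}
```

Applying the $500 deposit and the $20 withdrawal to a $1000 balance yields $1480 in either order, which is why such relative updates need no synchronization between the two processes.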
Furthermore, the data that is operated on must be marked as cache-inhibited so that it is not cached by private<br />
L1 and L2 caches. It must also be marked as guarded so that there are no speculative loads causing<br />
undesired effects. The operation is carried out in the L3 platform cache, and data either remains there for<br />
the time being, or alternatively, is brought in from DDR, updated, and directly put into the DDR write<br />
queue without altering the cache. The store and notify instructions are carried out directly by the core without<br />
the need to wait for the final operation to be executed in the CPC, that is, “Fire and Forget.” They can therefore be<br />
executed in one cycle or fewer¹. The performance is high compared to lock-based approaches<br />
(see Section 3, “Using Decorations in Software and Performance Results”).<br />
3 Using Decorations in Software and Performance Results<br />
Decorated operations are typically implemented with macros to simplify usage; alternatively, they could<br />
overload basic add/subtract functions in applicable programming languages such as C++. In the following<br />
benchmark case, the operation is implemented in bare-metal directly on the P4080 without an underlying<br />
operating system. Seven of the eight cores are running bare-metal, whereas the last core is running Linux<br />
to simplify the boot process. However, the operating system configuration and the level at which the decorated<br />
operations are implemented are not important, because they are executed on the same privilege level and<br />
have the same characteristics for the core as normal load/store operations. Tests are made both in<br />
single-core and in multicore configurations.<br />
Using data that is accessed by a decorated operation takes three steps. First, a pointer to the data must<br />
be defined, as follows:<br />
volatile int32_t *decorated_counter = NULL;<br />
In the program code, allocate the data and set the value to a default state, in this case zero, as follows:<br />
decorated_counter=(int32_t *) stats_memalign(CACHE_LINE_SIZE,<br />
sizeof(int32_t));<br />
*decorated_counter = 0;<br />
Finally, the code makes use of the data by executing a decorated operation, as follows:<br />
decorated_notify_inc_32(decorated_counter);<br />
The typical use-case for decorated operations is to update a data structure relatively seldom,<br />
roughly less often than once every hundred cycles. In this case, an update is executed in a single cycle, which<br />
is the same as for private data. For a lock-based update, the programmer gets roughly 35 cycles in the<br />
ideal single-core case. These figures were measured by reading the clock cycle timer, running the test,<br />
reading the cycle timer again, and then subtracting the measured overhead for reading the timers. The overhead<br />
is a stable 4 clock cycles:<br />
atb_start = mfspr(SPR_ATBL); //start timer<br />
decorated_notify_inc_32(decorated_counter);<br />
atb_stop = mfspr(SPR_ATBL); //stop timer<br />
Because locks use an SoC-wide atomic function, they are affected by other locks. For example, when one<br />
core runs the code (above) and the other cores wait at a different lock, the cycle count increases from<br />
roughly 35 cycles to about 200 cycles. When all cores operate on the same lock, the cycle count<br />
increases further. A synthetic use-case that is not typically found in real applications, but is of general interest<br />
due to the extensive load it puts on the system, is to run a long loop of updates. This also allows for<br />
1. The e500mc core is superscalar and can load and retire up to two instructions per cycle under certain conditions.<br />
measuring the penalty due to multicore access to the same data. Below is an example of the code that is<br />
used, this time showing a lock-based access. Each core runs the loop 10,000 times:<br />
atb_start = mfspr(SPR_ATBL); //start timer<br />
for(i=0; i < 10000; i++){<br />
    spin_lock(&sync_lock);<br />
    lock_counter++;<br />
    spin_unlock(&sync_lock);<br />
}<br />
atb_stop = mfspr(SPR_ATBL); //stop timer<br />
In the case of lock-based accesses with eight cores running in parallel, the average access time increases<br />
to 848 cycles due to the delay at the lock. Note that the standard deviation is very large in this case, nearly<br />
50% of the average cycle count, so the access time is highly non-deterministic. For decorated operations,<br />
the CPC is expected to become a bottleneck, because it is not designed to handle such a large flow of<br />
consecutive operations. The CPC runs on the SoC clock rather than the core clock and can execute one<br />
decorated operation every second clock cycle. With a 1:2.4 ratio between core/SoC clock and seven cores<br />
executing decorated operations, 33 core clock cycles per iteration are expected. The benchmark confirms<br />
this, and the standard deviation is now only 8% of the cycle count.<br />
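One way to arrive at the 33-cycle estimate is the following back-of-the-envelope calculation. It is our reading of the figures, not a statement from the benchmark code: it assumes that the CPC is the only bottleneck, that the cores are served in turn, and that the 1:2.4 ratio means the core clock runs 2.4 times faster than the SoC clock. The function name is hypothetical.

```c
#include <assert.h>

/* Core clock cycles between two operations accepted from the same core
 * when `cores` cores saturate a unit that accepts one operation every
 * `soc_cycles_per_op` SoC cycles, with the core clock running
 * `core_per_soc_clock` times faster than the SoC clock (assumptions). */
double expected_core_cycles(unsigned cores, double soc_cycles_per_op,
                            double core_per_soc_clock)
{
    assert(cores > 0 && soc_cycles_per_op > 0.0 && core_per_soc_clock > 0.0);
    return cores * soc_cycles_per_op * core_per_soc_clock;
}
```

Seven cores, two SoC cycles per operation, and a ratio of 2.4 give 7 × 2 × 2.4 = 33.6 core cycles, consistent with the roughly 33 cycles quoted above.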
Freescale implemented a typical application test (Network Address and Port Translation, NAPT) to<br />
measure the total impact of running with locks as well as with decorated operations. The core fetched an<br />
incoming UDP/IP packet from the network, did a look-up for address translation, changed destination port<br />
and address, updated statistics, and sent out the packet. The interesting part in this case is the statistics that<br />
were updated and the time this consumed.<br />
Without any statistics, but using the P4080 packet processing accelerators, the cycle count per packet<br />
was measured to be 440 cycles with a standard deviation of 18 cycles. A global total packet and total byte<br />
counter were added as well as individual flow-based counters for number of packets and number of bytes<br />
transferred. With a single lock protecting the statistics, the average packet processing time increased to<br />
686 ± 18 cycles, that is, 246 cycles more per packet. With decorated operations, the<br />
statistics updates could be scheduled to optimize the performance, and the total cycle count only increased to 442<br />
± 19 cycles per packet, that is, 2 cycles more.<br />
The conclusion from the tests is that decorated operations allow for a significant performance increase<br />
compared to lock-based implementations.<br />
4 Implementation Details<br />
Decorated storage operations operate only on addresses that have been marked as Caching Inhibited¹, that<br />
is, non-cacheable. Performing a decorated storage operation on addresses that are cacheable causes the<br />
operation to degrade to the equivalent non-decorated load or store operation: lbdx into lbx, stwdx into<br />
stwx, and notify into nop.<br />
1. Caching-inhibited: All loads and stores to the page bypass the caches and are performed directly to main memory. A read or<br />
write to a caching-inhibited page affects only the memory element specified by the operation.<br />
Addresses to which decorated loads are performed should be marked Guarded¹, that is, no<br />
speculative execution is allowed for those instructions. If guarded is not set, then speculative execution of,<br />
for example, a load operation triggers a data update. This is not problematic if the speculation turns out to<br />
be correct. However, if that is not the case, the load is thrown out of the core pipeline, but the<br />
decoration is still executed in the memory subsystem. This in turn results in an incorrect value of the data.<br />
Variables (that is, accumulators) affected by decorated operations should be naturally aligned to their<br />
variable size (for example, a word should be 4-byte-aligned). An error here can result in incorrect data<br />
changes, both to the variable operated on and to adjacent data.<br />
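Natural alignment is easy to verify before the first decorated access. The check below is a minimal sketch in standard C; `is_naturally_aligned` is a hypothetical helper and is not part of the macro set in this note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero when addr is a multiple of size. size must be a power
 * of two, for example 4 for a word or 8 for a double word. */
int is_naturally_aligned(const volatile void *addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return ((uintptr_t)addr & (size - 1)) == 0;
}
```

For the 32-bit counter used in Section 3, the check would be `is_naturally_aligned(decorated_counter, sizeof(int32_t))`.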
<strong>Decorated</strong> load, store, and notify operations behave the same as normal load and store operations in all<br />
other aspects, such as Access control, Debug event, Storage attributes, and Alignment and memory access<br />
ordering. In other words, there is no difference between decorated operations and normal operations when<br />
it comes to application usage. Any application can use them without any OS kernel or Hypervisor<br />
interaction or permission.<br />
4.1 Load: Memory Loaded to Core Register with Decoration Result<br />
For decorated load operations, the processor performs a load operation with the specified decoration to the<br />
given address and places the data provided by the device in the target register. The different operations are<br />
as follows:<br />
8-/16-/32-/64-bit Clear<br />
8-/16-/32-/64-bit Set<br />
8-/16-/32-/64-bit Decrement<br />
8-/16-/32-/64-bit Increment<br />
4.2 Store: Core Register Stored in Memory with Result from Decoration<br />
For decorated store operations, the processor performs a store operation with the specified decoration to<br />
the given address and provides the data specified in the source register to the device. The different<br />
operations are as follows:<br />
32-/64-bit accumulate<br />
32-/64-bit increment and 32-/64-bit accumulate<br />
64-bit maximum threshold with unsigned double word<br />
32-bit maximum threshold with unsigned word<br />
64-bit minimum threshold with unsigned double word<br />
32-bit minimum threshold with unsigned word<br />
1. Guarded: All loads and stores to this page are performed without speculation. That is, they are known to be required.<br />
Note that the increment and accumulate decoration performs two operations but takes only one decorated<br />
value and only one effective address. The first operation is an increment by 1, and therefore does not<br />
need a decorated value; the value is instead used for the accumulate operation. The effective address points to a<br />
struct in which the first 32-/64-bit value is used for the increment and the following 32-/64-bit value is used for<br />
the accumulation, see below. This can be used, for example, to update the statistics of a dataflow, both number<br />
of packets and number of bytes, with just one operation.<br />
struct stat32_pair_t {<br />
    int32_t inc;<br />
    int32_t acc;<br />
};<br />
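The effect of one increment-and-accumulate store on such a pair can be modeled on a host as follows. This is a sketch of the described semantics only: `model_inc_and_acc_32` is a hypothetical name, and unlike the real decoration, which is carried out as one operation outside the core, this model is not atomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct stat32_pair_t {
    int32_t inc;  /* first word: incremented by 1 per operation */
    int32_t acc;  /* second word: accumulates the store data */
};

/* HOST-SIDE MODEL ONLY (assumption): mimics what one increment-and-
 * accumulate store does to the pair at the effective address. */
void model_inc_and_acc_32(struct stat32_pair_t *pair, int32_t data)
{
    assert(pair != NULL);
    pair->inc += 1;     /* for example, the packet count */
    pair->acc += data;  /* for example, the byte count */
}
```

Calling the model once per packet with the packet length as `data` leaves the packet count in `inc` and the byte count in `acc`.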
4.3 Notify: Decoration Performed on Data in Memory<br />
A notify instruction is an NOP (No Operation) instruction that does not have any effect on general-purpose<br />
registers in the core. The different operations are as follows:<br />
32-/64-bit increment<br />
32-/64-bit clear<br />
5 Sample Application<br />
#include <br />
#include <br />
#include <br />
#include <br />
__PERCPU uint32_t atb_start, atb_stop;<br />
__PERCPU uint32_t atb_oh;<br />
/** Master LWE core does required initialization first */<br />
volatile uint32_t g_ctrl_lwe = INV_LWE_ID;<br />
__PERCPU uint32_t curr_lwe_id = 0; /**< LWE ID for each core */<br />
uint32_t sync_lock;<br />
uint32_t init_lock;<br />
struct lwe_barrier sync_barrier;<br />
volatile int32_t *decorated_counter = NULL;<br />
volatile int32_t lock_counter = 0;<br />
__PERCPU volatile int32_t private_counter = 0;<br />
void singlecore_test(void)<br />
{<br />
    uint32_t i;<br />
    /* Measure the overhead of reading the timer twice. */<br />
    atb_start = mfspr(SPR_ATBL); //start timer<br />
    atb_stop = mfspr(SPR_ATBL); //stop timer<br />
    atb_oh = atb_stop - atb_start;<br />
    APP_INFO ("Start/Stop overhead is %d cycles.", atb_oh);<br />
    atb_oh=4; /* fixed at the stable 4-cycle overhead quoted in Section 3 */<br />
    /* One decorated increment. */<br />
    atb_start = mfspr(SPR_ATBL); //start timer<br />
    decorated_notify_inc_32(decorated_counter);<br />
    atb_stop = mfspr(SPR_ATBL); //stop timer<br />
    APP_INFO ("1 Decoration took %d cycles.", atb_stop-atb_start - atb_oh);<br />
    /* Ten decorated increments. */<br />
    atb_start = mfspr(SPR_ATBL); //start timer<br />
    for(i=0; i < 10; i++){<br />
        decorated_notify_inc_32(decorated_counter);<br />
    }<br />
    atb_stop = mfspr(SPR_ATBL); //stop timer<br />
    APP_INFO ("10 Decorations took %d cycles.", atb_stop-atb_start - atb_oh);<br />
    /* One lock-protected update. */<br />
    atb_start = mfspr(SPR_ATBL); //start timer<br />
    for(i=0; i < 1; i++){<br />
        spin_lock(&sync_lock);<br />
        lock_counter+=i;<br />
        spin_unlock(&sync_lock);<br />
    }<br />
    atb_stop = mfspr(SPR_ATBL); //stop timer<br />
    APP_INFO ("1 lock counter took %d cycles.", atb_stop-atb_start - atb_oh);<br />
    /* Ten lock-protected updates. */<br />
    atb_start = mfspr(SPR_ATBL); //start timer<br />
    for(i=0; i < 10; i++){<br />
        spin_lock(&sync_lock);<br />
        lock_counter+=i;<br />
        spin_unlock(&sync_lock);<br />
    }<br />
    atb_stop = mfspr(SPR_ATBL); //stop timer<br />
    APP_INFO ("10 lock counter took %d cycles.", atb_stop-atb_start - atb_oh);<br />
}<br />
void multicore_test(void)<br />
{<br />
    uint32_t i;<br />
    /* Wait until all cores have reached this point. */<br />
    if (unlikely(barrier_sync(&sync_barrier) < 0))<br />
        LWE_PANIC("barrier sync failed!");<br />
    /* One decorated increment while the other cores do the same. */<br />
    atb_start = mfspr(SPR_ATBL); //start timer<br />
    decorated_notify_inc_32(decorated_counter);<br />
    atb_stop = mfspr(SPR_ATBL); //stop timer<br />
    APP_INFO ("1 Decoration took %d cycles.", atb_stop-atb_start - atb_oh);<br />
    /* One lock-protected update while the other cores do the same. */<br />
    atb_start = mfspr(SPR_ATBL); //start timer<br />
    for(i=0; i < 1; i++){<br />
        spin_lock(&sync_lock);<br />
        lock_counter+=i;<br />
        spin_unlock(&sync_lock);<br />
    }<br />
    atb_stop = mfspr(SPR_ATBL); //stop timer<br />
    APP_INFO ("1 lock counter took %d cycles.", atb_stop-atb_start - atb_oh);<br />
}<br />
int main(int argc, char *argv[])<br />
{<br />
    uint32_t i;<br />
    curr_lwe_id = get_lwe_id();<br />
    spin_lock(&init_lock);<br />
    if (g_ctrl_lwe == INV_LWE_ID){<br />
        g_ctrl_lwe = curr_lwe_id;<br />
        APP_INFO("*********************************************");<br />
        APP_INFO("Decorated Operations Benchmark, July 2010");<br />
        APP_INFO("Jonas Svennebring, Freescale Nordic");<br />
        APP_INFO ("Partition %d\n\n", curr_lwe_id);<br />
        i = barrier_init(&sync_barrier, get_online_core_mask());<br />
        if (unlikely(i != 0)) {<br />
            APP_ERROR("Barrier initialization failed");<br />
            return 1;<br />
        }<br />
        decorated_counter = (int32_t *) stats_memalign(CACHE_LINE_SIZE, sizeof(int32_t));<br />
        *decorated_counter = 0;<br />
        atb_start = mfspr(SPR_ATBL); //start timer<br />
        atb_stop = mfspr(SPR_ATBL); //stop timer<br />
        atb_oh = atb_stop - atb_start;<br />
        APP_INFO ("Start/Stop overhead is %d cycles.", atb_oh);<br />
        APP_INFO("");<br />
        APP_INFO("Singlecore Test:");<br />
        singlecore_test();<br />
    }<br />
    else{<br />
        APP_INFO("Slave Partition, id %d", curr_lwe_id);<br />
    }<br />
    spin_unlock(&init_lock);<br />
    APP_INFO("");<br />
    APP_INFO("Multicore Test:");<br />
    multicore_test();<br />
    APP_INFO("DONE!");<br />
    return 0;<br />
}<br />
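The timing pattern used throughout the sample application (read the time base, run the code under test, read the time base again, subtract the start/stop overhead) can be collected in one helper. The sketch below is a host-portable illustration and not part of the original benchmark: the name `tb_elapsed` is hypothetical, and the counter values are passed in as plain integers instead of being read with `mfspr(SPR_ATBL)`. It assumes the value read is a 32-bit counter, so that unsigned subtraction still yields the correct distance if the counter wraps once between the two reads.

```c
#include <stdint.h>

/* Hypothetical helper: cycles between two 32-bit time-base reads, minus
 * the measured start/stop overhead. Unsigned arithmetic wraps modulo 2^32,
 * so the difference stays correct if the counter rolled over once. */
static uint32_t tb_elapsed(uint32_t start, uint32_t stop, uint32_t overhead)
{
    uint32_t raw = stop - start;                   /* wrap-safe distance */
    return (raw > overhead) ? raw - overhead : 0;  /* clamp at zero */
}
```

If the logging macro follows `printf` conventions, printing such a value with `%u` rather than `%d` keeps large cycle counts from appearing as negative numbers.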
6 Decorated Macro Functions<br />
//////////////////////////////////<br />
//// Load Definitions<br />
///////////////////////////<br />
enum LOAD_DECORATION {<br />
    LOAD_DECORATION_CLEAR = 0,<br />
    LOAD_DECORATION_SET = 1,<br />
    LOAD_DECORATION_DEC = 2,<br />
    LOAD_DECORATION_INC = 3<br />
};<br />
static inline uint8_t decorated_load_clear_8(volatile void *a){<br />
    uint8_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_CLEAR;<br />
    __ASM("lbdx %0, %1, %2"<br />
        : "=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint8_t decorated_load_set_8(volatile void *a){<br />
    uint8_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_SET;<br />
    __ASM("lbdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint8_t decorated_load_dec_8(volatile void *a){<br />
    uint8_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_DEC;<br />
    __ASM("lbdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint8_t decorated_load_inc_8(volatile void *a){<br />
    uint8_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_INC;<br />
    __ASM("lbdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint16_t decorated_load_clear_16(volatile void *a){<br />
    uint16_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_CLEAR;<br />
    __ASM("lhdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint16_t decorated_load_set_16(volatile void *a){<br />
    uint16_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_SET;<br />
    __ASM("lhdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint16_t decorated_load_dec_16(volatile void *a){<br />
    uint16_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_DEC;<br />
    __ASM("lhdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint16_t decorated_load_inc_16(volatile void *a){<br />
    uint16_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_INC;<br />
    __ASM("lhdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint32_t decorated_load_clear_32(volatile void *a){<br />
    uint32_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_CLEAR;<br />
    __ASM("lwdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint32_t decorated_load_set_32(volatile void *a){<br />
    uint32_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_SET;<br />
    __ASM("lwdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint32_t decorated_load_dec_32(volatile void *a){<br />
    uint32_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_DEC;<br />
    __ASM("lwdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint32_t decorated_load_inc_32(volatile void *a){<br />
    uint32_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_INC;<br />
    __ASM("lwdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint64_t decorated_load_clear_64(volatile void *a){<br />
    uint64_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_CLEAR;<br />
    __ASM("lfddx %0, %1, %2":"=f"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint64_t decorated_load_set_64(volatile void *a){<br />
    uint64_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_SET;<br />
    __ASM("lfddx %0, %1, %2":"=f"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint64_t decorated_load_dec_64(volatile void *a){<br />
    uint64_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_DEC;<br />
    __ASM("lfddx %0, %1, %2":"=f"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
static inline uint64_t decorated_load_inc_64(volatile void *a){<br />
    uint64_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_INC;<br />
    __ASM("lfddx %0, %1, %2":"=f"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
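A typical use for a clearing load is to collect a counter and reset it in one step, so that no increment arriving between the read and the reset is lost. The sketch below models that idea on a host with C11 `atomic_exchange`. It is a software stand-in executed by the core, not the decorated instruction, and it assumes that the clearing load returns the value held before the clear. The name `stat_read_and_clear32` is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Host-side model (assumption): return the old counter value and leave 0
 * behind as a single atomic step. */
static uint32_t stat_read_and_clear32(_Atomic uint32_t *counter)
{
    return atomic_exchange(counter, 0);
}
```

A reporting task could call this once per interval and add the returned value to a running total held in private memory.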
//////////////////////////////////<br />
//// Store Definitions<br />
///////////////////////////<br />
enum STORE_DECORATION {<br />
    STORE_DECORATION_ACC_64 = 0,<br />
    STORE_DECORATION_ACC_32 = 1,<br />
    STORE_DECORATION_INC_ACC_64 = 2,<br />
    STORE_DECORATION_INC_ACC_32 = 3<br />
};<br />
struct stat32_pair_t {<br />
    int32_t inc;<br />
    int32_t acc;<br />
};<br />
struct stat64_pair_t {<br />
    int64_t inc;<br />
    int64_t acc;<br />
};<br />
static inline void decorated_store_32_acc_32(volatile void *a, register int32_t v){<br />
    volatile void *address = a;<br />
    enum STORE_DECORATION d = STORE_DECORATION_ACC_32;<br />
    __ASM("stwdx %0, %1, %2":<br />
        :"r"(v), "r"(d), "r"(address)<br />
        :"memory");<br />
}<br />
static inline void decorated_store_32_inc_acc_32(volatile void *a, register int32_t v){<br />
    volatile void *address = (void *) ((uintptr_t) a + 4);<br />
    enum STORE_DECORATION d = STORE_DECORATION_INC_ACC_32;<br />
    __ASM("stwdx %0, %1, %2":<br />
        :"r"(v), "r"(d), "r"(address)<br />
        :"memory");<br />
}<br />
static inline void decorated_store_64_acc_32(volatile void *a, register int32_t v){<br />
    volatile void *address = (void *) ((uintptr_t) a + 4);<br />
    enum STORE_DECORATION d = STORE_DECORATION_ACC_64;<br />
    __ASM("stwdx %0, %1, %2":<br />
        :"r"(v), "r"(d), "r"(address)<br />
        :"memory");<br />
}<br />
static inline void decorated_store_64_inc_acc_32(volatile void *a, register int32_t v){<br />
    volatile void *address = (void *) ((uintptr_t) a + 12);<br />
    enum STORE_DECORATION d = STORE_DECORATION_INC_ACC_64;<br />
    __ASM("stwdx %0, %1, %2":<br />
        :"r"(v), "r"(d), "r"(address)<br />
        :"memory");<br />
}<br />
static inline void decorated_store_64_acc_64(volatile void *a, register int64_t v){<br />
    volatile void *address = a;<br />
    enum STORE_DECORATION d = STORE_DECORATION_ACC_64;<br />
    __ASM("stfddx %0, %1, %2":<br />
        :"f"(v), "r"(d), "r"(address)<br />
        :"memory");<br />
}<br />
static inline void decorated_store_64_inc_acc_64(volatile void *a, register int64_t v){<br />
    volatile void *address = (void *) ((uintptr_t) a + 8);<br />
    enum STORE_DECORATION d = STORE_DECORATION_INC_ACC_64;<br />
    __ASM("stfddx %0, %1, %2":<br />
        :"f"(v), "r"(d), "r"(address)<br />
        :"memory");<br />
}<br />
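The `inc`/`acc` pair layouts above suggest an event-count plus accumulated-value style of statistic, for example packets and bytes. The sketch below is a plain C model of what an increment-and-accumulate store is assumed to do to a `stat32_pair_t`: add one to `inc` and add the stored value to `acc`. It is not atomic and does not use the decorated instructions. It only illustrates the assumed effect on the data, and `model_inc_acc_32` is a hypothetical name.

```c
#include <stdint.h>

/* Same layout as the 32-bit pair structure in the listing above. */
struct stat32_pair_t {
    int32_t inc;  /* event count */
    int32_t acc;  /* accumulated value, for example bytes */
};

/* Assumed effect of an increment-and-accumulate store (software model only). */
static void model_inc_acc_32(struct stat32_pair_t *p, int32_t v)
{
    p->inc += 1;
    p->acc += v;
}
```

Note that `decorated_store_32_inc_acc_32` passes the pair's base address plus 4, which is the offset of the `acc` field in this layout.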
//////////////////////////////////<br />
//// Notify Definitions<br />
///////////////////////////<br />
enum NOTIFY_DECORATION {<br />
    NOTIFY_DECORATION_INC_64 = 0,<br />
    NOTIFY_DECORATION_INC_32 = 1,<br />
    NOTIFY_DECORATION_CLEAR_64 = 2,<br />
    NOTIFY_DECORATION_CLEAR_32 = 3<br />
};<br />
static inline void decorated_notify_inc_32(volatile void *a){<br />
    register enum NOTIFY_DECORATION d = NOTIFY_DECORATION_INC_32;<br />
    __ASM("dsn %0, %1":<br />
        :"r"(d), "r"(a)<br />
        :"memory");<br />
}<br />
static inline void decorated_notify_clear_32(volatile void *a){<br />
    register enum NOTIFY_DECORATION d = NOTIFY_DECORATION_CLEAR_32;<br />
    __ASM("dsn %0, %1":<br />
        :"r"(d), "r"(a)<br />
        :"memory");<br />
}<br />
static inline void decorated_notify_inc_64(volatile void *a){<br />
    register enum NOTIFY_DECORATION d = NOTIFY_DECORATION_INC_64;<br />
    __ASM("dsn %0, %1":<br />
        :"r"(d), "r"(a)<br />
        :"memory");<br />
}<br />
static inline void decorated_notify_clear_64(volatile void *a){<br />
    register enum NOTIFY_DECORATION d = NOTIFY_DECORATION_CLEAR_64;<br />
    __ASM("dsn %0, %1":<br />
        :"r"(d), "r"(a)<br />
        :"memory");<br />
}<br />
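Application code that calls these macro functions directly can only be built and tested on the target. One possible way to keep statistics code testable on a host is a thin wrapper with a fallback, sketched below. Everything here is an assumption for illustration: the wrapper name `stat_inc32` and the build switch `USE_DECORATED_NOTIFY` are hypothetical, and the C11 atomic add in the fallback is performed by the core, so it only mimics the result of the decorated notify, not its performance behavior.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical build switch: define USE_DECORATED_NOTIFY on a target where
 * decorated_notify_inc_32() from this section is available. */
static inline void stat_inc32(_Atomic uint32_t *counter)
{
#ifdef USE_DECORATED_NOTIFY
    decorated_notify_inc_32(counter);
#else
    /* Host fallback (assumption): a core-side atomic add stands in for the
     * decorated notify so calling code can be exercised off-target. */
    atomic_fetch_add(counter, 1);
#endif
}
```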
7 Summary<br />
Working on shared data in multicore devices poses a problem, because simultaneous access to the data<br />
without any protection gives rise to race conditions and nondeterministic behavior. The traditional approach<br />
to avoiding race conditions between the cores has been to introduce locks around the shared data. However,<br />
locks decrease the level of parallelism (and therefore the scalability of the software) and raise new<br />
issues, with reduced performance and robustness as side effects.<br />
The solution described in this application note makes use of new instructions that allow a central part of<br />
the device to update the data. The update is then done atomically and without core-specific<br />
influence. Performance can be as good as private data accesses, and performance for both synthetic<br />
worst-case tests and application-realistic tests is well above that of lock-based solutions.<br />
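The race condition described above can be made concrete with a small single-threaded simulation. The sketch below is illustrative only and uses hypothetical function names: it writes out by hand one bad interleaving of two cores that each perform "load, add 1, store" on a shared counter, and contrasts it with the case where each update is indivisible, whether that is achieved with a lock or by having the update performed at the memory itself.

```c
#include <stdint.h>

/* Two cores each run "load, add 1, store" with no protection. The
 * interleaving is fixed by hand so the outcome is deterministic: both
 * loads happen before either store. */
static uint32_t unprotected_interleaving(void)
{
    uint32_t shared = 0;
    uint32_t core0 = shared;   /* core 0 loads 0 */
    uint32_t core1 = shared;   /* core 1 loads 0 */
    shared = core0 + 1;        /* core 0 stores 1 */
    shared = core1 + 1;        /* core 1 stores 1, overwriting core 0 */
    return shared;             /* one increment has been lost */
}

/* The same two increments when each update is indivisible. */
static uint32_t indivisible_updates(void)
{
    uint32_t shared = 0;
    shared += 1;               /* core 0's complete update */
    shared += 1;               /* core 1's complete update */
    return shared;
}
```

The first function returns 1 although two increments were issued, while the second returns 2.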
8 References<br />
Following is a list of helpful references used in this application note:<br />
1. Embedded Multicore: An Introduction by Jonas Svennebring, John Logan, Jakob Engblom, Patrik<br />
Strömblad. <strong>Freescale</strong> Semiconductor, Inc. 2009.<br />
2. Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities by<br />
Amdahl, Gene. AFIPS Conference Proceedings (30) 483–485 (1967).<br />
3. Experience with Processes and Monitors in Mesa by Butler W. Lampson and David D. Redell.<br />
CACM 23(2):105-117 (February 1980)<br />
4. The Deadlock problem: a classifying bibliography by Zöbel, Dieter. ACM SIGOPS Operating<br />
Systems Review 17 (4): 6–15. (October 1983)<br />
5. Eliminating receive livelock in an interrupt-driven kernel by Mogul, Jeffrey C.; K. K.<br />
Ramakrishnan. ACM TOCS 15 (3): 217-252 (August 1997)<br />
6. Freescale Book E Implementation Standards for Storage, version 0.92, 3/7/2008<br />
7. Transactional Memory: Architectural Support for Lock-Free Data Structures by Maurice Herlihy,<br />
J. Eliot B. Moss. ISCA Proceedings, 289–300 (1993).<br />
9 Revision History<br />
Table 1 provides a revision history for this application note.<br />
Table 1. Document Revision History<br />
Rev. Number | Date | Substantive Change(s)<br />
A | 07/2010 | Initial NDA release<br />
Document Number: AN4181<br />
Rev. A<br />
07/2010<br />
Information in this document is provided solely to enable system and software<br />
implementers to use Freescale Semiconductor products. There are no express or<br />
implied copyright licenses granted hereunder to design or fabricate any integrated<br />
circuits or integrated circuits based on the information in this document.<br />
Freescale Semiconductor reserves the right to make changes without further notice to<br />
any products herein. Freescale Semiconductor makes no warranty, representation or<br />
guarantee regarding the suitability of its products for any particular purpose, nor does<br />
Freescale Semiconductor assume any liability arising out of the application or use of<br />
any product or circuit, and specifically disclaims any and all liability, including without<br />
limitation consequential or incidental damages. “Typical” parameters which may be<br />
provided in Freescale Semiconductor data sheets and/or specifications can and do<br />
vary in different applications and actual performance may vary over time. All operating<br />
parameters, including “Typicals” must be validated for each customer application by<br />
customer’s technical experts. Freescale Semiconductor does not convey any license<br />
under its patent rights nor the rights of others. Freescale Semiconductor products are<br />
not designed, intended, or authorized for use as components in systems intended for<br />
surgical implant into the body, or other applications intended to support or sustain life,<br />
or for any other application in which the failure of the Freescale Semiconductor product<br />
could create a situation where personal injury or death may occur. Should Buyer<br />
purchase or use Freescale Semiconductor products for any such unintended or<br />
unauthorized application, Buyer shall indemnify and hold Freescale Semiconductor<br />
and its officers, employees, subsidiaries, affiliates, and distributors harmless against all<br />
claims, costs, damages, and expenses, and reasonable attorney fees arising out of,<br />
directly or indirectly, any claim of personal injury or death associated with such<br />
unintended or unauthorized use, even if such claim alleges that Freescale<br />
Semiconductor was negligent regarding the design or manufacture of the part.<br />
Freescale and the Freescale logo are trademarks of Freescale<br />
Semiconductor, Inc. Reg. U.S. Pat. & Tm. Off. CoreNet and QorIQ are<br />
trademarks of Freescale Semiconductor, Inc. All other product or service<br />
names are the property of their respective owners. The Power Architecture<br />
and Power.org word marks and the Power and Power.org logos and related<br />
marks are trademarks and service marks licensed by Power.org.<br />
© 2010 Freescale Semiconductor, Inc.