
Freescale Semiconductor
Application Note

Decorated Operations for QorIQ P3/P4/P5 Processors

by Networking and Multimedia Group
Freescale Semiconductor, Inc.
Austin, TX

This application note presents the concept of “statistics acceleration” implemented with decorated operations.

Gathering statistics and logging ongoing activity within an embedded system often consume a substantial number of cycles. This can be seen in applications such as computer vision and vehicle control, as well as network and telecom infrastructure. In the latter case, individual flows of data must be tracked, and this information is then used to find errors and tune the network for optimal performance. Although this functionality is important, it is essential that statistics handling take as little time as possible.

With the current multicore trend, time-efficient statistics handling is becoming more difficult. Data structures holding statistics or other key parameters are shared between the cores, so locks must be placed around them to prevent race conditions. Locks in turn can cause well-known problems such as deadlocks, livelocks, and priority inversion, and handling these introduces yet another layer of complexity.

© 2010 Freescale Semiconductor, Inc. All rights reserved.
Freescale Confidential Proprietary

Preliminary—Subject to Change Without Notice<br />

Document Number: AN4181
Rev. A, 07/2010

Contents

1. Introduction
2. Silicon Implementation of Decorated Operations
3. Using Decorations in Software and Performance Results
4. Implementation Details
5. Sample Application
6. Decorated Macro Functions
7. Summary
8. References
9. Revision History



Statistics acceleration with decorated operations solves the underlying problem of sharing data between cores without the need for locks. This drastically increases performance and lowers software complexity, because all the protection against lock-related issues can be removed.

1 Introduction

The industry is rapidly migrating to multicore solutions, largely because easy single-core performance improvements from stronger or faster cores are coming to an end. Further improvements to code execution (instructions per cycle) require drastically more complex logic. Increasing the frequency is difficult, because power consumption increases with the square of the frequency.¹ Furthermore, a higher frequency gives little additional performance because of the core/memory speed difference [1].

Multicore solutions give a theoretically higher performance with low aggregate power consumption, but it is crucial that the hardware and software are designed to allow for efficient scaling. The most important hardware aspects are buses and memory interfaces; in this area, switch fabrics are replacing traditional buses. For example, Freescale’s P4080 communication processor, equipped with eight e500mc Power Architecture® cores, addresses this by using the CoreNet coherency fabric, with nearly 1 Tbps of internal memory bandwidth, and dual DDR3 interfaces. The other aspect, software design, is difficult to solve in a general way that allows efficient scaling. Amdahl’s law [2] describes the application speed-up as a function of the number of cores and of how well parallelized the software is. As shown in Figure 1, software that is largely sequential can never make efficient use of highly parallel architectures. Therefore, it is critical to provide means to remove sequential sections.

Figure 1 shows Amdahl’s law of scaling over multiple cores for different degrees of parallelized code. S marks the portion of sequential code.

Figure 1. Amdahl’s Law
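The curves in Figure 1 follow from the usual statement of Amdahl’s law, speed-up = 1 / (S + (1 − S) / N), where S is the sequential fraction and N the number of cores. The short C sketch below evaluates it; the function names are chosen here for illustration and are not part of any Freescale library.

```c
#include <stdio.h>

/* Amdahl's law: speed-up on n cores when a fraction s (0..1) of the work is sequential. */
double amdahl_speedup(double s, unsigned n)
{
    return 1.0 / (s + (1.0 - s) / (double)n);
}

/* Print one row of speed-ups for 1, 2, 4 and 8 cores. */
void amdahl_print_row(double s)
{
    printf("S = %.2f:", s);
    for (unsigned n = 1; n <= 8; n *= 2)
        printf("  %u cores -> %.2fx", n, amdahl_speedup(s, n));
    printf("\n");
}
```

With S = 0.5, for example, eight cores give a speed-up of only about 1.78, and no number of cores can exceed 1/S = 2.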

1. The relation is P = CV²F, but a higher frequency also requires a higher voltage and leakier processes.


Single-core applications typically work by reading data from a data structure, computing a result, and then writing that result back to the data structure. When such an application is parallelized, the same computation is done, but the data structure must now be protected so that concurrent updates do not corrupt it.

Take bank software as an example. Mr. Foo’s account currently holds $1000 and is accessed by two processes at nearly the same time. One deposits $500 and the other withdraws $20. Both processes read the current balance, add or remove money, and finally write the result back to the account. If both read operations take place before either write, the final balance is whatever the last write stored, and the other update is lost. Mr. Foo is then left with either $1500 or only $980 in his account, instead of the correct $1480.

Figure 2 shows Mr. Foo’s bank account. The final value of the account is non-deterministic.

Figure 2. Mr. Foo’s Bank Account
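The lost update can be reproduced deterministically by replaying the bad interleaving in a single thread. The following is a minimal sketch; the `txn` type and all function names are hypothetical and exist only to make the two steps of each process explicit.

```c
#include <stdint.h>

/* A transaction in flight: the balance it read and the change it intends to apply. */
struct txn {
    int64_t snapshot;
    int64_t delta;
};

/* Step 1 of read-modify-write: take a private copy of the current balance. */
static struct txn txn_read(const int64_t *balance, int64_t delta)
{
    struct txn t = { *balance, delta };
    return t;
}

/* Step 2: write back the privately computed result, overwriting the balance. */
static void txn_write(int64_t *balance, struct txn t)
{
    *balance = t.snapshot + t.delta;
}

/* Replays the interleaving in which both reads happen before either write. */
int64_t lost_update_balance(int deposit_writes_last)
{
    int64_t balance = 1000;
    struct txn deposit  = txn_read(&balance, 500);
    struct txn withdraw = txn_read(&balance, -20);

    if (deposit_writes_last) {
        txn_write(&balance, withdraw);
        txn_write(&balance, deposit);   /* the withdrawal is lost */
    } else {
        txn_write(&balance, deposit);
        txn_write(&balance, withdraw);  /* the deposit is lost */
    }
    return balance;
}
```

Whichever process writes last wins: the function returns 1500 or 980, never 1480.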

This type of issue is called a race condition, and the traditional solution is to introduce a lock on the data structure. However, locks are sequential by nature and lower the degree of parallelism. A lock is typically used as follows:

Test/Set to get lock for structure
If (lock was already used by other) then try above again
Read and update structure
Release lock
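The lock sequence above can be sketched in portable C11 with `atomic_flag`. This is an illustration only, not the LWE library’s `spin_lock` implementation; the `locked_counter` type and its functions are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdint.h>

struct locked_counter {
    atomic_flag lock;   /* set while some core owns the counter */
    int64_t value;
};

void locked_counter_init(struct locked_counter *c)
{
    atomic_flag_clear(&c->lock);
    c->value = 0;
}

void locked_counter_add(struct locked_counter *c, int64_t delta)
{
    /* Test/Set returns the previous state: "true" means another owner holds the lock. */
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ;                                   /* try again */
    c->value += delta;                      /* read and update structure */
    atomic_flag_clear_explicit(&c->lock, memory_order_release);  /* release lock */
}

/* Single-threaded smoke test of the sequence above. */
int64_t locked_counter_demo(void)
{
    struct locked_counter c;
    locked_counter_init(&c);
    locked_counter_add(&c, 500);
    locked_counter_add(&c, -20);
    return c.value;
}
```

Every update pays for the test-and-set and the release even when no other core is competing, which is the overhead the benchmark figures in this note quantify.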

Locks also introduce additional software complexity, because priority inversion [3] and deadlock situations [4] must be handled if they occur. This in turn can lead to livelocks [5]. To conclude, the traditional method of handling synchronization by using locks is not robust because it does not allow for efficient scaling. It is also very costly in terms of cycles. In a benchmark running bare-board on the P4080 and using Freescale’s light-weight executive (LWE) library, a shared variable protected by locks took nearly 25 times as long to update as a private variable¹ because of the lock overhead alone. In the case where seven cores tried to update the same shared variable multiple times, and hence also had to wait for the lock to be freed, the update took on average 95 times longer than for the private variable. See Section 3, “Using Decorations in Software and Performance Results,” for further details on the benchmark.

1. Declared with “volatile” to ensure that no unfair compiler optimizations were used, and that a full read-update-write cycle was executed.

Freescale’s implementation of decorated operations goes back to the initial, most basic problem: how to share data in a multicore environment. Decorated operations solve this problem by means that do not introduce sequential code and that have low software overhead and complexity. This is done by using parts of the SoC other than the cores to perform relative operations on data, such as “increase x by 10.” By moving the operation out of the core and into a central location, it is possible to guarantee atomic operations and a simpler inter-core order of execution.

This application note discusses how decorated operations are implemented in Freescale’s high-end QorIQ processor family (the P3, P4, and P5 devices) and how software can use these operations to reach performance numbers equal to those of operations on private variables. Use cases and examples are mainly taken from the P4080 but are generally applicable to all P3, P4, and P5 devices.

2 Silicon Implementation of Decorated Operations

Decorated operations, or decorated storage as it is also commonly called, are a set of core instructions added to the Power Architecture instruction set [6]. The instructions “decorate” the common load/store instructions with a computation and an attribute. For example, stbx (Store Byte Indexed) now also has a decorated version: stbdx (Store Byte Decorated Indexed). The same applies to the half-word, word, double-word, and double-float versions of the store and load instructions (that is, stbdx, sthdx, stwdx, stddx¹, and stfddx, and lbdx, lhdx, lwdx, lddx², and lfddx). An additional dsn (Notify) instruction has also been added; it has no corresponding load/store version, but is interpreted by the core as a nop (No Operation) and carries a decoration.

This decoration does not have any direct meaning to the core itself, but depending on the SoC implementation, it is interpreted by other parts of the device. In Freescale’s QorIQ processor family, these decorations are interpreted by the CoreNet platform cache (CPC), which carries out the operations together with the CoreNet DDR queue and DDR controller (see Figure 3). These act similarly to transactional memory [7] to perform operations on a global scale outside the cores. Unlike transactional memory, there is no need to handle rollbacks, because CoreNet buffers transactions as required and ensures the correct order of execution. The decorations for load instructions include clear, set, decrement, and increment of data. Store instructions include accumulate (the value can be negative), combined increment and accumulate, maximum threshold, and minimum threshold. The notify instruction can carry increment as well as clear operations. Versions are available for signed and unsigned data, and for 32- and 64-bit word lengths.

1. Declared with “volatile” to ensure that no unfair compiler optimizations were used, and that a full read-update-write cycle was executed.

2. Not implemented on P3/P4, but available on P5.


Figure 3 shows how the CoreNet platform cache and the DDR controller interface. Decorated operations are implemented in the core as instructions, and the decorations are interpreted by the CoreNet platform cache (CPC), which carries out the operations together with the CoreNet DDR queue and DDR controller.

Figure 3. Decorated Operations, CoreNet Cache, and DDR

A decorated operation carries four parameters: the type of access (load, store, or notify), the data address to operate on, the data to use, and the decorated value that defines the operation.

As an example, consider the following decorated operation: add the integer 10 to a variable bar of type long (64-bit) at address A (double-word aligned). The access type is Store Word Decorated (stwdx), the data address is set to A + 4 to force right justification of the store data within the accumulator, and the decoration type is set to Accumulate 64-bit. The following C code corresponds to this operation:

decorated_store_64_acc_64(&bar,10);

Returning to Mr. Foo’s bank account, this type of relative change is a perfect match for decorated operations. The two processes that work on the bank account do not need any locks; each simply executes its respective instruction:

decorated_store_64_acc_64(&account_foo, -20);
decorated_store_64_acc_64(&account_foo, 500);

Note that the order of execution is not important, because each change is relative to the current value. This works well for statistics and data logging, such as keeping track of how much data and how many packets a specific user has sent in a network, the distance a car has travelled, progress measurement, and so on. An absolute change, such as updating the MAC address in an ARP table or transferring Mr. Foo’s account to a different owner, does not work well. Such abrupt changes require a higher level of synchronization between the processes to ensure that there are no pending transactions.
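The order independence of relative updates can be tried on any host with a C11 compiler. In the sketch below, `model_acc_64` is a hypothetical software stand-in for the 64-bit accumulate decoration, built on `atomic_fetch_add`; on QorIQ the addition is performed by the CPC rather than by a core instruction sequence, so this models only the visible result, not the mechanism or its timing.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for a 64-bit accumulate decoration: one indivisible relative update. */
void model_acc_64(_Atomic int64_t *acc, int64_t delta)
{
    atomic_fetch_add_explicit(acc, delta, memory_order_relaxed);
}

/* Apply Mr. Foo's two transactions in either order and return the final balance. */
int64_t account_after(int withdraw_first)
{
    _Atomic int64_t account_foo = 1000;

    if (withdraw_first) {
        model_acc_64(&account_foo, -20);
        model_acc_64(&account_foo, 500);
    } else {
        model_acc_64(&account_foo, 500);
        model_acc_64(&account_foo, -20);
    }
    return atomic_load(&account_foo);
}
```

Both orders end at $1480, because neither update depends on a previously read copy of the balance.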

Furthermore, the data that is operated on must be marked as cache-inhibited so that it is not cached by the private L1 and L2 caches. It must also be marked as guarded so that speculative loads cannot cause undesired effects. The operation is carried out in the L3 platform cache, and the data either remains there for the time being or, alternatively, is brought in from DDR, updated, and put directly into the DDR write queue without altering the cache. The store and notify instructions are completed by the core without waiting for the final operation to be executed in the CPC, that is, “fire and forget.” They are therefore executed in one cycle or fewer.¹ The performance is high compared to lock-based approaches (see Section 3, “Using Decorations in Software and Performance Results”).

3 Using Decorations in Software and Performance Results

Decorated operations are typically wrapped in macros to simplify usage; alternatively, in a language that supports it, such as C++, they could be implemented by overloading the basic add/subtract operators. In the following benchmark, the operations run bare-metal directly on the P4080 without an underlying operating system. Seven of the eight cores run bare-metal, whereas the last core runs Linux to simplify the boot process. However, the operating system configuration and the level at which the decorated operations are implemented are not important, because decorated operations execute at the same privilege level and have the same characteristics for the core as normal load/store operations. Tests are made in both single-core and multicore configurations.

Data accessed by a decorated operation requires three steps. First, a pointer to the data must be defined, as follows:

volatile int32_t *decorated_counter = NULL;

In the program code, allocate the data and set the value to a default state, in this case zero, as follows:

decorated_counter = (int32_t *) stats_memalign(CACHE_LINE_SIZE, sizeof(int32_t));
*decorated_counter = 0;

Finally, the code makes use of the data by executing a decorated operation, as follows:

decorated_notify_inc_32(decorated_counter);

The typical use case for decorated operations is a data structure that is updated relatively seldom, less often than about once every hundred cycles. In this case, an update executes in a single cycle, the same as for private data. A lock-based update takes roughly 35 cycles in the ideal single-core case. These results were measured by reading the cycle timer, running the test, reading the cycle timer again, and then subtracting the measured overhead of reading the timer, which is a stable 4 clock cycles:

atb_start = mfspr(SPR_ATBL); //start timer
decorated_notify_inc_32(decorated_counter);
atb_stop = mfspr(SPR_ATBL); //stop timer
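The overhead subtraction can be wrapped in a small helper. The sketch below assumes 32-bit timer samples, as in the listing above; `net_cycles` is a hypothetical name and is not part of the LWE library.

```c
#include <stdint.h>

/* Net cycles between two 32-bit timer samples, minus the cost of reading the timer. */
uint32_t net_cycles(uint32_t start, uint32_t stop, uint32_t overhead)
{
    /* Unsigned subtraction stays correct if the timer wrapped once between samples. */
    uint32_t gross = stop - start;

    /* Clamp at zero so measurement jitter cannot produce a huge bogus value. */
    return gross > overhead ? gross - overhead : 0;
}
```

With the stable 4-cycle overhead quoted above, a gross reading of 5 cycles corresponds to a single-cycle update.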

1. The e500mc core is superscalar and can load and retire up to two instructions per cycle under certain conditions.

Because locks use an SoC-wide atomic function, they are affected by other locks. For example, when one core runs the code above while the other cores wait at a different lock, the cycle count increases from roughly 35 cycles to about 200 cycles. When all cores operate on the same lock, the cycle count increases further. A synthetic use case that is not typically found in real applications, but is of general interest because of the heavy load it puts on the system, is to run a long loop of updates. This also allows the penalty of multicore access to the same data to be measured. Below is an example of the code that is used, this time showing a lock-based access. Each core runs the loop 10,000 times:

atb_start = mfspr(SPR_ATBL); //start timer
for(i=0; i < 10000; i++){
    spin_lock(&sync_lock);
    lock_counter++;
    spin_unlock(&sync_lock);
}
atb_stop = mfspr(SPR_ATBL); //stop timer

In the case of lock-based accesses with eight cores running in parallel, the average access time increases to 848 cycles because of the delay at the lock. Note that the standard deviation is very large in this case, nearly 50% of the average cycle count, so the access time is highly non-deterministic. For decorated operations, the CPC is expected to become a bottleneck, because it is not designed to handle such a large flow of consecutive operations. The CPC runs on the SoC clock rather than the core clock and can execute one decorated operation every second SoC clock cycle. With the core clock running 2.4 times faster than the SoC clock and seven cores executing decorated operations, 33 core clock cycles per iteration are expected. The benchmark confirms this, and the standard deviation is now only 8% of the cycle count.
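The expected value can be checked with a short calculation. It assumes, as a reading of the figures above rather than a documented model, that the seven cores take turns at the CPC, that each decorated operation occupies it for two SoC clock cycles, and that the core clock runs 2.4 times faster than the SoC clock. The symbols are introduced here only for illustration.

```latex
\[
t_{\text{iter}} \approx N_{\text{cores}} \cdot c_{\text{CPC}} \cdot \frac{f_{\text{core}}}{f_{\text{SoC}}}
= 7 \cdot 2 \cdot 2.4 = 33.6 \ \text{core clock cycles}
\]
```

This is consistent with the roughly 33 core clock cycles per iteration quoted above.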

Freescale implemented a test typical of a real application (Network Address and Port Translation, NAPT) to measure the total impact of running with locks as well as with decorated operations. The core fetched an incoming UDP/IP packet from the network, did a look-up for address translation, changed the destination port and address, updated statistics, and sent out the packet. The interesting part in this case is the statistics update and the time it consumed.

Without any statistics, but using the P4080 packet processing accelerators, the cycle count per packet was measured at 440 cycles with a standard deviation of 18 cycles. A global total-packet counter and total-byte counter were then added, as well as individual flow-based counters for the number of packets and number of bytes transferred. With a single lock protecting the statistics, the average packet processing time increased to 686 ± 18 cycles. With decorated operations, Freescale could schedule the statistics updates to optimize performance, and the total cycle count increased only to 442 ± 19 cycles per packet. The statistics therefore cost about 246 cycles per packet with locks but only about 2 cycles with decorated operations.

The conclusion from the tests is that decorated operations allow for a significant performance increase compared to lock-based implementations.

4 Implementation Details

Decorated storage operations operate only on addresses that have been marked as Caching Inhibited¹, that is, non-cacheable. Performing a decorated storage operation on a cacheable address causes the operation to degrade to the equivalent non-decorated load or store operation: lbdx into lbzx, stwdx into stwx, and notify into nop.

1. Caching-inhibited: All loads and stores to the page bypass the caches and are performed directly to main memory. A read or write to a caching-inhibited page affects only the memory element specified by the operation.


Addresses on which decorated loads are performed should be marked Guarded¹, that is, no speculative execution is allowed for those instructions. If guarded is not set, speculative execution of, for example, a load operation triggers a data update. This is not problematic if the speculation turns out to be correct. However, if it is not, the load is thrown out of the core pipeline, but the decoration is still executed in the memory subsystem. This in turn results in an incorrect value of the data.

Variables (that is, accumulators) affected by decorated operations should be naturally aligned to their size (for example, a word should be 4-byte aligned). An error here can result in incorrect data changes, both to the variable operated on and to adjacent data.
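Natural alignment is cheap to verify before a pointer is handed to a decorated operation, for example in a debug build. The helper below is a hypothetical sketch and assumes the size is a power of two, which holds for the 1-, 2-, 4-, and 8-byte accumulators discussed here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is a multiple of size; size must be a power of two (1, 2, 4 or 8). */
bool is_naturally_aligned(const volatile void *addr, size_t size)
{
    return ((uintptr_t)addr & (size - 1)) == 0;
}
```

A 64-bit accumulator at an address ending in 0x4 fails this check, while a 32-bit accumulator at the same address passes.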

Decorated load, store, and notify operations behave the same as normal load and store operations in all other aspects, such as access control, debug events, storage attributes, alignment, and memory access ordering. In other words, there is no difference between decorated operations and normal operations when it comes to application usage. Any application can use them without any OS kernel or hypervisor interaction or permission.

4.1 Load: Memory Loaded to Core Register with Decoration Result

For decorated load operations, the processor performs a load operation with the specified decoration to the given address and places the data provided by the device in the target register. The different operations are as follows:

8-/16-/32-/64-bit Clear
8-/16-/32-/64-bit Set
8-/16-/32-/64-bit Decrement
8-/16-/32-/64-bit Increment

4.2 Store: Core Register Stored in Memory with Result from Decoration

For decorated store operations, the processor performs a store operation with the specified decoration to the given address and provides the data specified in the source register to the device. The different operations are as follows:

32-/64-bit accumulate
32-/64-bit increment and 32-/64-bit accumulate
64-bit maximum threshold with unsigned double word
32-bit maximum threshold with unsigned word
64-bit minimum threshold with unsigned double word
32-bit minimum threshold with unsigned word

1. Guarded: All loads and stores to this page are performed without speculation. That is, they are known to be required.


Note that the increment and accumulate decoration performs two operations but takes only one decorated value and one effective address. The first operation is an increment by 1 and therefore does not need a decorated value; the value is instead used for the accumulate operation. The effective address points to a struct whose first 32-/64-bit value is used for the increment and whose following 32-/64-bit value is used for the accumulation, as shown below. One use of this is to update both statistics of a data flow, the number of packets and the number of bytes, with just one operation.

struct stat32_pair_t {
    int32_t inc;
    int32_t acc;
};
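The effect of one increment-and-accumulate store on such a pair can be modelled in plain C. The sketch below shows only the arithmetic; `model_inc_acc_32` and `count_three_packets` are hypothetical names, and unlike the real decoration, which the CPC performs as one indivisible operation, this host-side model is not atomic.

```c
#include <stdint.h>

struct stat32_pair_t {
    int32_t inc;   /* incremented by 1 per operation, e.g. packet count */
    int32_t acc;   /* accumulates the stored value, e.g. byte count */
};

/* Model of one increment-and-accumulate store: both fields change together. */
void model_inc_acc_32(struct stat32_pair_t *s, int32_t value)
{
    s->inc += 1;
    s->acc += value;
}

/* Account for three packets of 64, 128 and 1500 bytes. */
struct stat32_pair_t count_three_packets(void)
{
    struct stat32_pair_t flow = { 0, 0 };
    model_inc_acc_32(&flow, 64);
    model_inc_acc_32(&flow, 128);
    model_inc_acc_32(&flow, 1500);
    return flow;
}
```

After the three calls the pair holds 3 packets and 1692 bytes, one call per packet.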

4.3 Notify: Decoration Performed on Data in Memory

A notify instruction is a NOP (No Operation) instruction from the core’s point of view: it has no effect on the general-purpose registers in the core. The different operations are as follows:

32-/64-bit increment
32-/64-bit clear

5 Sample Application<br />

#include <br />

#include <br />

#include <br />

#include <br />

__PERCPU uint32_t atb_start, atb_stop;
__PERCPU uint32_t atb_oh;

/** Master LWE core does required initialization first */
volatile uint32_t g_ctrl_lwe = INV_LWE_ID;
__PERCPU uint32_t curr_lwe_id = 0; /**< LWE ID for each core */

uint32_t sync_lock;
uint32_t init_lock;
struct lwe_barrier sync_barrier;

volatile int32_t *decorated_counter = NULL;
volatile int32_t lock_counter = 0;
__PERCPU volatile int32_t private_counter = 0;

void singlecore_test(void)
{
    uint32_t i;

    atb_start = mfspr(SPR_ATBL); //start timer
    atb_stop = mfspr(SPR_ATBL); //stop timer
    atb_oh = atb_stop - atb_start;
    APP_INFO ("Start/Stop overhead is %d cycles.", atb_oh);
    atb_oh=4;

    atb_start = mfspr(SPR_ATBL); //start timer
    decorated_notify_inc_32(decorated_counter);
    atb_stop = mfspr(SPR_ATBL); //stop timer
    APP_INFO ("1 Decoration took %d cycles.", atb_stop-atb_start - atb_oh);

    atb_start = mfspr(SPR_ATBL); //start timer
    for(i=0; i < 10; i++){
        decorated_notify_inc_32(decorated_counter);
    }
    atb_stop = mfspr(SPR_ATBL); //stop timer
    APP_INFO ("10 Decorations took %d cycles.", atb_stop-atb_start - atb_oh);

    atb_start = mfspr(SPR_ATBL); //start timer
    for(i=0; i < 1; i++){
        spin_lock(&sync_lock);
        lock_counter+=i;
        spin_unlock(&sync_lock);
    }
    atb_stop = mfspr(SPR_ATBL); //stop timer
    APP_INFO ("1 lock counter took %d cycles.", atb_stop-atb_start - atb_oh);

    atb_start = mfspr(SPR_ATBL); //start timer
    for(i=0; i < 10; i++){
        spin_lock(&sync_lock);
        lock_counter+=i;
        spin_unlock(&sync_lock);
    }
    atb_stop = mfspr(SPR_ATBL); //stop timer
    APP_INFO ("10 lock counter took %d cycles.", atb_stop-atb_start - atb_oh);
}

void multicore_test(void)
{
    uint32_t i;

    if (unlikely(barrier_sync(&sync_barrier) < 0))
        LWE_PANIC("barrier sync failed!");

    atb_start = mfspr(SPR_ATBL); //start timer
    decorated_notify_inc_32(decorated_counter);
    atb_stop = mfspr(SPR_ATBL); //stop timer
    APP_INFO ("1 Decoration took %d cycles.", atb_stop-atb_start - atb_oh);

    atb_start = mfspr(SPR_ATBL); //start timer
    for(i=0; i < 1; i++){
        spin_lock(&sync_lock);
        lock_counter+=i;
        spin_unlock(&sync_lock);
    }
    atb_stop = mfspr(SPR_ATBL); //stop timer
    APP_INFO ("1 lock counter took %d cycles.", atb_stop-atb_start - atb_oh);
}

<strong>Decorated</strong> <strong>Operations</strong> <strong>for</strong> <strong>QorIQ</strong> <strong>P3</strong>/<strong>P4</strong>/<strong>P5</strong> <strong>Processors</strong>, Rev. A<br />

Sample Application<br />

<strong>Freescale</strong> Semiconductor <strong>Freescale</strong> Confidential Proprietary 11<br />

Preliminary—Subject to Change Without Notice


Sample Application<br />

int main(int argc, char *argv[])<br />
{<br />
    uint32_t i;<br />

    curr_lwe_id = get_lwe_id();<br />

    spin_lock(&init_lock);<br />
    if (g_ctrl_lwe == INV_LWE_ID){<br />
        // the first partition to take the lock becomes the control partition<br />
        g_ctrl_lwe = curr_lwe_id;<br />
        APP_INFO("*********************************************");<br />
        APP_INFO("Decorated Operations Benchmark, July 2010");<br />
        APP_INFO("Jonas Svennebring, Freescale Nordic");<br />
        APP_INFO ("Partition %d\n\n", curr_lwe_id);<br />

        i = barrier_init(&sync_barrier, get_online_core_mask());<br />
        if (unlikely(i != 0)) {<br />
            APP_ERROR("Barrier initialization failed");<br />
            return 1;<br />
        }<br />

        decorated_counter = (int32_t *) stats_memalign(CACHE_LINE_SIZE, sizeof(int32_t));<br />
        *decorated_counter = 0;<br />

        APP_INFO("");<br />
        APP_INFO("Singlecore Test:");<br />
        singlecore_test();<br />
    }<br />
    else{<br />
        APP_INFO("Slave Partition, id %d", curr_lwe_id);<br />
    }<br />

    atb_start = mfspr(SPR_ATBL); //start timer<br />
    atb_stop = mfspr(SPR_ATBL); //stop timer<br />
    atb_oh = atb_stop - atb_start;<br />
    APP_INFO ("Start/Stop overhead is %d cycles.", atb_oh);<br />
    spin_unlock(&init_lock);<br />

    APP_INFO("");<br />
    APP_INFO("Multicore Test:");<br />
    multicore_test();<br />

    APP_INFO("DONE!");<br />
    return 0;<br />
}<br />
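The measurement pattern used throughout the sample application (read the timer, run the code under test, read the timer again, then subtract the cost of a back-to-back start/stop pair) can be tried on a host machine. The following is a minimal portable sketch under stated assumptions, not the LWE code: `read_timer()` is a hypothetical stand-in for `mfspr(SPR_ATBL)` built on the standard C `clock()` call, and `measure_overhead()` and `timed_increments()` are illustrative names that do not appear in the application note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for mfspr(SPR_ATBL): any non-decreasing tick
 * source is good enough for this sketch. */
static uint32_t read_timer(void)
{
    return (uint32_t)clock();
}

/* Cost of one back-to-back start/stop pair, like atb_oh in main(). */
static uint32_t measure_overhead(void)
{
    uint32_t start = read_timer();
    uint32_t stop = read_timer();
    return stop - start;
}

/* Ticks spent on n increments of *counter with the timer overhead removed.
 * The result is clamped at zero because a coarse timer can report an
 * elapsed time smaller than the measured overhead. */
static uint32_t timed_increments(volatile uint32_t *counter, uint32_t n,
                                 uint32_t overhead)
{
    assert(counter != NULL);

    uint32_t start = read_timer();
    for (uint32_t i = 0; i < n; i++)
        (*counter)++;
    uint32_t stop = read_timer();

    uint32_t elapsed = stop - start; /* unsigned math survives a 32-bit wrap */
    return elapsed > overhead ? elapsed - overhead : 0;
}
```

A caller would measure the overhead once, as `main()` does, and pass it to each timed section. The clamp is a design choice of this sketch; the listing above prints the raw difference instead.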

6 <strong>Decorated</strong> Macro Functions<br />

//////////////////////////////////<br />

//// Load Definitions<br />

///////////////////////////<br />

enum LOAD_DECORATION {<br />
    LOAD_DECORATION_CLEAR = 0,<br />
    LOAD_DECORATION_SET = 1,<br />
    LOAD_DECORATION_DEC = 2,<br />
    LOAD_DECORATION_INC = 3<br />
};<br />

static inline uint8_t decorated_load_clear_8(volatile void *a){<br />
    uint8_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_CLEAR;<br />
    __ASM("lbdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint8_t decorated_load_set_8(volatile void *a){<br />
    uint8_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_SET;<br />
    __ASM("lbdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint8_t decorated_load_dec_8(volatile void *a){<br />
    uint8_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_DEC;<br />
    __ASM("lbdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint8_t decorated_load_inc_8(volatile void *a){<br />
    uint8_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_INC;<br />
    __ASM("lbdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint16_t decorated_load_clear_16(volatile void *a){<br />
    uint16_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_CLEAR;<br />
    __ASM("lhdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint16_t decorated_load_set_16(volatile void *a){<br />
    uint16_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_SET;<br />
    __ASM("lhdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint16_t decorated_load_dec_16(volatile void *a){<br />
    uint16_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_DEC;<br />
    __ASM("lhdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint16_t decorated_load_inc_16(volatile void *a){<br />
    uint16_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_INC;<br />
    __ASM("lhdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint32_t decorated_load_clear_32(volatile void *a){<br />
    uint32_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_CLEAR;<br />
    __ASM("lwdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint32_t decorated_load_set_32(volatile void *a){<br />
    uint32_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_SET;<br />
    __ASM("lwdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint32_t decorated_load_dec_32(volatile void *a){<br />
    uint32_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_DEC;<br />
    __ASM("lwdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint32_t decorated_load_inc_32(volatile void *a){<br />
    uint32_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_INC;<br />
    __ASM("lwdx %0, %1, %2":"=r"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint64_t decorated_load_clear_64(volatile void *a){<br />
    uint64_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_CLEAR;<br />
    __ASM("lfddx %0, %1, %2":"=f"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint64_t decorated_load_set_64(volatile void *a){<br />
    uint64_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_SET;<br />
    __ASM("lfddx %0, %1, %2":"=f"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint64_t decorated_load_dec_64(volatile void *a){<br />
    uint64_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_DEC;<br />
    __ASM("lfddx %0, %1, %2":"=f"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />

static inline uint64_t decorated_load_inc_64(volatile void *a){<br />
    uint64_t r;<br />
    enum LOAD_DECORATION d = LOAD_DECORATION_INC;<br />
    __ASM("lfddx %0, %1, %2":"=f"(r)<br />
        : "r"(d), "r"(a)<br />
        : "memory");<br />
    return r;<br />
}<br />
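For host-side unit tests, where the decorated load instructions are not available, the four load decorations can be modelled in portable C11. The sketch below is a model built on assumptions, not the hardware definition: it assumes that each decorated load returns the value that was in memory before the update and that the "set" decoration writes all ones. Both points should be checked against the device reference manual. All `model_` and `MODEL_` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric encoding as enum LOAD_DECORATION in the listing. */
enum model_load_decoration {
    MODEL_LOAD_CLEAR = 0,
    MODEL_LOAD_SET = 1,
    MODEL_LOAD_DEC = 2,
    MODEL_LOAD_INC = 3
};

/* Host-side model of a 32-bit decorated load.
 * ASSUMPTION: the old value is returned; "set" stores all ones. */
static uint32_t model_decorated_load_32(_Atomic uint32_t *a,
                                        enum model_load_decoration d)
{
    assert(a != NULL);

    switch (d) {
    case MODEL_LOAD_CLEAR:
        return atomic_exchange(a, 0u);          /* read, then zero */
    case MODEL_LOAD_SET:
        return atomic_exchange(a, UINT32_MAX);  /* read, then all ones */
    case MODEL_LOAD_DEC:
        return atomic_fetch_sub(a, 1u);         /* read, then subtract one */
    case MODEL_LOAD_INC:
        return atomic_fetch_add(a, 1u);         /* read, then add one */
    }
    return atomic_load(a); /* unknown decoration: behave as a plain load */
}
```

Under these assumptions, a read-and-clear lets a statistics poller collect a counter and reset it in one step, without losing updates that arrive in between.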

//////////////////////////////////<br />

//// Store Definitions<br />

///////////////////////////<br />


enum STORE_DECORATION {<br />
    STORE_DECORATION_ACC_64 = 0,<br />
    STORE_DECORATION_ACC_32 = 1,<br />
    STORE_DECORATION_INC_ACC_64 = 2,<br />
    STORE_DECORATION_INC_ACC_32 = 3<br />
};<br />

struct stat32_pair_t {<br />
    int32_t inc;<br />
    int32_t acc;<br />
};<br />

struct stat64_pair_t {<br />
    int64_t inc;<br />
    int64_t acc;<br />
};<br />

static inline void decorated_store_32_acc_32(volatile void *a, register int32_t v){<br />
    volatile void *address = a;<br />
    enum STORE_DECORATION d = STORE_DECORATION_ACC_32;<br />
    __ASM("stwdx %0, %1, %2":<br />
        :"r"(v), "r"(d), "r"(address)<br />
        :"memory");<br />
}<br />

static inline void decorated_store_32_inc_acc_32(volatile void *a, register int32_t v){<br />
    volatile void *address = (void *) ((uintptr_t) a + 4);<br />
    enum STORE_DECORATION d = STORE_DECORATION_INC_ACC_32;<br />
    __ASM("stwdx %0, %1, %2":<br />
        :"r"(v), "r"(d), "r"(address)<br />
        :"memory");<br />
}<br />

static inline void decorated_store_64_acc_32(volatile void *a, register int32_t v){<br />
    volatile void *address = (void *) ((uintptr_t) a + 4);<br />
    enum STORE_DECORATION d = STORE_DECORATION_ACC_64;<br />
    __ASM("stwdx %0, %1, %2":<br />
        :"r"(v), "r"(d), "r"(address)<br />
        :"memory");<br />
}<br />

static inline void decorated_store_64_inc_acc_32(volatile void *a, register int32_t v){<br />
    volatile void *address = (void *) ((uintptr_t) a + 12);<br />
    enum STORE_DECORATION d = STORE_DECORATION_INC_ACC_64;<br />
    __ASM("stwdx %0, %1, %2":<br />
        :"r"(v), "r"(d), "r"(address)<br />
        :"memory");<br />
}<br />

static inline void decorated_store_64_acc_64(volatile void *a, register int64_t v){<br />
    volatile void *address = a;<br />
    enum STORE_DECORATION d = STORE_DECORATION_ACC_64;<br />
    __ASM("stfddx %0, %1, %2":<br />
        :"f"(v), "r"(d), "r"(address)<br />
        :"memory");<br />
}<br />

static inline void decorated_store_64_inc_acc_64(volatile void *a, register int64_t v){<br />
    volatile void *address = (void *) ((uintptr_t) a + 8);<br />
    enum STORE_DECORATION d = STORE_DECORATION_INC_ACC_64;<br />
    __ASM("stfddx %0, %1, %2":<br />
        :"f"(v), "r"(d), "r"(address)<br />
        :"memory");<br />
}<br />
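The constants `+ 4`, `+ 8`, and `+ 12` in the store helpers appear to follow from the layout of the pair structures. The sketch below checks that arithmetic at compile time. It assumes a big-endian target, so that the low word of a 64-bit field sits four bytes above its start, and it assumes that the increment-and-accumulate decorations are addressed at the `acc` field with `inc` immediately before it, which is what the helper code implies. The `OFF_` constants and `stat32_inc_acc_target()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pair layouts as declared in the listing: inc first, then acc. */
struct stat32_pair_t { int32_t inc; int32_t acc; };
struct stat64_pair_t { int64_t inc; int64_t acc; };

/* Offsets that the store helpers add to the base address. */
enum {
    OFF_32_INC_ACC_32 = offsetof(struct stat32_pair_t, acc),
    OFF_64_INC_ACC_64 = offsetof(struct stat64_pair_t, acc),
    /* ASSUMPTION: big-endian, low word of a 64-bit field is at +4. */
    OFF_64_ACC_32     = 4,
    OFF_64_INC_ACC_32 = offsetof(struct stat64_pair_t, acc) + 4
};

static_assert(OFF_32_INC_ACC_32 == 4, "32-bit pair: acc follows inc");
static_assert(OFF_64_INC_ACC_64 == 8, "64-bit pair: acc follows inc");
static_assert(OFF_64_INC_ACC_32 == 12, "low word of the 64-bit acc field");

/* Address that a 32-bit increment-and-accumulate store would target. */
static volatile void *stat32_inc_acc_target(struct stat32_pair_t *p)
{
    assert(p != NULL);
    return (volatile void *)((uintptr_t)p + OFF_32_INC_ACC_32);
}
```

Deriving the offsets with `offsetof` rather than hard-coding them keeps the address arithmetic tied to the structure definitions if a field is ever added or reordered.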

//////////////////////////////////<br />

//// Notify Definitions<br />

///////////////////////////<br />

enum NOTIFY_DECORATION {<br />
    NOTIFY_DECORATION_INC_64 = 0,<br />
    NOTIFY_DECORATION_INC_32 = 1,<br />
    NOTIFY_DECORATION_CLEAR_64 = 2,<br />
    NOTIFY_DECORATION_CLEAR_32 = 3<br />
};<br />

static inline void decorated_notify_inc_32(volatile void *a){<br />
    register enum NOTIFY_DECORATION d = NOTIFY_DECORATION_INC_32;<br />
    __ASM("dsn %0, %1":<br />
        :"r"(d), "r"(a)<br />
        :"memory");<br />
}<br />

static inline void decorated_notify_clear_32(volatile void *a){<br />
    register enum NOTIFY_DECORATION d = NOTIFY_DECORATION_CLEAR_32;<br />
    __ASM("dsn %0, %1":<br />
        :"r"(d), "r"(a)<br />
        :"memory");<br />
}<br />

static inline void decorated_notify_inc_64(volatile void *a){<br />
    register enum NOTIFY_DECORATION d = NOTIFY_DECORATION_INC_64;<br />
    __ASM("dsn %0, %1":<br />
        :"r"(d), "r"(a)<br />
        :"memory");<br />
}<br />

static inline void decorated_notify_clear_64(volatile void *a){<br />
    register enum NOTIFY_DECORATION d = NOTIFY_DECORATION_CLEAR_64;<br />
    __ASM("dsn %0, %1":<br />
        :"r"(d), "r"(a)<br />
        :"memory");<br />
}<br />
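The notify helpers pass only an address and a decoration, with no data value. For code that must also build on a host without the `dsn` instruction, they could be replaced by stand-ins such as the following. This is a sketch under assumptions: it takes "inc" to mean adding one and "clear" to mean writing zero, with nothing returned to the caller, and the `model_notify_` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-ins for the dsn-based helpers.
 * ASSUMPTION: "inc" adds one, "clear" writes zero, nothing is returned. */
static void model_notify_inc_32(_Atomic uint32_t *a)
{
    assert(a != NULL);
    atomic_fetch_add_explicit(a, 1u, memory_order_relaxed);
}

static void model_notify_clear_32(_Atomic uint32_t *a)
{
    assert(a != NULL);
    atomic_store_explicit(a, 0u, memory_order_relaxed);
}

static void model_notify_inc_64(_Atomic uint64_t *a)
{
    assert(a != NULL);
    atomic_fetch_add_explicit(a, 1u, memory_order_relaxed);
}

static void model_notify_clear_64(_Atomic uint64_t *a)
{
    assert(a != NULL);
    atomic_store_explicit(a, 0u, memory_order_relaxed);
}
```

Relaxed ordering is used because a pure statistics counter does not publish other data. Whether that matches the ordering of the real instruction is not stated in the listing and would need to be checked.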



7 Summary<br />


Working on shared data in multicore devices poses a problem, because simultaneous access to the data<br />
without any protection gives rise to race conditions and nondeterministic behavior. The traditional approach<br />
to avoiding race conditions between the cores has been to introduce locks around the shared data. However,<br />
locks decrease the level of parallelism (and therefore the scalability of the software) and raise new<br />
issues as side effects: reduced performance and reduced robustness.<br />
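The lock-based approach described above, which the sample application uses for `lock_counter`, can be sketched with a C11 test-and-set spin lock. This is an illustrative stand-in and not the LWE `spin_lock()` implementation; `sketch_lock_t`, `struct locked_counter`, and the functions below are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal test-and-set spin lock (illustrative, not the LWE spin_lock). */
typedef struct { atomic_flag held; } sketch_lock_t;

static void sketch_lock(sketch_lock_t *l)
{
    /* Every waiting core spins here; this is the serialization cost. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void sketch_unlock(sketch_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Shared statistic guarded by the lock. */
struct locked_counter {
    sketch_lock_t lock;
    uint32_t value;
};

static void locked_counter_add(struct locked_counter *c, uint32_t n)
{
    assert(c != NULL);
    sketch_lock(&c->lock);
    c->value += n; /* plain read-modify-write, safe only under the lock */
    sketch_unlock(&c->lock);
}
```

Each update costs a lock acquire and a release in addition to the addition itself, and only one core can be inside the critical section at a time.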

The solution described in this application note uses new instructions that allow a central part of<br />
the device to update the data. The update is then atomic and free of core-specific<br />
influence. Performance can be as good as for private data accesses, and in both synthetic<br />
worst-case tests and application-realistic tests it is well above that of lock-based solutions.<br />
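From the software side, the effect resembles a lock-free atomic update. The following host-side sketch uses C11 atomics as an analogue of an increment-and-accumulate update on per-flow statistics. It differs from decorated operations in two ways that matter: the issuing core still performs the read-modify-write, and the two fields are updated by two separate atomic operations. Reading `inc` as a packet count and `acc` as a byte count is an assumption, and `flow_table`, `flow_record_packet()`, and `SKETCH_NUM_FLOWS` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_NUM_FLOWS 4 /* hypothetical table size */

/* Per-flow statistics: a packet count and a byte count. */
struct flow_stats {
    _Atomic uint32_t packets;
    _Atomic uint32_t bytes;
};

/* Static storage, so every counter starts at zero. */
static struct flow_stats flow_table[SKETCH_NUM_FLOWS];

/* Record one packet on a flow without taking a lock.
 * NOTE: packets and bytes are updated separately, so a concurrent reader
 * may briefly observe one field updated before the other. */
static void flow_record_packet(unsigned flow, uint32_t length)
{
    assert(flow < SKETCH_NUM_FLOWS);
    struct flow_stats *s = &flow_table[flow];
    atomic_fetch_add_explicit(&s->packets, 1u, memory_order_relaxed);
    atomic_fetch_add_explicit(&s->bytes, length, memory_order_relaxed);
}
```

No core ever waits for another in this version, which is the property the summary attributes to the decorated approach.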


9 Revision History<br />

Table 1 provides a revision history for this application note.<br />

Table 1. Document Revision History<br />

Rev. Number | Date | Substantive Change(s)<br />
A | 07/2010 | Initial NDA release<br />




Document Number: AN4181<br />

Rev. A<br />

07/2010<br />



© 2010 <strong>Freescale</strong> Semiconductor, Inc.
