Intel(R) IQ80315 I/O Processor DMA and XOR Library APIs and ...

More documents

Recommendations

Info

Intel ® GW80314 I/O Processor DMA and XOR LibraryData Cache Policy Mechanics4.3 Data Cache and Write PolicyAll of these descriptor bits affect the behavior of the Data Cache and Write Buffer.When the X bit for a descriptor is zero, the C and B bits operate as mandated by the ARMarchitecture. This behavior is detailed in Table 4.When the X bit for a descriptor is one, the C and B bits’ meaning is extended, as detailed in Table 5.Table 4. Data Cache and Buffer Behavior when X = 0C B Cacheable? Bufferable? Write PolicyLineAllocationPolicyNotes0 0 N N - - Stall until complete a0 1 N Y - -1 0 Y Y Write Through Read Allocate1 1 Y Y Write Back Read Allocatea. Normally, the processor continues executing after a data access when no dependency on that access is encountered. Withthis setting, the processor stalls execution until the data access completes. This guarantees to software that the data accesshas taken effect by the time execution of the data access instruction completes. External data aborts from such accessesare imprecise.Table 5. Data Cache and Buffer Behavior when X = 1C B Cacheable? Bufferable? Write PolicyLineAllocationPolicyNotes0 0 - - - - Unpredictable -- do not use0 1 N Y - -1 0(Mini DataCache)- - -1 1 Y Y Write BackRead/WriteAllocateWrites do not coalesce intobuffers aCache policy is determinedby MD field of AuxiliaryControl registera. Normally, bufferable writes can coalesce with previously buffered data in the same address range20 APIs and Testbench White Paper
Intel ® GW80314 I/O Processor DMA and XOR LibraryOptimization of Descriptor Processing Software5.0 Optimization of Descriptor Processing SoftwarePerformance benefits can be gained by tuning the software performing descriptor processing.Central to the tuning process is using the feature of the Intel ® XScale microarchitecture thatallows separate Data Cache policies for individual memory pages. This allows the data cachepolicy for descriptor processing to be customized for runtime optimization.When tuning custom applications keep in mind:• In performance experiments, better performance was achieved by prebuilding linked listverses using the append resume function. Constraints of custom applications may or may notallow the prebuilding of lists.• When append is used, the uncached unbuffered memory mapping for the last descriptor is usedto append the next descriptor. This eliminates a 16 byte cache flush (since there are two dirtybits for each cache line each for 16 bytes).• Descriptor processing is open to software optimization while DMA/XOR data transfer timesare the same for all approaches. Also, the DMA and XOR engine run concurrently with theIntel ® XScale core.• When multiple data cache policies are comparable in runtime performance, choosing thepolicy that has data caches OFF is preferable, since uncached memory regions do not allocatecachelines and impact costly cacheline evictions to ram.• GNUPro Toolset Compiler optimization Level -O2 is best for performance in the performancetest cases.• For all performance cases, increasing the descriptor byte count field does not alter the Policyproviding the best performance.• In a custom application, when caches are ON for a memory region and the preload can be thatplaced in a compute bound section of code, preload provides significant benefit.• Custom applications performing GW80314 DMA/XOR descriptor processing may showdifferent comparative results, when different concurrent applications produce different DataBus and Data Cache interactions. Examples:— High usage of data cache by application could result in cacheline evictions. In this caseuncached descriptors may show better performance.— High bus traffic by a concurrent application may alter the comparative performance ofcached Policies— Interleaving of data bus transactions while building descriptors in RAM may reduceopportunities for coalescing.• Typically data regions are set as Policy ‘000’. This eliminates synchronizing the source anddestination data with the Data Cache.APIs and Testbench White Paper 21
Page 1 and 2: Intel ® GW80314 I/O ProcessorDMA a
Page 3 and 4: ContentsContents1.0 Introduction...
Page 5 and 6: ContentsE.2.10 void * lib_q_get(Xor
Page 7 and 8: Revision HistoryRevision HistoryDat
Page 9 and 10: Intel ® GW80314 I/O Processor DMA
Page 19: Intel ® GW80314 I/O Processor DMA
Page 71 and 72:
Intel ® GW80314 I/O Processor DMA
Page 73 and 74:
Page 75 and 76:
Page 77 and 78:
Page 79 and 80:
Page 81 and 82:
Page 83 and 84:
Page 85 and 86:
Page 87 and 88:
Page 89 and 90:
show all

Intel(R) IQ80315 I/O Processor DMA and XOR Library APIs and ...

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?