
Intel(R) IQ80315 I/O Processor DMA and XOR Library APIs and ...


Intel® GW80314 I/O Processor DMA and XOR Library APIs and Testbench White Paper
September 2003
Document Number: 253675-001


Information in this document is provided in connection with Intel® products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications.

Intel may make changes to specifications and product descriptions at any time, without notice.

Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.

The Intel® Embedded I/O Processors may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an ordering number and are referenced in this document, or other Intel literature may be obtained by calling 1-800-548-4725 or by visiting Intel's website at http://www.intel.com.

Copyright © Intel Corporation, 2003

AlertVIEW, i960, AnyPoint, AppChoice, BoardWatch, BunnyPeople, CablePort, Celeron, Chips, Commerce Cart, CT Connect, CT Media, Dialogic, DM3, EtherExpress, ETOX, FlashFile, GatherRound, i386, i486, iCat, iCOMP, Insight960, InstantIP, Intel, Intel logo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel ChatPad, Intel Create&Share, Intel Dot.Station, Intel GigaBlade, Intel InBusiness, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetStructure, Intel Play, Intel Play logo, Intel Pocket Concert, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel TeamStation, Intel WebOutfitter, Intel Xeon, Intel XScale, Itanium, JobAnalyst, LANDesk, LanRover, MCS, MMX, MMX logo, NetPort, NetportExpress, Optimizer logo, OverDrive, Paragon, PC Dads, PC Parents, Pentium, Pentium II Xeon, Pentium III Xeon, Performance at Your Command, ProShare, RemoteExpress, Screamline, Shiva, SmartDie, Solutions960, Sound Mark, StorageExpress, The Computer Inside, The Journey Inside, This Way In, TokenExpress, Trillium, Vivonic, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.


Contents

1.0 Introduction
  1.1 Demonstrate Libraries Testbench Menu
    1.1.1 Description of Demonstration Cases
2.0 Library Functional Overview
  2.1 Library Usage Models
3.0 Intel® GW80314 I/O Processor Component Overview
  3.1 Data Paths and Components
    3.1.1 Function
  3.2 DMA/XOR Block
    3.2.1 Overview
    3.2.2 DMA/XOR Features
    3.2.3 Memory: SRAM and SDRAM
  3.3 Intel® XScale Microarchitecture
    3.3.1 Overview
    3.3.2 Intel® XScale Microarchitecture Memory Management
    3.3.3 Interrupt Controller
    3.3.4 Data Cache
4.0 Data Cache Policy Mechanics
  4.1 Introduction
  4.2 Page Tables
  4.3 Data Cache and Write Policy
5.0 Optimization of Descriptor Processing Software
6.0 Library
  6.1 Ecosystem: Application and XOR/DMA Library APIs
    6.1.1 Flow Sequence Description: XOR and DMA Library
      6.1.1.1 Initialization
      6.1.1.2 Request
      6.1.1.3 Post Transaction Cleanup
      6.1.1.4 Terminate
    6.1.2 Redboot* Intel® IQGW80314 SV Board Memory Map
7.0 Library and Test Bench File Organization and Compilation
  7.1 Folder and File Organization
    7.1.1 / Files:
    7.1.2 /Lib Files:
    7.1.3 /Bench Files:
  7.2 Instruction to Build and Run
8.0 General Notes
  8.1 Cache Implications Using Append Resume Macros
  8.2 GCSR is used to set Descriptor Completion Interrupts
  8.3 Using Resume
  8.4 Changing from DMA to XOR Descriptors on the Same Channel
  8.5 Error Handling
  8.6 DMA/XOR Channel Arbitration for the SFN Port
  8.7 GNUPro Toolset Compiler Optimization
  8.8 Reclamation of Descriptors
  8.9 Extending Library to Run with Multiple Threads
  8.10 When Using Append / Resume Sequence
  8.11 Append Operation Sequencing
  8.12 DMA or XOR Descriptors Variables should be Volatile
9.0 Conclusion
A Library Flow Charts
B Library Data Structures
  B.1 Descriptor Headers
  B.2 DMA Descriptors
  B.3 XOR Descriptors
  B.4 Intel® XScale Microarchitecture Page Tables and Library Memory Map (xscale.h)
C DMAXOR80314.h
D dmaxor_desc_mgr.h
E Library Function Prototypes
  E.1 Functions Included in xscale.h and xscale.c
    E.1.1 void lib_flush_data_cache(void)
    E.1.2 int lib_memmap_malloc(int dma_desc_Mbs, int XOR_desc_Mbs, int data_Mbs)
    E.1.3 void lib_set_xcb_mem_range(unsigned int xcb, void * virt_addr_base, int size_in_bytes)
    E.1.4 Page_Type lib_get_page_attributes(unsigned long virt_addr)
    E.1.5 Page_Type lib_set_page_xcb(unsigned long base, unsigned int xcb)
  E.2 Functions included in dmaxor_desc_mgr.h and dmaxor_desc_mgr.c
    E.2.1 XorDma_80314_Type * lib_new_mgr(void)
    E.2.2 int lib_buffersize(void)
    E.2.3 void lib_init(XorDma_80314_Type * mgrt, void * desc_baseaddr)
    E.2.4 void lib_free_mgr(XorDma_80314_Type * mgr)
    E.2.5 void * lib_stack_pop(XorDma_80314_Type * mgr, enum CHANNEL channel)
    E.2.6 Bool lib_stack_push(XorDma_80314_Type * mgr, void * frame, enum CHANNEL channel)
    E.2.7 void * lib_top_of_stack(XorDma_80314_Type * mgr, enum CHANNEL channel)
    E.2.8 inline int lib_postq_appnd_resume_sdram(XorDma_80314_Type * mgr, void *frame, enum PORT port, enum GCSR_OP_CMD cmd, unsigned int gcsr, enum CHANNEL channel)
    E.2.9 int lib_reclaim(XorDma_80314_Type * mgr, enum CHANNEL channel)
    E.2.10 void * lib_q_get(XorDma_80314_Type * mgr, enum CHANNEL channel)
    E.2.11 int lib_q_put(XorDma_80314_Type * mgr, void * frame, enum CHANNEL channel)
  E.3 Functions Included in chain_interrupt.h and chain_interrupt.c
    E.3.1 void intHandlerDetach(void)
    E.3.2 void callintHandlerAttach(void)
    E.3.3 void intHandlerAttach(void (*irq)(void), void (*fiq)(void))
    E.3.4 void lib_irq_handler(void) __attribute__ ((__naked__))
    E.3.5 void lib_fiq_handler(void) __attribute__ ((__naked__))
F Testbench: Data Structures
  F.1 bench.h
G Test Bench Library Function Prototypes
  G.1 bench.c
    G.1.1 int main(void);
    G.1.2 void print_title(enum build b)
    G.1.3 void generate_src_dst(void);
  G.2 lib_demo_cases.c
    G.2.1 void lib_demo_sdram(void)
H Redboot Memory Map
I Example Code
J Related Documents

Figures

1 GW80314 Block Diagram
2 DMA/XOR Engine
3 DMA/XOR Channel
1 Intel® 80200 Processor based on Intel® XScale Microarchitecture Features
2 DMA/XOR Library Flow Diagram

Tables

2 Second-level Descriptors for Coarse Page Table
3 Second-level Descriptors for Fine Page Table
1 First-level Descriptors
4 Data Cache and Buffer Behavior when X = 0
5 Data Cache and Buffer Behavior when X = 1
6 System Physical Memory Map


Revision History

Date         Revision   Description
July 2003    001        Initial Release.




1.0 Introduction

Intel provides customers of the Intel® GW80314 I/O processor (GW80314; see Footnote 1) an optimized turnkey Library solution for DMA/XOR applications to provide a fast development ramp. The DMA/XOR Library complements existing Intel collateral (see Section J, “Related Documents”, for related web documents).

Footnote 1: ARM* architecture compliant.

The turnkey optimized XOR and DMA Library consists of:
— DMA/XOR register set .h files.
— Functions to set Data Cache Policy for specified memory pages.
— Integrated Descriptor Handling.
— Required macros.
— Interrupt handler with setup of Interrupt Controller.
— Rules for optimization.
— Test bench demonstrating Library implementation.

1.1 Demonstrate Libraries Testbench Menu

The Library menu is shown below. The testbench provides menu-driven test cases implementing the Libraries.

Note: DMA/XOR descriptors can be run from SRAM or SDRAM.

1. SDRAM: DMA with crc and XOR Library (Random Channel and DMA vs. XOR)
2. SRAM : DMA with crc and XOR Library (Random Channel and DMA vs. XOR)

• Option 2 is not currently available.

1.1.1 Description of Demonstration Cases

The DMA/XOR engine has four identical channels operating independently. They arbitrate using round robin for the port interfacing the switch fabric network (see Figure 1). Each channel may function as a DMA or XOR engine.

The Library Demo cases iterate 30000 times. For each iteration, a random selection is made among the four channels and between performing a DMA or an XOR transaction on that channel. An aligned buffer is requisitioned from the Free Stack as a DMA or XOR descriptor, filled in, and flushed from the Data Cache to SDRAM or SRAM. If it is a DMA transaction, then a CRC32 calculation is initiated as well.

After the transaction completes, the DMA/CRC32 or XOR transaction results are validated for accuracy and completeness.
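The round-robin policy mentioned above is implemented in hardware, but its behavior can be illustrated in software. The following is a minimal sketch, not the device's logic: the function name rr_next_grant and the request-mask encoding (bit n set when channel n is requesting the port) are assumptions made for this example only.

```c
#include <assert.h>

#define NUM_CHANNELS 4  /* the DMA/XOR engine has four channels */

/*
 * Hypothetical round-robin arbiter (illustration only).
 * request_mask: bit n is set when channel n is requesting the port.
 * last_granted: channel that won the previous arbitration round.
 * Returns the next channel to grant, or -1 when nothing is requesting.
 */
static int rr_next_grant(unsigned request_mask, int last_granted)
{
    assert(last_granted >= 0 && last_granted < NUM_CHANNELS);

    /* Start with the channel after the last winner and wrap around,
     * so the last winner is considered only after all the others. */
    for (int step = 1; step <= NUM_CHANNELS; ++step) {
        int candidate = (last_granted + step) % NUM_CHANNELS;
        if (request_mask & (1u << candidate))
            return candidate;
    }
    return -1;
}
```

With all four channels requesting, repeated calls grant channels 1, 2, 3, 0 in turn when starting from last_granted = 0, so no channel can starve the others.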


2.0 Library Functional Overview

The features of the Intel® GW80314 I/O processor include a DMA/XOR Block and an Intel® XScale core. One of the important features of the Intel® XScale core is the ability to customize cache policy by memory page.

The base DMA/XOR GW80314 Library provides a hardware interface/memory map for the:
• DMA/XOR Block.
• Intel® XScale Microarchitecture.

The complete Library solution provides:
• Integrated descriptor handling (one set of APIs) for DMA/XOR on four channels:
— Pre-runtime allocation/alignment of XOR/DMA buffers (descriptors).
— Descriptor Free Stacks and Post Queues.
— Free descriptors by pointer allocation.
— Descriptor reclamation from descriptor chain.
• Customized cache policy for descriptor processing and data memory regions.
• Interrupt controller setup for DMA and XOR transactions, including chaining in of interrupt handlers.
• Library Demos: present a full implementation of the DMA/XOR Library functionality.

2.1 Library Usage Models

The DMA/XOR Library is flexible and can be:
• Implemented as a turnkey solution.
• Used cafeteria style with any or all components, code, methodologies and document references.
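To illustrate the idea of a descriptor Free Stack that hands out pre-allocated buffers by pointer, here is a minimal sketch. It is not the Library's implementation: the type free_stack, the functions fs_init, fs_push and fs_pop, and the pool size are all hypothetical names and values chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 8  /* hypothetical number of descriptors in the pool */

/* Hypothetical free stack: holds pointers to pre-allocated descriptors. */
typedef struct {
    void  *slot[POOL_SIZE];
    size_t count;           /* number of descriptors currently free */
} free_stack;

static void fs_init(free_stack *s)
{
    assert(s != NULL);
    s->count = 0;
}

/* Return a descriptor to the stack. Returns 1 on success, 0 if full. */
static int fs_push(free_stack *s, void *desc)
{
    assert(s != NULL && desc != NULL);
    if (s->count == POOL_SIZE)
        return 0;
    s->slot[s->count++] = desc;
    return 1;
}

/* Take a free descriptor. Returns NULL when none is available. */
static void *fs_pop(free_stack *s)
{
    assert(s != NULL);
    return s->count ? s->slot[--s->count] : NULL;
}
```

Because allocation happens before runtime, requesting a descriptor is only a pointer pop, and reclaiming one after the transaction completes is a pointer push.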


3.0 Intel® GW80314 I/O Processor Component Overview

This section provides an overview of the DMA/XOR Controller, the Intel® XScale core, and the transaction data paths.

3.1 Data Paths and Components

3.1.1 Function

Figure 1 provides an overview of the GW80314 with the data paths and the components relevant to DMA/XOR: the XOR Controller, the DMA Controller, and the Intel® XScale core. The data flow is as follows.

• Intel® XScale microarchitecture descriptor processing:
— Code is run from SDRAM by the Intel® XScale microarchitecture using the standard fetch, decode and execute model.
— Descriptors are written to SDRAM or SRAM.
— Memory-mapped DMA/XOR registers are written to initiate the transaction. The DMA or XOR Controller, using the pointer to memory, reads the descriptor values from RAM into its registers. Based on the descriptor values, the DMA/XOR Controller begins execution.
• DMA/XOR Controller operations:
— For an XOR or DMA transfer, the data is transferred first into the DMA/XOR block and then transferred to the destination.

In terms of software descriptor processing optimization, the primary goal is to minimize traffic on the bus. The programmer can support this objective by minimizing the descriptor-processing traffic between SDRAM/SRAM and the Intel® XScale core. The bus transactions created by the DMA/XOR block are handled by the controllers.

The GW80314 is designed as a fabric-centric, any-port-to-any-port bridge. All transactions are placed into fabric packets and routed through address-based port selection to another fabric port. The bridge is based on the ‘store and forward’ concept, where transactions are buffered at the incoming port. When a packet is complete, the incoming port knows the size of the packet, and it may be burst across the fabric to the outgoing port. As the timing of the packet is deterministic at the outgoing port, the transaction may be started at the outgoing port as soon as the header arrives. All outgoing ports have the capability to buffer incoming packets to allow for delayed access to the external bus.

The internal packets are limited to 256 bytes in size in order to limit the latency and to provide a certain quality of service. Larger PCI transactions are broken into 256-byte transactions at the PCI port.
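The effect of the 256-byte packet limit on a larger transfer can be worked out with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the transfer is cut into consecutive 256-byte pieces from its start, ignoring any address-alignment rules the hardware may apply.

```c
#include <assert.h>
#include <stddef.h>

#define FABRIC_PACKET_MAX 256u  /* internal packet size limit, in bytes */

/* Number of fabric packets needed for a transfer (rounds up). */
static size_t fabric_packets_needed(size_t bytes)
{
    return (bytes + FABRIC_PACKET_MAX - 1) / FABRIC_PACKET_MAX;
}

/* Length in bytes of packet `index` (0-based) of a `bytes`-long transfer. */
static size_t fabric_packet_len(size_t bytes, size_t index)
{
    assert(index < fabric_packets_needed(bytes));

    size_t offset    = index * FABRIC_PACKET_MAX;
    size_t remaining = bytes - offset;
    return remaining < FABRIC_PACKET_MAX ? remaining : FABRIC_PACKET_MAX;
}
```

Under these assumptions a 1000-byte transaction becomes four packets: three of 256 bytes and a final one of 232 bytes.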


Figure 1. GW80314 Block Diagram
[Figure not reproduced. Labels recoverable from the diagram: Intel® 80200 bus interface (32-bit address / 64-bit data, 100 MHz operation), JTAG, DMA, CRC32C, XOR, SRAM, DDR interface (64-72 bit / 200 MHz) to DDR memory, two PCI-X ports, switching fabric, 10/100/1G Ethernet, registers, interrupt controller, 4 timers, I2C, UARTs, host port, 4 chip selects.]


3.2 DMA/XOR Block

3.2.1 Overview

The DMA/XOR engine has four identical channels operating independently. Each channel may function as a DMA engine or as an XOR engine. As a DMA engine, it may transfer data from any port to any other port, provide CRC calculations on the transferred data, and perform memory fill operations. As an XOR engine, it can perform parity checking and XOR operations on multiple blocks of data.

For both DMA and XOR, a channel may be configured to operate in Direct Mode (single operation) or Linked-List Mode (multiple operations by stepping through a linked series of Descriptors in external memory).

Since each of the four channels is programmed in the same way, this section discusses DMA/XOR engine operation within the context of a single channel (unless otherwise specified). In addition, since the source and destination of data can be any port, the terms “source” and “destination” are used with no reference to bus type (i.e., PCI-X versus CIU bus). Furthermore, a lower-case ‘x’ is used in register names to stand for any of the four channels (0-3).

The DMA/XOR registers specify:
• Source address
• Destination address
• Descriptor addresses
• Modes of operation
• Mapping for proper byte/word conversion
• Byte counts (maximum of 16 Mbytes)

Note: This DMA/XOR engine supports data transfers between unaligned source and destination addresses. The DMA/XOR engine is responsible for handling the alignment or misalignment of data.

Note: All DMA operations are assumed to be to prefetchable memory.
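Linked-List Mode can be pictured as a chain of descriptors, each carrying its own addresses and byte count plus a pointer to the next descriptor. The sketch below is a simplified illustration, not the hardware descriptor layout (which is defined in the Library headers): the struct chain_desc, its fields, and both functions are hypothetical, and it assumes the 16-Mbyte byte-count limit is inclusive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16 Mbytes per descriptor; treated as inclusive here (an assumption). */
#define MAX_BYTE_COUNT (16u * 1024u * 1024u)

/* Hypothetical, simplified descriptor for illustration only. */
typedef struct chain_desc {
    uint32_t src;              /* source address */
    uint32_t dst;              /* destination address */
    uint32_t byte_count;       /* bytes to move for this descriptor */
    struct chain_desc *next;   /* NULL terminates the chain */
} chain_desc;

/* Link `next` behind `tail`, which must currently end the chain. */
static void chain_append(chain_desc *tail, chain_desc *next)
{
    assert(tail != NULL && tail->next == NULL);
    tail->next = next;
}

/* Walk the chain as a channel would, summing the byte counts.
 * Returns -1 if any descriptor exceeds the per-descriptor limit. */
static int64_t chain_total_bytes(const chain_desc *head)
{
    int64_t total = 0;
    for (const chain_desc *d = head; d != NULL; d = d->next) {
        if (d->byte_count > MAX_BYTE_COUNT)
            return -1;
        total += d->byte_count;
    }
    return total;
}
```

In Direct Mode the chain would consist of a single operation; in Linked-List Mode the channel steps from one descriptor to the next until the chain ends.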


Figure 2. DMA/XOR Engine
[Figure not reproduced. Blocks shown: DMA_ARB (arbitrates for access to the SFN gasket), DMA_CHAN0 through DMA_CHAN3, DMA_MUX (muxes channel data and header info to the SFN gasket), and DMA_SFN_GSKT with SFN In and SFN Out; signals include Select, Data In, Header Info, and CRC/XOR Data.]

3.2.2 DMA/XOR Features

This block has the following features:
• Round-robin arbitration between the four channels for the SFN port
• Four-channel support; each channel operates independently
• Data transfer from and to any port, and memory fill, as a DMA engine
• XOR operation and parity checking as an XOR engine
• Scatter-gather (or Linked-List) and direct modes
• XOR operation of up to 16 blocks of data
• Directly fill the store queue with the first block of XOR data (optional)
• Interrupt on completed segment, chain, and error (all optional)


• Mode-selectable byte/word conversion on transferred data
• Ability to launch a DMA or XOR operation from a single write transaction
• CRC calculation on the DMA-transferred data, based on the CRC-32C algorithm required by the iSCSI Specification
• Pipelining of read requests, or of read and write requests, for better performance
• Go/stop/halt control of the data transfer operation

Figure 3. DMA/XOR Channel
[Block diagram. Labels: AHB Registers; Direct_Fill; XOR Store Queue; XOR Control; Register Update Controls; 64-Bit Data Path; 256 Byte Accumulator; 32-Bit CRC; FSM; CRC Generator; CRC Control; Packet Data Out; SFN Packet In; Packet Header Info Out.]

3.2.3 Memory: SRAM and SDRAM

Both SRAM and SDRAM can be used for DMA/XOR descriptor processing. Both can be mapped as cacheable, and either can be set to any of the Intel® XScale data cache policies.

3.3 Intel® XScale Microarchitecture

3.3.1 Overview

The Intel® XScale microarchitecture (compliant with ARM* Architecture V5TE) is designed for high performance and low power, leading the industry in mW/MIPS. It integrates a bus controller and an interrupt controller around a core processor, with intended embedded markets such as handheld devices, networking, and remote access servers. This technology is ideal for internet infrastructure products such as network and I/O processors, where ultimate performance is critical for moving and processing large amounts of data quickly.

The Intel® XScale microarchitecture incorporates an extensive list of architecture features that allow it to achieve high performance. This rich feature set lets programmers select the features that give the best performance for their application. Many of the architectural features added to the Intel® XScale microarchitecture help hide memory latency, which is often a serious impediment to high-performance processors. These include:
• the ability to continue instruction execution while the data cache is retrieving data from external memory
• a write buffer
• write-back caching
• various data cache allocation policies, which can be configured differently for each application
• cache locking
• a pipelined external bus

All these features improve the efficiency of the external bus.

Figure 1. Intel® 80200 Processor based on Intel® XScale Microarchitecture Features
[Block diagram. Blocks: Instruction Cache; Data Cache; Data RAM; Mini-Data Cache; Branch Target Buffer; IMMU; DMMU; Fill Buffer; Performance Monitoring; Debug; Power Management; MAC; Write Buffer; JTAG; Interrupt Controller; Bus Controller.]

3.3.2 Intel® XScale Microarchitecture Memory Management

The Intel® XScale microarchitecture implements the Memory Management Unit (MMU) Architecture specified in the ARM Architecture Reference Manual. The MMU provides access protection and virtual-to-physical address translation.

The MMU Architecture also specifies the caching policies for the instruction cache and data memory. These policies are specified as page attributes and include:


• identifying code as cacheable or non-cacheable
• selecting between the mini-data cache and the data cache
• write-back or write-through data caching
• enabling the data write-allocation policy
• enabling the write buffer to coalesce stores to external memory

See the Intel® XScale Microarchitecture Programmer’s Reference Manual for more detail.

3.3.3 Interrupt Controller

The Intel® XScale microarchitecture implements an interrupt controller that provides masking of interrupts and the ability to steer interrupts to FIQ or IRQ. It is accessed through Coprocessor 13 registers. See the Intel® XScale Microarchitecture Programmer’s Reference Manual for more detail.

3.3.4 Data Cache

The Intel® XScale microarchitecture implements a 32-Kbyte, 32-way set-associative data cache and a 2-Kbyte, 2-way set-associative mini-data cache. Each cache has a line size of 32 bytes and supports write-through or write-back caching.

The data cache and mini-data cache are controlled by page attributes defined in the MMU Architecture and by coprocessor 15.

The Intel® XScale Microarchitecture Programmer’s Reference Manual discusses all this in more detail.


4.0 Data Cache Policy Mechanics

4.1 Introduction

Data cache policies can be customized for each memory page, so that different software operations can each work with a tailored data cache policy. Data cache policies are set up and controlled using the Intel® XScale microarchitecture page tables. See also the ARM Architecture Reference Manual for a description of the page table translation process.

4.2 Page Tables

The Intel® XScale microarchitecture extends the page attributes defined by the C and B bits in the page descriptors with an additional X bit. This bit allows four more attributes to be encoded when X = 1. These new encodings include allocating data for the mini-data cache and write-allocate caching. A full description of the encodings can be found in the Intel® XScale Microarchitecture Programmer’s Reference Manual.

The Intel® XScale microarchitecture retains the ARM definitions of the C and B encodings when X = 0, which differs from the first-generation Intel® StrongARM products. The memory attribute for the mini-data cache has been moved and replaced with the write-through caching attribute.

When write-allocate is enabled, a store operation that misses the data cache (cacheable data only) generates a line fill. When it is disabled, a line fill occurs only when a load operation misses the data cache (cacheable data only).

Write-through caching causes all store operations to be written to memory, whether they are cacheable or not cacheable. This feature is useful for maintaining data cache coherency.

These attributes are programmed in the translation table descriptors, which are highlighted in Table 1, “First-level Descriptors”, Table 2, “Second-level Descriptors for Coarse Page Table”, and Table 3, “Second-level Descriptors for Fine Page Table”. Two second-level descriptor formats have been defined for the Intel® XScale microarchitecture: one is used for the coarse page table and the other for the fine page table.

Note: P bit is not implemented.


In the three tables below, each row lists the descriptor fields in order from bit 31 down to bit 2, followed by the type code in bits [1:0].

Table 1. First-level Descriptors
Fields (bit 31 down to bit 2)                                        Bits [1:0]
SBZ                                                                  0 0
Coarse page table base address | P | Domain | SBZ                    0 1
Section base address | SBZ | TEX | AP | P | Domain | 0 | C | B        1 0
Fine page table base address | SBZ | P | Domain | SBZ                1 1

Table 2. Second-level Descriptors for Coarse Page Table
Fields (bit 31 down to bit 2)                                        Bits [1:0]
SBZ                                                                  0 0
Large page base address | TEX | AP3 | AP2 | AP1 | AP0 | C | B        0 1
Small page base address | AP3 | AP2 | AP1 | AP0 | C | B              1 0
Extended small page base address | SBZ | TEX | AP | C | B            1 1

Table 3. Second-level Descriptors for Fine Page Table
Fields (bit 31 down to bit 2)                                        Bits [1:0]
SBZ                                                                  0 0
Large page base address | TEX | AP3 | AP2 | AP1 | AP0 | C | B        0 1
Small page base address | AP3 | AP2 | AP1 | AP0 | C | B              1 0
Tiny page base address | TEX | AP | C | B                            1 1

The P bit controls ECC.

Note: P bit is not implemented.

The TEX (Type Extension) field is present in several of the descriptor types. In the Intel® XScale microarchitecture, only the LSB of this field is used; this is called the X bit.

A Small Page descriptor does not have a TEX field. For these descriptors, TEX is implicitly zero; that is, they behave as if the X bit were ‘0’. The X bit, when set, modifies the meaning of the C and B bits.


4.3 Data Cache and Write Policy

All of these descriptor bits affect the behavior of the Data Cache and Write Buffer.

When the X bit for a descriptor is zero, the C and B bits operate as mandated by the ARM architecture. This behavior is detailed in Table 4.

When the X bit for a descriptor is one, the C and B bits’ meaning is extended, as detailed in Table 5.

Table 4. Data Cache and Buffer Behavior when X = 0
C B   Cacheable?   Bufferable?   Write Policy    Line Allocation Policy   Notes
0 0   N            N             -               -                        Stall until complete (a)
0 1   N            Y             -               -
1 0   Y            Y             Write Through   Read Allocate
1 1   Y            Y             Write Back      Read Allocate

a. Normally, the processor continues executing after a data access when no dependency on that access is encountered. With this setting, the processor stalls execution until the data access completes. This guarantees to software that the data access has taken effect by the time execution of the data access instruction completes. External data aborts from such accesses are imprecise.

Table 5. Data Cache and Buffer Behavior when X = 1
C B   Cacheable?          Bufferable?   Write Policy   Line Allocation Policy   Notes
0 0   -                   -             -              -                        Unpredictable -- do not use
0 1   N                   Y             -              -                        Writes do not coalesce into buffers (a)
1 0   (Mini Data Cache)   -             -              -                        Cache policy is determined by MD field of Auxiliary Control register
1 1   Y                   Y             Write Back     Read/Write Allocate

a. Normally, bufferable writes can coalesce with previously buffered data in the same address range.
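The three-bit policy codes used elsewhere in this paper (for example ‘000’ or ‘011’) read as X, C, B. The following is a minimal sketch of how such a code could be packed into a first-level section descriptor and classified against Tables 4 and 5. The macro and function names are hypothetical and are not part of the Library; the bit positions (X at bit 12 as the LSB of TEX, C at bit 3, B at bit 2) are an assumption taken from the section descriptor layout and should be checked against the Programmer’s Reference Manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names. Bit positions are an assumption based on the
 * section descriptor layout: X = LSB of TEX (bit 12), C = bit 3, B = bit 2. */
#define SECT_X_BIT    (1u << 12)
#define SECT_C_BIT    (1u << 3)
#define SECT_B_BIT    (1u << 2)
#define SECT_XCB_MASK (SECT_X_BIT | SECT_C_BIT | SECT_B_BIT)

/* Replace the X, C and B bits of a section descriptor.
 * xcb is the 3-bit policy code written X,C,B (0x3 means '011'). */
static uint32_t sect_set_xcb(uint32_t desc, unsigned xcb)
{
    desc &= ~SECT_XCB_MASK;
    if (xcb & 0x4u) desc |= SECT_X_BIT;
    if (xcb & 0x2u) desc |= SECT_C_BIT;
    if (xcb & 0x1u) desc |= SECT_B_BIT;
    return desc;
}

/* Per Tables 4 and 5, data is cached whenever C = 1
 * ('110' selects the mini-data cache). */
static bool xcb_is_cacheable(unsigned xcb)
{
    return (xcb & 0x2u) != 0u;
}

/* '100' is listed as unpredictable and must not be used. */
static bool xcb_is_valid(unsigned xcb)
{
    return xcb <= 0x7u && xcb != 0x4u;
}
```

A caller would typically reject an invalid code with xcb_is_valid() before rewriting a descriptor, and would use xcb_is_cacheable() to decide whether descriptors written by the core need to be cleaned to RAM before the engine reads them.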


5.0 Optimization of Descriptor Processing Software

Performance benefits can be gained by tuning the software that performs descriptor processing. Central to the tuning process is the Intel® XScale microarchitecture feature that allows separate data cache policies for individual memory pages. This allows the data cache policy for descriptor processing to be customized for runtime optimization.

When tuning custom applications, keep in mind:
• In performance experiments, prebuilding linked lists achieved better performance than using the append/resume function. Constraints of custom applications may or may not allow the prebuilding of lists.
• When append is used, the uncached, unbuffered memory mapping for the last descriptor is used to append the next descriptor. This eliminates a 16-byte cache flush (each cache line has two dirty bits, one per 16 bytes).
• Descriptor processing is open to software optimization, while DMA/XOR data transfer times are the same for all approaches. Also, the DMA and XOR engines run concurrently with the Intel® XScale core.
• When multiple data cache policies are comparable in runtime performance, the policy that has data caches OFF is preferable, since uncached memory regions do not allocate cache lines and therefore do not incur costly cache-line evictions to RAM.
• GNUPro Toolset compiler optimization level -O2 gave the best performance in the performance test cases.
• For all performance cases, increasing the descriptor byte count field does not alter which policy provides the best performance.
• In a custom application, when caches are ON for a memory region and the preload can be placed in a compute-bound section of code, preload provides significant benefit.
• Custom applications performing GW80314 DMA/XOR descriptor processing may show different comparative results when concurrent applications produce different data bus and data cache interactions. Examples:
  - High usage of the data cache by an application could result in cache-line evictions. In this case uncached descriptors may show better performance.
  - High bus traffic from a concurrent application may alter the comparative performance of cached policies.
  - Interleaving of data bus transactions while building descriptors in RAM may reduce opportunities for coalescing.
• Typically, data regions are set as Policy ‘000’. This eliminates the need to synchronize the source and destination data with the data cache.


6.0 Library

6.1 Ecosystem: Application and XOR/DMA Library APIs

The APIs provide the following hardware and software interfaces (see also Appendix A, “Library Flow Charts”).

GW80314 Hardware Interfaced:
• Intel® XScale microarchitecture data cache
• Intel® XScale microarchitecture page tables
• DMA/XOR Controller Unit
• MPIC

Library Software Grouped by Category:
• Initialization
  - Hardware
  - Descriptor management data structures
• Operations
  - DMA/XOR descriptor management: requests, descriptor reclamation
  - DMA and XOR transaction initialization (posting)
  - Interrupt handling
• Termination
  - Free memory


6.1.1 Flow Sequence Description: XOR and DMA Library

6.1.1.1 Initialization

lib_mem_map_malloc()      Allocates separate memory regions for XOR descriptors, DMA descriptors and data. Records the memory map in the data structure Memmap_Type mem_map.
lib_set_xcb_mem_range()   Sets the xcb bits for an address range. This is used in conjunction with the global variable mem_map to set descriptor and data address ranges.
lib_new_mgr()             Allocates memory for the XorDma_80314_Type data structure.
lib_init()                Initializes the XorDma_80314_Type data structure. The Free Stacks and Post Queues are initialized.
intHandlerAttach()        Chains the interrupt handlers lib_irq_handler() and lib_fiq_handler() into the interrupt vectors.
mpic_setup()              Called to unmask/mask and route interrupts for DMA and XOR.

6.1.1.2 Request

lib_stack_pop() is called to get an aligned buffer from the Free Stack. The alignment allows the same buffer to be used as either a DMA or an XOR descriptor.

In the case of an append, the descriptor values are written first. Then, when the cache policy for the memory region used by the descriptors is 010, 011 or 111 (caches on), the descriptor must be flushed to RAM using a macro. Finally, lib_postq_append_resume_sram() or lib_postq_append_resume_sdram() is called. This function appends the descriptor to the channel descriptor chain, places it in the post queue, records it in the global variable chainTailXORDMA[], and then sets resume to start the transaction.

6.1.1.3 Post Transaction Cleanup

At some point, descriptors for completed transactions must be reclaimed from the Post Queues and returned to the Free Stack (when append is used). This is accomplished by calling lib_reclaim().
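The request and cleanup steps above revolve around two containers per channel: a Free Stack of unused descriptor buffers and a Post Queue of descriptors handed to the engine. The sketch below is a toy model of that bookkeeping only. All names (ToyChannel, toy_pop and so on) are hypothetical, the pool depth is arbitrary, and it does not reproduce the signatures or internals of lib_stack_pop(), the append/resume functions, or lib_reclaim().

```c
#include <stddef.h>

#define POOL_SIZE 4   /* hypothetical number of descriptor buffers per channel */

typedef struct {
    void  *free_stack[POOL_SIZE];  /* LIFO of unused descriptor buffers   */
    size_t free_top;               /* number of entries on the free stack */
    void  *post_q[POOL_SIZE];      /* circular FIFO of posted descriptors */
    size_t q_head;                 /* index of the oldest posted entry    */
    size_t q_count;                /* number of posted entries            */
} ToyChannel;

/* Load every buffer onto the free stack and start with an empty post queue. */
static void toy_init(ToyChannel *ch, void *bufs[POOL_SIZE])
{
    ch->free_top = 0;
    ch->q_head = 0;
    ch->q_count = 0;
    for (size_t i = 0; i < POOL_SIZE; i++)
        ch->free_stack[ch->free_top++] = bufs[i];
}

/* Request: take a buffer, or NULL when none is free (time to reclaim). */
static void *toy_pop(ToyChannel *ch)
{
    return ch->free_top ? ch->free_stack[--ch->free_top] : NULL;
}

/* Post: remember a descriptor that was handed to the engine.
 * Only buffers obtained from toy_pop() are posted, so the queue cannot overflow. */
static void toy_post(ToyChannel *ch, void *desc)
{
    ch->post_q[(ch->q_head + ch->q_count) % POOL_SIZE] = desc;
    ch->q_count++;
}

/* Cleanup: return up to `completed` of the oldest posted descriptors
 * to the free stack; returns how many were actually reclaimed. */
static size_t toy_reclaim(ToyChannel *ch, size_t completed)
{
    size_t n = 0;
    while (n < completed && ch->q_count > 0) {
        ch->free_stack[ch->free_top++] = ch->post_q[ch->q_head];
        ch->q_head = (ch->q_head + 1) % POOL_SIZE;
        ch->q_count--;
        n++;
    }
    return n;
}
```

In this model a NULL from toy_pop() is the cue to reclaim, which mirrors one of the reclamation strategies the paper describes; how many descriptors have actually completed would come from the engine state, which the sketch leaves to the caller.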


6.1.1.4 Terminate

To terminate the Library, call intHandlerDetach() to return the interrupt vectors to their pre-chained state, call mpic_mask_dmaxor_intr() to mask the DMA and XOR interrupts, and call lib_free_mgr() to free memory.

Figure 2. DMA/XOR Library Flow Diagram
[Flow diagram. Columns: Application; 80314 DMA/XOR Library (Descriptor Mgt); Hardware (XScale: Cache/Pg Tables, XOR/DMA Controller, Interrupt Controller). Stages and calls:
Initialize: lib_memmap_malloc(); intHandlerAttach() and mpic_setup(); lib_new_mgr(); lib_init(); lib_set_xcb_mem_range().
Request: lib_stack_pop(); complete descriptor fields; Clean_2_D_Cache_Line() if cache policy is 010, 011 or 111; lib_postq_appnd_resume_sram() or lib_postq_appnd_resume_sdram().
Post Transaction Cleanup: identify time to reclaim executed descriptors for the stack; lib_reclaim(); IRQ_handler().
Terminate: intHandlerDetach(); mpic_mask_dmaxor_intr(); lib_free_mgr().]


6.1.2 Redboot* Intel® IQGW80314 SV Board Memory Map

Table 6. System Physical Memory Map
Physical Addr   Size (MB unless noted)   Description
0x00000000      1 GB                     SDRAM
0x40000000      256                      FLASH / Peripheral Bus (see Table 3-2)
0x50000000      1                        SRAM
0x50100000      64 KB                    80314 Control Registers
0x50110000      960 KB                   80314 Control Registers
0x50600000      253                      No Access
0x80000000      495                      PCI1 MEM32
0x9EFF0000      1                        PCI1 I/O
0x9F000000      16                       PCI1 CFG
0xA0000000      256                      PCI1 PFM1
0xB0000000      256                      PCI1 PFM2
0xC0000000      495                      PCI2 MEM32
0xDEFF0000      1                        PCI2 I/O
0xDF000000      16                       PCI2 CFG
0xE0000000      256                      PCI2 PFM1
0xF0000000      256                      PCI2 PFM2

Also see Appendix H, “Redboot Memory Map”.


7.0 Library and Test Bench File Organization and Compilation

The testbench runs with the Redhat* monitor. When another OS is used (e.g., Linux*), the required modifications may be major, depending on the system architecture.

7.1 Folder and File Organization

Folders:
• / ..................... Directory to initiate build.
• /Lib ................ The 80314 DMA/XOR Library.
• /Bench .......... Test bench illustrating an implementation of the Library.

7.1.1 / Files:
• b.bat
  - Call this from the DOS prompt to build bench_314.elf.
• run314.bat
  - Initiates the GNUPro GUI interface.

7.1.2 /Lib Files:
• lib_headers.h:
  - All #include files for the Library.
• 80200.h:
  - 80200 macros per the Intel® XScale Microarchitecture Programmer’s Reference Manual.
• DMAXOR80314.h:
  - 80314 DMA: memory map, macros and global variables for append and reclaim.
• dmaxor_desc_mgr.h, dmaxor_desc_mgr.c:
  - Functions, data structures and macros that allocate memory for data and descriptor processing and set the cache policy for the memory map of both descriptors and data.
• i80315.h, MPIC.h, MPIC.c, interrupt_chain.h, interrupt_chain.c:
  - Interrupt controller/interrupt handler setup.
• xscale.h, xscale.c:
  - Set up the memory map with the Intel® XScale microarchitecture xcb data cache policy, set global coalescing, global data cache clean.
• Makefile:
  - Creates the *.o files used in /Bench.


7.1.3 /Bench Files:
• bench.h, bench.c:
  - Provide a menu to call test cases and present output on stdio.
• lib_demo_cases.c:
  - Demonstration of the DMA/XOR Library.
• crc32.h, crc32.c:
  - Used by lib_demo_cases() to validate in software the CRC calculated by the DMA engine.
• headers.h:
  - Combines all #include references.
• makefile:
  - Creates the output bench_314.elf from the object files.

7.2 Instructions to Build and Run

• Unzip into a directory.
• Set up the GNUPro Toolset in the directory path.
• Open a DOS window and go to the directory above /Lib and /Bench.
• Type b at the command prompt to call the b.bat file:
  - b.bat erases the existing .o and .elf files, then builds the required .o files and bench_314.elf.
• To run, type run314 to call the run314.bat file.
• When the debugger comes up:
  - Select: File>Target setting, enter baud rate 115200, COM port, target>Ok.
  - Open the console window.
  - Select: Run>Download > Continue.
  - Click the mouse in the Console window.
  - Enter menu option > .
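The bench validates the engine's CRC in software, and the feature list names CRC-32C as the algorithm. As an illustration of what such a software check computes, here is a minimal bitwise CRC-32C sketch using the common convention (reflected polynomial 0x82F63B78, initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF). It is not the contents of crc32.c, and whether the engine's seed and final-XOR conventions match this form is an assumption to confirm against the hardware documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), common reflected form:
 * polynomial 0x82F63B78, initial value and final XOR both 0xFFFFFFFF.
 * Illustrative only; not taken from the Library's crc32.c. */
static uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and XOR in the polynomial;
             * otherwise just shift. The mask is all ones or all zeros. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for this form is 0xE3069283 for the nine ASCII bytes "123456789", which gives a quick sanity test before comparing against a value read back from the engine. A table-driven version would be faster, but the bitwise form is easier to audit.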


8.0 General Notes

8.1 Cache Implications Using Append Resume Macros

For a DMA or XOR descriptor, when the data cache policy requires a cache clean, the number of cache lines that must be cleaned is:
• DMA: 2 cache lines
• XOR, 4 sources: 2 cache lines
• XOR, 8 sources: 3 cache lines
• XOR, 16 sources: 5 cache lines

8.2 GCSR is used to set Descriptor Completion Interrupts

The GCSR is the Channel General Control and Status Register and is used to enable descriptor completion interrupts. Note that this is a channel-global register rather than a descriptor-specific flag, so the channel chain state must be determined when doing appends and using interrupts to identify the end of a chain.

8.3 Using Resume

When the GCSR OP_CMD remains the same for a channel, the following sequence can be used to append and execute the next descriptor without regard to the state of execution of the existing chain (whether the existing chain is currently executing or complete).

    ...
    /* Append goes here. */
    *GCSR_REGISTER |= (RESUME | GO);
    ...

8.4 Changing from DMA to XOR Descriptors on the Same Channel

When changing the GCSR OP_CMD, the channel should be INACTIVE and the existing chained descriptors should be complete. This is required because the GCSR is used to specify the operation command (OP_CMD).

8.5 Error Handling

If a descriptor generates an error condition, the DMA/XOR engine will stop during the execution of that descriptor.


When the interrupt enable for error conditions is set, the interrupt handler can record the state of the DMA/XOR engine registers. The register information includes the address of the descriptor that follows the one that caused the error condition (Channel Next Descriptor Address Register). From this, determine the address of the descriptor that caused the interrupt and invoke the application's error handling routine.

8.6 DMA/XOR Channel Arbitration for the SFN Port

The four channels use round robin to arbitrate for the port. The port can execute a read and a write to different destination ports concurrently. Each channel can have 2 reads and 2 writes outstanding. Each read or write can be up to the size of the 256-byte buffers.

8.7 GNUPro Toolset Compiler Optimization

The GNUPro toolset offers compiler settings of -O0 through -O3. Each level has its own characteristics in terms of level of optimization and object size. Level -O2 was found to provide the best performance for descriptor processing. Level -O0 is the default.

8.8 Reclamation of Descriptors

There are alternative ways to choose when to reclaim executed descriptors. These include:
• Wait until a request for a free descriptor returns a null pointer, indicating that there are no free descriptors remaining in that channel.
• Wait until there is slack time in processing and call lib_reclaim().
• Set a timer and call lib_reclaim() at intervals.

8.9 Extending Library to Run with Multiple Threads

The Library is single threaded. However, all functions take a pointer to the data structure XORDma_GW80314_Type. This feature simplifies porting to multithreaded applications. The remaining issue is mapping the DMA and XOR controllers to multiple threads.

8.10 When Using Append / Resume Sequence

APPEND(LAST,NEXT,PORT)
• LAST should be the uncached/unbuffered address of the last descriptor for that channel. This eliminates a cache clean.
• NEXT should be the physical address of the descriptor being appended.
• PORT is the port where the descriptor resides (sdram = 4, sram = 3).
• The append and the resume operation that follows it should be separated only by the ‘,’ operator, to eliminate the possibility of instruction reordering by the compiler.


• Remember that if LAST is not mapped to an uncached memory address, 2 cache lines must be flushed.

8.11 Append Operation Sequencing

When doing an append, the Channel Next Descriptor Address Register should be the last descriptor field updated, since it contains the LAST bit. As a result, the chain can be in any state of execution but will not attempt to continue to the next descriptor while the append is incomplete. See the macro sequence:

    /* nd_addr_l is written last because it carries the LAST bit. */
    #define APPEND(L,N,P) \
        ( ((Header_Type *)(L))->nd_tcr = ND_TCR_APPND_VAL(N,P), \
          (((Header_Type *)(L))->nd_addr_m = 0x0), \
          (((Header_Type *)(L))->nd_addr_l = ((unsigned long)(N))) )

8.12 DMA or XOR Descriptors Variables should be Volatile

Declaring these variables volatile prevents compiler optimizations from caching or eliminating accesses to them. An example is the set of global variables used for append:

    volatile void * chainHeadXORDMA[4];   // Global variable used for append
    volatile void * chainTailXORDMA[4];   // Global variable used for append
    volatile XorDma_80314_Type *desc_mgr;
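The ordering rule from the append sequencing note can be illustrated with a small stand-alone model. In this sketch only the three field names (nd_tcr, nd_addr_m, nd_addr_l) come from the APPEND macro; the ToyHeader layout, the TOY_LAST marker value and the toy_append() helper are hypothetical and do not reflect the real Header_Type or register encoding.

```c
#include <stdint.h>

/* Hypothetical descriptor header: field order and widths are assumptions.
 * The fields are volatile so the compiler keeps the three stores and
 * emits them in program order. */
typedef struct {
    volatile uint32_t nd_addr_l;  /* next descriptor address, low word  */
    volatile uint32_t nd_addr_m;  /* next descriptor address, high word */
    volatile uint32_t nd_tcr;     /* next descriptor control word       */
} ToyHeader;

#define TOY_LAST 0x1u  /* hypothetical end-of-chain marker held in nd_addr_l */

/* Link `last` to the descriptor at next_addr. The control and high-address
 * words are filled in first; nd_addr_l is stored last, so the end-of-chain
 * marker is only replaced once the rest of the link is complete. */
static void toy_append(ToyHeader *last, uint32_t next_addr, uint32_t tcr)
{
    last->nd_tcr    = tcr;
    last->nd_addr_m = 0x0u;
    last->nd_addr_l = next_addr;
}
```

Note that volatile constrains only what the compiler emits. Whether the stores reach memory in that order also depends on the memory mapping, which is presumably one reason the paper recommends using the uncached, unbuffered address of the last descriptor for the append.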


9.0 Conclusion

The purpose of the GW80314 DMA/XOR Library is to give developers a fast ramp by providing a turnkey, optimized solution that combines with existing Intel collateral (see Appendix J, “Related Documents”, for related web documents).


Appendix A Library Flow Charts

80314 DMA/XOR Library Demo Block Diagram
[Flow chart. Columns: Bench; Library (Buffer Manager, XScale, DMA/XOR Engine, Interrupt). Steps as numbered in the chart:
Start
Initialize Library:
  (1) Allocate 4 Channel Queue/Stack Container XorDma_80314_Type
  (2) Setup memory map for descriptor processing/data
  (3) Chain in interrupt handlers and unmask
Select Menu Option
Setup Demo:
  (5) Allocate buffers and initialize XorDma_80314_Type
  (4) Set descriptor processing area Dcache Policy
  (5) Initialize four DMA/XOR Channels with initial descriptor
Iterate through descriptors:
  (6) Get descriptor and complete; if empty, reclaim descriptors
  (7) Append resume and set interrupt
  (8) Count DMA/XOR interrupts in handler
Validate Transfer and CRC32
End]


Buffer Manager
(1) Allocate 4 channel Queue/Stack Container XorDma_80314_Type

lib_new_mgr()
[Flow chart: Start. Request memory for the data structure that maintains all queue and stack state data and the pointers to aligned descriptor/memory buffers. Call memalign() with sizeof(XorDma_80314_Type) on a 1k boundary. End.]
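The flow chart allocates the manager structure with memalign() on a 1k boundary. As a portable illustration of the same idea, the sketch below swaps in the C11 aligned_alloc() function instead of memalign(); the helper name alloc_1k_aligned() and the MGR_ALIGN macro are hypothetical, and the Library itself may behave differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MGR_ALIGN 1024u  /* the 1k boundary named in the flow chart */

/* Allocate at least `size` bytes starting on a 1 KB boundary.
 * Uses C11 aligned_alloc() in place of memalign(); release with free(). */
static void *alloc_1k_aligned(size_t size)
{
    if (size == 0)
        return NULL;

    /* aligned_alloc() expects the size to be a multiple of the alignment,
     * so round the request up to the next 1 KB multiple. */
    size_t rounded = (size + MGR_ALIGN - 1u) / MGR_ALIGN * MGR_ALIGN;
    return aligned_alloc(MGR_ALIGN, rounded);
}
```

A caller would pass the size of its manager structure and check the result for NULL before use. Rounding the size up wastes at most one alignment unit per allocation, which is usually acceptable for a single long-lived structure.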


Buffer Manager
(5) Allocate buffers and initialize XorDma_80314_Type
lib_init()

Start
For four channels:
1. Set free stacks length and limit.
2. For Free Stacks, load pointers to Xor_Frame_Type buffers, which can be used for either DMA or XOR transactions.
3. Memory buffer passed for descriptor buffers can be located in SDRAM or SRAM.
Test that all buffers fall within upper and lower memory map ranges.
Each of (4) channels: Initialize circular queues as empty.
Execute one DMA descriptor for each channel and post address to global variable to enable follow-on appends.
End
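A free stack with a length and a limit, as named in step 1 above, might look like the sketch below. All names (Demo_Stack_Type, demo_stack_init, demo_stack_push, demo_stack_pop) and the capacity of 8 are hypothetical; the library's own lib_stack_push() and lib_stack_pop() signatures are not shown in this appendix and may differ.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_STACK_LIMIT 8   /* hypothetical capacity per channel */

typedef struct {
    void  *slot[DEMO_STACK_LIMIT];   /* pointers to free descriptor buffers */
    size_t length;                   /* buffers currently on the stack */
    size_t limit;                    /* maximum number of buffers */
} Demo_Stack_Type;

/* Set length and limit; the stack starts empty. */
void demo_stack_init(Demo_Stack_Type *s)
{
    assert(s != NULL);
    s->length = 0;
    s->limit  = DEMO_STACK_LIMIT;
}

/* Return a buffer to the free stack. Returns 0 on success, -1 if full. */
int demo_stack_push(Demo_Stack_Type *s, void *buf)
{
    assert(s != NULL);
    if (s->length == s->limit)
        return -1;
    s->slot[s->length++] = buf;
    return 0;
}

/* Take a free buffer. Returns NULL when the stack is empty,
 * which is the caller's cue to reclaim completed descriptors. */
void *demo_stack_pop(Demo_Stack_Type *s)
{
    assert(s != NULL);
    if (s->length == 0)
        return NULL;
    return s->slot[--s->length];
}
```

One such stack per channel would be loaded with buffer pointers during initialization.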


Buffer Manager
(6) Get descriptor and complete, if empty call reclaim

Running a descriptor
Start
Call lib_stack_pop() to get aligned descriptor for channel. Specify channel.
Complete descriptor; if cache region Policy has Dcache ON, flush to SDRAM/SRAM.
Call lib_postq_appnd_resume_sdram() to place descriptor in post queue, append to prior descriptor executed, and set resume to run the descriptor.
End

Calling Reclaim
Start
Traverse descriptor chain from chainHeadDMAXOR[] to chainTailDMAXOR[]. For each, call lib_q_get() to remove buffer from post queue and call lib_stack_push() to return to free_stack.
Adjust chainHeadDMAXOR[] to reflect descriptor returned to free stack.
End
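The reclaim path can be illustrated with a small circular post queue. This sketch is a simplification under stated assumptions: every queued descriptor is treated as already completed, the free stack is reduced to a plain array, and all names (Demo_Queue_Type, demo_q_put, demo_q_get, demo_reclaim) are hypothetical rather than the library's lib_q_put(), lib_q_get(), or lib_stack_push().

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_Q_SIZE 8   /* hypothetical size; one slot always stays unused */

typedef struct {
    void  *slot[DEMO_Q_SIZE];
    size_t head;   /* next slot to read  */
    size_t tail;   /* next slot to write */
} Demo_Queue_Type;

/* An empty circular queue has head == tail. */
void demo_q_init(Demo_Queue_Type *q)
{
    assert(q != NULL);
    q->head = 0;
    q->tail = 0;
}

/* Post a descriptor. Returns 0 on success, -1 if the queue is full. */
int demo_q_put(Demo_Queue_Type *q, void *buf)
{
    size_t next = (q->tail + 1) % DEMO_Q_SIZE;
    if (next == q->head)
        return -1;
    q->slot[q->tail] = buf;
    q->tail = next;
    return 0;
}

/* Remove the oldest descriptor. Returns NULL if the queue is empty. */
void *demo_q_get(Demo_Queue_Type *q)
{
    void *buf;
    if (q->head == q->tail)
        return NULL;
    buf = q->slot[q->head];
    q->head = (q->head + 1) % DEMO_Q_SIZE;
    return buf;
}

/* Move queued descriptors, oldest first, into free_list.
 * Assumes all of them have finished executing.
 * Returns the number of descriptors reclaimed. */
size_t demo_reclaim(Demo_Queue_Type *q, void **free_list, size_t free_cap)
{
    size_t n = 0;
    while (n < free_cap && q->head != q->tail)
        free_list[n++] = demo_q_get(q);
    return n;
}
```

A real implementation would stop at the last descriptor the engine has actually completed instead of draining the whole queue.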


XScale
(2) Setup memory for descriptor processing/data

Start
Allocate 1k memory region for flush of data cache.
Call memalign() to allocate memory for SDRAM and record SRAM to memory map.
Load memory map values to global variable Memmap_Type mem_map, which contains upper and lower bounds for each memory range.
End
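The upper and lower bounds recorded in the memory map lend themselves to a simple containment test for each buffer. The helper below is a hypothetical sketch, not a library function; it assumes the upper bound is exclusive (the first address past the region), which the source does not state.

```c
#include <assert.h>

/* Return 1 if the buffer [addr, addr + len) lies entirely inside
 * [lower, upper), otherwise 0.
 * Assumption: 'upper' is the first address past the region. */
int demo_buf_in_range(unsigned long lower, unsigned long upper,
                      unsigned long addr, unsigned long len)
{
    assert(lower <= upper);
    if (addr < lower || addr > upper)
        return 0;
    /* Compare against the remaining room so addr + len cannot overflow. */
    return len <= upper - addr;
}
```

With a populated map, a call such as demo_buf_in_range(mem_map.data_lower_va, mem_map.data_upper_va, buf_addr, buf_len) would check a data buffer, where buf_addr and buf_len are placeholders for the caller's values.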


XScale
(4) Set descriptor processing area Dcache Policy
lib_set_xcb_mem_range()

Start
Set Dcache Policies for memory pages by calling lib_set_page_xcb(). Requires that pages be of section size.
End
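For orientation, the sketch below shows how X, C, and B policy bits could be written into a first-level section descriptor. The bit positions (B at bit 2, C at bit 3, X at bit 12) follow the ARM section descriptor layout as extended by the XScale core and should be checked against the core manual. The function name and the packing of the policy as a 3-bit value (X, C, B from high to low) are assumptions, and this is not the library's lib_set_page_xcb().

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECT_B (1u << 2)    /* bufferable bit */
#define DEMO_SECT_C (1u << 3)    /* cacheable bit  */
#define DEMO_SECT_X (1u << 12)   /* XScale X bit   */

/* Return 'desc' with its X, C, and B bits replaced by 'xcb'.
 * Assumption: xcb packs the policy as (X << 2) | (C << 1) | B. */
uint32_t demo_set_section_xcb(uint32_t desc, unsigned xcb)
{
    assert(xcb <= 7u);
    desc &= ~(uint32_t)(DEMO_SECT_X | DEMO_SECT_C | DEMO_SECT_B);
    if (xcb & 4u) desc |= DEMO_SECT_X;
    if (xcb & 2u) desc |= DEMO_SECT_C;
    if (xcb & 1u) desc |= DEMO_SECT_B;
    return desc;
}
```

On real hardware, a descriptor change of this kind is normally paired with cache and TLB maintenance so that the new policy takes effect; that part is omitted here.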


DMA/XOR Engine
(7) Append resume and set interrupt
lib_postq_appnd_resume_sdram()

Start
Place buffer in Post Queue using lib_q_put().
Get LAST descriptor executed on channel from global variable chainTailDMAXOR[]. Then post NEXT descriptor to chainTailDMAXOR[] after converting descriptor address to uncached/unbuffered address mapping.
Call macro APPEND_RESUME(), which appends the current descriptor to prior descriptor and sets resume in GCSR.
End


Interrupt
(3) Chain in interrupt handlers and unmask

Start
Call mpic_setup():
(1) disable irq/fiq
(2) call mpic_init() to initialize interrupt controller
(3) attach fiq/irq interrupt handlers
End


Interrupt
(8) Count interrupts in handler

Start
Inside interrupt handler, get irq interrupt number.
When interrupt number == 18, increment counter.
Clear interrupt by writing EOI2 = 0.
Another interrupt present? If true, handle the next interrupt; if false, End.
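The counting loop can be sketched in host-testable form as below. The pending interrupt numbers are passed in as an array and the end-of-interrupt register is passed as a pointer, both of which are assumptions made so the logic can run off target; only the interrupt number 18 and the write of 0 to clear come from the flow chart. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_DMAXOR_IRQ 18u   /* interrupt number tested in the flow chart */

/* Running count of DMA/XOR interrupts seen by the handler. */
volatile unsigned long demo_dmaxor_irq_count = 0;

/* Walk the pending interrupt numbers, count those equal to 18, and
 * acknowledge each one by writing 0 to the end-of-interrupt register.
 * Returns the number of interrupts examined. */
unsigned demo_irq_handler(const unsigned *pending, unsigned n_pending,
                          volatile unsigned long *eoi)
{
    unsigned i;

    assert(eoi != NULL);
    assert(pending != NULL || n_pending == 0);

    for (i = 0; i < n_pending; i++) {
        if (pending[i] == DEMO_DMAXOR_IRQ)
            demo_dmaxor_irq_count++;
        *eoi = 0;   /* clear the interrupt */
    }
    return i;
}
```

The demo can then compare the counter with the number of descriptors it queued with interrupts enabled.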


Appendix B Library Data Structures

B.1 Descriptor Headers

//Same for DMA or XOR
typedef struct header{
    unsigned long src_addr_l; // Cache line 1
    unsigned long src_addr_m; // Cache line 1
    unsigned long dst_addr_l; // Cache line 1
    unsigned long dst_addr_m; // Cache line 1
    unsigned long tcr1;       // Cache line 1
    unsigned long tcr2;       // Cache line 1
    unsigned long nd_addr_l;  // Cache line 1
    unsigned long nd_addr_m;  // Cache line 1
    unsigned long nd_tcr;     // Cache line 2
}Header_Type;

B.2 DMA Descriptors

//Align on 64 bytes. Size is 44 bytes.
typedef struct dma {
    unsigned long src_addr_l;
    unsigned long src_addr_m;
    unsigned long dst_addr_l;
    unsigned long dst_addr_m;
    unsigned long tcr1;
    unsigned long tcr2;
    unsigned long nd_addr_l;
    unsigned long nd_addr_m;
    unsigned long nd_tcr;
    unsigned long crc_addr_l;
    unsigned long crc_addr_m;
}Dma_Type;

B.3 XOR Descriptors

//Align on 256 byte boundary. Size is 156 bytes.
typedef struct xor{
    unsigned long src_addr_l;
    unsigned long src_addr_m;
    unsigned long dst_addr_l;
    unsigned long dst_addr_m;
    unsigned long tcr1;
    unsigned long tcr2;
    unsigned long nd_addr_l;
    unsigned long nd_addr_m;
    unsigned long nd_tcr;
    unsigned long src01_addr_l;
    unsigned long src01_addr_m;
    unsigned long src02_addr_l;
    unsigned long src02_addr_m;
    unsigned long src03_addr_l;
    unsigned long src03_addr_m;
    unsigned long src04_addr_l;
    unsigned long src04_addr_m;
    unsigned long src05_addr_l;
    unsigned long src05_addr_m;
    unsigned long src06_addr_l;
    unsigned long src06_addr_m;
    unsigned long src07_addr_l;
    unsigned long src07_addr_m;
    unsigned long src08_addr_l;
    unsigned long src08_addr_m;
    unsigned long src09_addr_l;
    unsigned long src09_addr_m;
    unsigned long src10_addr_l;
    unsigned long src10_addr_m;
    unsigned long src11_addr_l;
    unsigned long src11_addr_m;
    unsigned long src12_addr_l;
    unsigned long src12_addr_m;
    unsigned long src13_addr_l;
    unsigned long src13_addr_m;
    unsigned long src14_addr_l;
    unsigned long src14_addr_m;
    unsigned long src15_addr_l;
    unsigned long src15_addr_m;
}Xor_Type;
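The sizes quoted in the comments for the DMA and XOR descriptors (44 and 156 bytes) hold when unsigned long is 32 bits wide, as on the XScale target. The sketch below checks this at compile time on any host by substituting uint32_t. The Demo_ type names, the array form of the fifteen extra XOR sources, and the demo_is_aligned() helper are hypothetical and only mirror the layouts listed above.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Assumption: uint32_t stands in for the target's 32-bit unsigned long. */
typedef struct {
    uint32_t src_addr_l, src_addr_m;
    uint32_t dst_addr_l, dst_addr_m;
    uint32_t tcr1, tcr2;
    uint32_t nd_addr_l, nd_addr_m;
    uint32_t nd_tcr;
    uint32_t crc_addr_l, crc_addr_m;
} Demo_Dma_Type;

typedef struct {
    uint32_t src_addr_l, src_addr_m;
    uint32_t dst_addr_l, dst_addr_m;
    uint32_t tcr1, tcr2;
    uint32_t nd_addr_l, nd_addr_m;
    uint32_t nd_tcr;
    struct { uint32_t addr_l, addr_m; } src[15];   /* src01 .. src15 */
} Demo_Xor_Type;

/* 11 words and 39 words respectively. */
static_assert(sizeof(Demo_Dma_Type) == 44, "DMA descriptor should be 44 bytes");
static_assert(sizeof(Demo_Xor_Type) == 156, "XOR descriptor should be 156 bytes");

/* The first eight words fill one 32-byte line, so nd_tcr begins the second. */
static_assert(offsetof(Demo_Dma_Type, nd_tcr) == 32, "nd_tcr should start at byte 32");

/* Return 1 if p sits on the given boundary (64 for DMA, 256 for XOR). */
int demo_is_aligned(const void *p, uintptr_t boundary)
{
    return ((uintptr_t)p % boundary) == 0;
}
```

The byte-32 check is consistent with the B.1 comments, which place the first eight words on cache line 1 and nd_tcr on cache line 2.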


B.4 Intel ® XScale Microarchitecture Page Tables and Library Memory Map (xscale.h)

//Memory map recorded
typedef struct memmap{
    //Memory map virtual addresses
    //Descriptors
    unsigned long XOR_desc_lower_va; //va = virtual address
    unsigned long XOR_desc_upper_va;
    unsigned long XOR_desc_num_pages;
    unsigned long XOR_desc_xcb;
    unsigned long dma_desc_lower_va;
    unsigned long dma_desc_upper_va;
    unsigned long dma_desc_num_pages;
    unsigned long dma_desc_xcb;
    //Data region
    unsigned long data_lower_va;
    unsigned long data_upper_va;
    unsigned long data_num_pages;
    unsigned long data_xcb;
    unsigned long lad;
    unsigned long pad;
    int page_size;
    unsigned long page_boundry_1st;
    //memory malloced
    void * toFree;
    int size_malloced;
}Memmap_Type;

//Information returned for a memory page
typedef struct page{
    //Level 1
    unsigned long pt_base;
    unsigned long virtadd;
    unsigned int type_lvl1;
    unsigned int type_lvl2;
    unsigned long *lvl1_des_ptr;
    unsigned long lvl1_des_val;
    unsigned int xcb_lvl1_before;
    unsigned int xcb_lvl1_after;
    //Level 2
    unsigned long *lvl2_des_ptr;
    unsigned long lvl2_des_val;
    unsigned long baseloc;
    int input_p;
    int input_x;
    int input_c;
    int input_b;
    unsigned int xcb_input;
    unsigned int xcb_lvl2_before;
    unsigned int xcb_lvl2_after;
    int page_size;
    unsigned int page_xcb;
}Page_Type;


Appendix C DMAXOR80314.h

/*******************************************************************************
*
* Copyright (c) 2003 Intel Corporation. All rights reserved.
*
* Intel hereby grants you permission to copy, modify, and distribute this
* software and its documentation. Intel grants this permission provided
* that the above copyright notice appears in all copies and that both the
* copyright notice and this permission notice appear in supporting
* documentation. In addition, Intel grants this permission provided that
* you prominently mark as not part of the original any modifications made
* to this software or documentation, and that the name of Intel
* Corporation not be used in advertising or publicity pertaining to the
* software or the documentation without specific, written prior
* permission.
*
* Intel provides this AS IS, WITHOUT ANY WARRANTY, INCLUDING THE WARRANTY
* OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, and makes no
* guarantee or representations regarding the use of, or the results of the
* use of, the software and documentation in terms of correctness,
* accuracy, reliability, currentness, or otherwise, and you rely on the
* software, documentation, and results solely at your own risk.
*
***************************************************************************/
/***************************************************************************
*
*Board: 80314
*
*History:
* 08Aug03 LGS Initial Release Larry Stewart larry.g.stewart@intel.com
*
****************************************************************************/
#ifndef _DMAXOR80314_H
#define _DMAXOR80314_H
/************************************************************


* Redboot Defined
************************************************************/
#define BASE_CREGS 0x50100000
#define RAM_LOWER 0x00000000
#define RAM_UPPER 0x10000000
/************************************************************
* 80314 DMA/XOR Block
************************************************************/
#define DMAXOR_BASE (BASE_CREGS + 0x5000)
#define CH0_OFFSET 0x000
#define CH1_OFFSET 0x100
#define CH2_OFFSET 0x200
#define CH3_OFFSET 0x300
enum CHANNEL{
    CH0 = 0,
    CH1 = 1,
    CH2 = 2,
    CH3 = 3
};
#define NUM_CHANNELS 0x4
//Channel 0
#define CH0_SRC_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x000)
#define CH0_SRC_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x004)
#define CH0_DST_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x008)
#define CH0_DST_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x00c)
#define CH0_TCR1 (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x010)
#define CH0_TCR2 (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x014)
#define CH0_ND_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x018)
#define CH0_ND_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x01c)


#define CH0_ND_TCR (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x020)
#define CH0_GCSR (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x024)
#define CH0_CRC_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x028)
#define CH0_CRC_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x02c)
#define CH0_CRC (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x030)
#define CH0_SRC01_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x040)
#define CH0_SRC01_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x044)
#define CH0_SRC02_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x048)
#define CH0_SRC02_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x04c)
#define CH0_SRC03_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x050)
#define CH0_SRC03_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x054)
#define CH0_SRC04_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x058)
#define CH0_SRC04_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x05c)
#define CH0_SRC05_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x060)
#define CH0_SRC05_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x064)
#define CH0_SRC06_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x068)
#define CH0_SRC06_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x06c)
#define CH0_SRC07_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x070)
#define CH0_SRC07_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x074)
#define CH0_SRC08_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x078)
#define CH0_SRC08_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x07c)
#define CH0_SRC09_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x080)
#define CH0_SRC09_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x084)
#define CH0_SRC10_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x088)


#define CH0_SRC10_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x08c)
#define CH0_SRC11_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x090)
#define CH0_SRC11_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x094)
#define CH0_SRC12_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x098)
#define CH0_SRC12_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x09c)
#define CH0_SRC13_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x0a0)
#define CH0_SRC13_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x0a4)
#define CH0_SRC14_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x0a8)
#define CH0_SRC14_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x0ac)
#define CH0_SRC15_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x0b0)
#define CH0_SRC15_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH0_OFFSET + 0x0b4)
//Channel 1
#define CH1_SRC_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x000)
#define CH1_SRC_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x004)
#define CH1_DST_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x008)
#define CH1_DST_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x00c)
#define CH1_TCR1 (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x010)
#define CH1_TCR2 (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x014)
#define CH1_ND_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x018)
#define CH1_ND_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x01c)
#define CH1_ND_TCR (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x020)
#define CH1_GCSR (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x024)
#define CH1_CRC_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x028)
#define CH1_CRC_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x02c)
#define CH1_CRC (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x030)
#define CH1_SRC01_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x040)


#define CH1_SRC01_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x044)
#define CH1_SRC02_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x048)
#define CH1_SRC02_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x04c)
#define CH1_SRC03_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x050)
#define CH1_SRC03_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x054)
#define CH1_SRC04_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x058)
#define CH1_SRC04_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x05c)
#define CH1_SRC05_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x060)
#define CH1_SRC05_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x064)
#define CH1_SRC06_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x068)
#define CH1_SRC06_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x06c)
#define CH1_SRC07_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x070)
#define CH1_SRC07_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x074)
#define CH1_SRC08_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x078)
#define CH1_SRC08_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x07c)
#define CH1_SRC09_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x080)
#define CH1_SRC09_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x084)
#define CH1_SRC10_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x088)
#define CH1_SRC10_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x08c)
#define CH1_SRC11_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x090)
#define CH1_SRC11_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x094)
#define CH1_SRC12_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x098)
#define CH1_SRC12_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x09c)
#define CH1_SRC13_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x0a0)
#define CH1_SRC13_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x0a4)
#define CH1_SRC14_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x0a8)
#define CH1_SRC14_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x0ac)
#define CH1_SRC15_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x0b0)
#define CH1_SRC15_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH1_OFFSET + 0x0b4)
//Channel 2
#define CH2_SRC_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x000)
#define CH2_SRC_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x004)
#define CH2_DST_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x008)
#define CH2_DST_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x00c)
#define CH2_TCR1 (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x010)
#define CH2_TCR2 (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x014)
#define CH2_ND_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x018)
#define CH2_ND_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x01c)
#define CH2_ND_TCR (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x020)
#define CH2_GCSR (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x024)
#define CH2_CRC_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x028)
#define CH2_CRC_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x02c)
#define CH2_CRC (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x030)
#define CH2_SRC01_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x040)
#define CH2_SRC01_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x044)
#define CH2_SRC02_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x048)
#define CH2_SRC02_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x04c)
#define CH2_SRC03_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x050)


#define CH2_SRC03_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x054)
#define CH2_SRC04_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x058)
#define CH2_SRC04_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x05c)
#define CH2_SRC05_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x060)
#define CH2_SRC05_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x064)
#define CH2_SRC06_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x068)
#define CH2_SRC06_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x06c)
#define CH2_SRC07_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x070)
#define CH2_SRC07_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x074)
#define CH2_SRC08_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x078)
#define CH2_SRC08_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x07c)
#define CH2_SRC09_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x080)
#define CH2_SRC09_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x084)
#define CH2_SRC10_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x088)
#define CH2_SRC10_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x08c)
#define CH2_SRC11_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x090)
#define CH2_SRC11_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x094)
#define CH2_SRC12_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x098)
#define CH2_SRC12_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x09c)
#define CH2_SRC13_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x0a0)
#define CH2_SRC13_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x0a4)
#define CH2_SRC14_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x0a8)
#define CH2_SRC14_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x0ac)
#define CH2_SRC15_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x0b0)
#define CH2_SRC15_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH2_OFFSET + 0x0b4)
//Channel 3
#define CH3_SRC_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x000)
#define CH3_SRC_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x004)
#define CH3_DST_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x008)
#define CH3_DST_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x00c)
#define CH3_TCR1 (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x010)
#define CH3_TCR2 (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x014)
#define CH3_ND_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x018)
#define CH3_ND_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x01c)
#define CH3_ND_TCR (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x020)
#define CH3_GCSR (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x024)
#define CH3_CRC_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x028)
#define CH3_CRC_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x02c)
#define CH3_CRC (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x030)
#define CH3_SRC01_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x040)
#define CH3_SRC01_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x044)
#define CH3_SRC02_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x048)
#define CH3_SRC02_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x04c)
#define CH3_SRC03_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x050)
#define CH3_SRC03_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x054)
#define CH3_SRC04_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x058)
#define CH3_SRC04_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x05c)
#define CH3_SRC05_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x060)


#define CH3_SRC05_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x064)
#define CH3_SRC06_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x068)
#define CH3_SRC06_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x06c)
#define CH3_SRC07_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x070)
#define CH3_SRC07_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x074)
#define CH3_SRC08_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x078)
#define CH3_SRC08_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x07c)
#define CH3_SRC09_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x080)
#define CH3_SRC09_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x084)
#define CH3_SRC10_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x088)
#define CH3_SRC10_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x08c)
#define CH3_SRC11_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x090)
#define CH3_SRC11_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x094)
#define CH3_SRC12_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x098)
#define CH3_SRC12_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x09c)
#define CH3_SRC13_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x0a0)
#define CH3_SRC13_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x0a4)
#define CH3_SRC14_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x0a8)
#define CH3_SRC14_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x0ac)
#define CH3_SRC15_ADDR_M (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x0b0)
#define CH3_SRC15_ADDR_L (volatile unsigned long *)(DMAXOR_BASE + CH3_OFFSET + 0x0b4)
//CH == Channel 0 - 3


#define OFFSET(CH) (((unsigned long)CH) * ((unsigned long)0x100))
#define CH_SRC_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC_ADDR_M + OFFSET(CH))
#define CH_SRC_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC_ADDR_L + OFFSET(CH))
#define CH_DST_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_DST_ADDR_M + OFFSET(CH))
#define CH_DST_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_DST_ADDR_L + OFFSET(CH))
#define CH_TCR1(CH) (volatile unsigned long *)((unsigned long)CH0_TCR1 + OFFSET(CH))
#define CH_TCR2(CH) (volatile unsigned long *)((unsigned long)CH0_TCR2 + OFFSET(CH))
#define CH_ND_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_ND_ADDR_M + OFFSET(CH))
#define CH_ND_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_ND_ADDR_L + OFFSET(CH))
#define CH_ND_TCR(CH) (volatile unsigned long *)((unsigned long)CH0_ND_TCR + OFFSET(CH))
#define CH_GCSR(CH) (volatile unsigned long *)((unsigned long)CH0_GCSR + OFFSET(CH))
#define CH_CRC_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_CRC_ADDR_M + OFFSET(CH))
#define CH_CRC_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_CRC_ADDR_L + OFFSET(CH))
#define CH_CRC(CH) (volatile unsigned long *)((unsigned long)CH0_CRC + OFFSET(CH))
#define CH_SRC01_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC01_ADDR_M + OFFSET(CH))
#define CH_SRC01_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC01_ADDR_L + OFFSET(CH))
#define CH_SRC02_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC02_ADDR_M + OFFSET(CH))
#define CH_SRC02_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC02_ADDR_L + OFFSET(CH))
#define CH_SRC03_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC03_ADDR_M + OFFSET(CH))
#define CH_SRC03_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC03_ADDR_L + OFFSET(CH))
#define CH_SRC04_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC04_ADDR_M + OFFSET(CH))
#define CH_SRC04_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC04_ADDR_L + OFFSET(CH))
#define CH_SRC05_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC05_ADDR_M + OFFSET(CH))
#define CH_SRC05_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC05_ADDR_L + OFFSET(CH))
#define CH_SRC06_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC06_ADDR_M + OFFSET(CH))
#define CH_SRC06_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC06_ADDR_L + OFFSET(CH))
#define CH_SRC07_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC07_ADDR_M + OFFSET(CH))
#define CH_SRC07_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC07_ADDR_L + OFFSET(CH))
#define CH_SRC08_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC08_ADDR_M + OFFSET(CH))
#define CH_SRC08_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC08_ADDR_L + OFFSET(CH))
#define CH_SRC09_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC09_ADDR_M + OFFSET(CH))
#define CH_SRC09_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC09_ADDR_L + OFFSET(CH))
#define CH_SRC10_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC10_ADDR_M + OFFSET(CH))
#define CH_SRC10_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC10_ADDR_L + OFFSET(CH))
#define CH_SRC11_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC11_ADDR_M + OFFSET(CH))
#define CH_SRC11_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC11_ADDR_L + OFFSET(CH))
#define CH_SRC12_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC12_ADDR_M + OFFSET(CH))
#define CH_SRC12_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC12_ADDR_L + OFFSET(CH))
#define CH_SRC13_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC13_ADDR_M + OFFSET(CH))
#define CH_SRC13_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC13_ADDR_L + OFFSET(CH))
#define CH_SRC14_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC14_ADDR_M + OFFSET(CH))
#define CH_SRC14_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC14_ADDR_L + OFFSET(CH))
#define CH_SRC15_ADDR_M(CH) (volatile unsigned long *)((unsigned long)CH0_SRC15_ADDR_M + OFFSET(CH))
#define CH_SRC15_ADDR_L(CH) (volatile unsigned long *)((unsigned long)CH0_SRC15_ADDR_L + OFFSET(CH))

//Switch Fabric Port
enum PORT{
    HLP = 0,
    PCI_1 = 1,
    PCI_2 = 2,
    CIU_SRAM = 3,
    SDRAM = 4,
    DMAXOR = 5,
    GIGE = 6,
    DIRECT = 7
};

//TCR1
#define TCR1_XOR_WR_EN (1


<strong>Intel</strong> ® GW80314 I/O <strong>Processor</strong> <strong>DMA</strong> <strong>and</strong> <strong>XOR</strong> <strong>Library</strong><strong>DMA</strong><strong>XOR</strong>80314.h#define TCR2_CRC_ORIENT (1


<strong>Intel</strong> ® GW80314 I/O <strong>Processor</strong> <strong>DMA</strong> <strong>and</strong> <strong>XOR</strong> <strong>Library</strong><strong>DMA</strong><strong>XOR</strong>80314.h#define CNDCR_ND_14_D_BLOCKS (0xe


<strong>Intel</strong> ® GW80314 I/O <strong>Processor</strong> <strong>DMA</strong> <strong>and</strong> <strong>XOR</strong> <strong>Library</strong><strong>DMA</strong><strong>XOR</strong>80314.h#define GCSR_STOP_EN (1


* }GCSR_OP_CMD;
* GCSR = set any other bits in GCSR. For example setting interrupts.
* L = Last descriptor
* N = Next descriptor
* P = Port location for appended desc
* typedef enum {
* HLP = 0,
* PCI_1= 1,
* PCI_2= 2,
* CIU_SRAM=3,
* SDRAM= 4,
* DMAXOR= 5,
* GIGE= 6,
* DIRECT= 7
* }PORT;
* ADDR = Address of descriptor
* ND_ADDR_L= Next descriptor address Least significant bits
* ND_ADDR_M= Next descriptor address Most significant bits
*******************************************************************************/

//When using uncached memory region for descriptor processing, N must be SDRAM_UCUB mapping
#define APPEND_RESUME(CH,CMD,GCSR,L,N,P) (APPEND(L,N,P),SET_GCSR_RESUME(CH,CMD,GCSR))
#define APPEND_RESUME_SVLAST(CH,CMD,GCSR,L,N,P) (APPEND_RESUME(CH,CMD,GCSR,L,N,P),LOAD_CHAIN_TAIL(P,CH,N))

//After APPEND but before Set GCSR, remember to call _Clean_D_Cache_Line for L (prior descriptor)
#define APPEND(L,N,P) ( ((Header_Type*)(L))->nd_tcr=ND_TCR_APPND_VAL(N,P),(((Header_Type*)(L))->nd_addr_m=0x0),(((Header_Type *)(L))->nd_addr_l=((unsigned long)N)))
#define SET_GCSR_RESUME(CH,CMD,GCSR) (*CH_GCSR(CH)=(unsigned int)((unsigned int)GCSR_RESUME|(unsigned int)CMD|(unsigned int)GCSR|(unsigned int)(GCSR_GO|GCSR_CHAIN)))

//Utilities
#define SET_DESC_LAST_LL(ADDR) (((Header_Type *)(ADDR))->nd_addr_l |= CNDAR_LAST)
#define LOAD_CHAINTAIL(CH,N) (chainTailDMAXOR[CH] = (void *)N)
#define LOAD_CHAIN_TAIL(P,CH,N) (chainTailDMAXOR[CH] = (void *)UCUB(P,N))
#define CLEAR_GCSR_STATUS(CH) (*CH_GCSR(CH)=GCSR_STATUS_BITS)

//Init Linked List with first descriptor
//CH = Channel, M = Most significant address, L = Least sig address, NT = nd_tcr
#define LOAD_HEAD_CLR_GCSR(CH,M,L,NT) (LOAD_ND_ADDR_REG(CH,L,M),LOAD_ND_TCR_REG(CH,NT),SET_TCR2_REG_BC_0(CH),CLEAR_GCSR_STATUS(CH))
#define SET_INTERRUPT(CH) (*CH_GCSR(CH)|=(unsigned long)(GCSR_DONE_EN))
#define SET_GCSR_CHAIN_GO(CH,CMD,GCSR) (*CH_GCSR(CH)|=(unsigned long)(GCSR_CHAIN|GCSR_GO|CMD|GCSR))
#define LOAD_ND_ADDR_REG(CH,ND_ADDR_L,ND_ADDR_M) ((*CH_ND_ADDR_L(CH)=(unsigned long)(ND_ADDR_L)),(*CH_ND_ADDR_M(CH)=(unsigned long)(ND_ADDR_M)))
#define LOAD_ND_TCR_REG(CH,VAL) (*CH_ND_TCR(CH)= (unsigned long)(VAL))
#define SET_TCR2_REG_BC_0(CH) (*CH_TCR2(CH)&= 0xf8000000)

//Check Channel Status
#define GET_CH_STATUS(CH) (*CH_GCSR(CH) & GCSR_DACT)
#define GET_GCSR(CH) (*CH_GCSR(CH))

//Dont use this. Rolled up to larger APPEND macro.
#define ND_TCR_APPND_VAL(N,P) ( (CNDCR_ND_PORT(P))|CNDCR_ND_BLOCKS( ( (( Header_Type*)(N))->tcr1 ) & 0xf )) // blocks from next

//Alignment
//DMA Descriptors align 64 byte boundaries. 2 cache lines.
#define DMA_ALIGN(VAL) ((((unsigned long)VAL) & 0x3f)?((((unsigned long)VAL))+(64 -(((unsigned long)VAL) & 0x3f))):((unsigned long)VAL))
//XOR Descriptors align 256 bytes. 5 cache lines.
#define XOR_ALIGN(VAL) ((((unsigned long)VAL) & 0xff)?((((unsigned long)VAL))+(256 -(((unsigned long)VAL) & 0xff))):((unsigned long)VAL))

//Redboot memory map
#define SDRAM_CB(addr) /*Policy 111*/ ( ((unsigned long)(addr)) & ~0xe0000000)
#define SDRAM_UCUB(addr) /*Policy 000*/ ((((unsigned long)(addr)) & ~0xe0000000) | SDRAM_000_BASE() )
#define SDRAM_PHY(addr) ((((unsigned long)(addr)) & ~0xe0000000) | SDRAM_PHYS_BASE())
#define SDRAM_PHYS_BASE() 0x00000000 //Physical SDRAM
#define SDRAM_000_BASE() 0x60000000 //Uncached unbuffered alias for SDRAM
#define SDRAM_MEM_SIZE() 0x10000000 //Board specific

#define SRAM_CB(addr) /*Policy 111*/ ((((unsigned long)(addr)) & ~0xffe00000) | SRAM_PHYS_BASE() )
#define SRAM_UCUB(addr) /*Policy 000*/ ((((unsigned long)(addr)) & ~0xffe00000) | SRAM_ALIAS_BASE() )
#define SRAM_PHY(addr) ((((unsigned long)(addr)) & ~0xffe00000) | SRAM_PHYS_BASE())
#define SRAM_PHYS_BASE() 0x50000000
#define SRAM_ALIAS_BASE() 0x50200000
#define SRAM_MEM_SIZE() (1024 * 1024) // 1Mbyte

//P either enum PORT SDRAM or CIU_SRAM, A address
#define CB(P,A) /*Policy 111*/ ((P == SDRAM)?(SDRAM_CB(A)):(SRAM_CB(A)))
#define UCUB(P,A) /*Policy 000*/ ((P == SDRAM)?(SDRAM_UCUB(A)):(SRAM_UCUB(A)))
#define PHY(P,A) ((P == SDRAM)?(SDRAM_PHY(A)):(SRAM_PHY(A)))
#define MEM_SIZE(P) ((P == SDRAM)?(SDRAM_MEM_SIZE()):(SRAM_MEM_SIZE()))

//Same for DMA or XOR
typedef struct header{
    unsigned long src_addr_l;
    unsigned long src_addr_m;
    unsigned long dst_addr_l;
    unsigned long dst_addr_m;
    unsigned long tcr1;
    unsigned long tcr2;
    unsigned long nd_addr_l;
    unsigned long nd_addr_m;
    unsigned long nd_tcr; //Second cacheline
}Header_Type;

//Align on 64 bytes. Size is 44 bytes.


typedef struct dma {
    unsigned long src_addr_l;
    unsigned long src_addr_m;
    unsigned long dst_addr_l;
    unsigned long dst_addr_m;
    unsigned long tcr1;
    unsigned long tcr2;
    unsigned long nd_addr_l;
    unsigned long nd_addr_m; //cacheline 1
    unsigned long nd_tcr; //cacheline 2
    unsigned long crc_addr_l;
    unsigned long crc_addr_m;
}Dma_Type;

//Align on 256 byte boundary. Size is 156 bytes.
typedef struct xor{
    unsigned long src_addr_l;
    unsigned long src_addr_m;
    unsigned long dst_addr_l;
    unsigned long dst_addr_m;
    unsigned long tcr1;
    unsigned long tcr2;
    unsigned long nd_addr_l;
    unsigned long nd_addr_m; //cacheline 1
    unsigned long nd_tcr; //cacheline 2
    unsigned long src01_addr_l;
    unsigned long src01_addr_m;
    unsigned long src02_addr_l;
    unsigned long src02_addr_m;
    unsigned long src03_addr_l;
    unsigned long src03_addr_m;
    unsigned long src04_addr_l;
    unsigned long src04_addr_m;
    unsigned long src05_addr_l;
    unsigned long src05_addr_m;
    unsigned long src06_addr_l;
    unsigned long src06_addr_m;
    unsigned long src07_addr_l;
    unsigned long src07_addr_m;
    unsigned long src08_addr_l;
    unsigned long src08_addr_m;
    unsigned long src09_addr_l;
    unsigned long src09_addr_m;
    unsigned long src10_addr_l;
    unsigned long src10_addr_m;
    unsigned long src11_addr_l;
    unsigned long src11_addr_m;
    unsigned long src12_addr_l;
    unsigned long src12_addr_m;
    unsigned long src13_addr_l;
    unsigned long src13_addr_m;
    unsigned long src14_addr_l;
    unsigned long src14_addr_m;
    unsigned long src15_addr_l;
    unsigned long src15_addr_m;
}Xor_Type;

#endif
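The DMA_ALIGN and XOR_ALIGN macros in this header round an address up to the next 64-byte or 256-byte boundary and leave an already aligned address unchanged. A minimal sketch of the same rounding written as functions follows. The names align_up, dma_align and xor_align are illustrative only and are not part of the library, and the mask form assumes the alignment is a power of two.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * Assumption: align is a power of two, so (align - 1) is a bit mask. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + (align - 1)) & ~(align - 1);
}

/* Same result as DMA_ALIGN(VAL): DMA descriptors sit on 64-byte boundaries. */
static uintptr_t dma_align(uintptr_t addr)
{
    return align_up(addr, 64);
}

/* Same result as XOR_ALIGN(VAL): XOR descriptors sit on 256-byte boundaries. */
static uintptr_t xor_align(uintptr_t addr)
{
    return align_up(addr, 256);
}
```

For example, dma_align(0x1001) gives 0x1040, and xor_align(0x1100) leaves the address at 0x1100.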


Appendix D dmaxor_desc_mgr.h

/*******************************************************************************
* Copyright (c) 2003 Intel Corporation. All rights reserved.
*
* Intel hereby grants you permission to copy, modify, and distribute this
* software and its documentation. Intel grants this permission provided
* that the above copyright notice appears in all copies and that both the
* copyright notice and this permission notice appear in supporting
* documentation. In addition, Intel grants this permission provided that
* you prominently mark as not part of the original any modifications made
* to this software or documentation, and that the name of Intel
* Corporation not be used in advertising or publicity pertaining to the
* software or the documentation without specific, written prior
* permission.
*
* Intel provides this AS IS, WITHOUT ANY WARRANTY, INCLUDING THE WARRANTY
* OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, and makes no
* guarantee or representations regarding the use of, or the results of the
* use of, the software and documentation in terms of correctness,
* accuracy, reliability, currentness, or otherwise, and you rely on the
* software, documentation, and results solely at your own risk.
***************************************************************************/
/***************************************************************************
*Board: 80314
*History:
* 08Aug03 LGS Initial Release Larry Stewart larry.g.stewart@intel.com
****************************************************************************/
#ifndef _DMAXOR_DESC_MGR_H
#define _DMAXOR_DESC_MGR_H

/*-----------------------------------------------------------------------------
**
** L I T E R A L S
**
**---------------------------------------------------------------------------*/
// Debug code for descriptor manager
//#define DESCMGR_DEBUG
typedef int Bool;
#define STACKSIZE (QUEUESIZE) // User defined but must be power of 2
#define QUEUESIZE (1024) //User defined Power of 2

/*-----------------------------------------------------------------------------
**
** S T R U C T U R E S / T Y P E D E F S
**
**---------------------------------------------------------------------------*/
//DMA 44 bytes in size, status and pads is not required to be flushed to RAM.
typedef struct dma_frame{
    Dma_Type d; /* descriptor */
    unsigned long pad0[28]; /* Pad to line up status.*/
    short pid; /* pid */
    short tid; /* tid */
    short csr; /* CSR value */
    short mark; /* If != 0 then executed*/
    unsigned long pad[23];
}Dma_Frame_Type;

//XOR 156 bytes in size, status and pads is not required to be flushed to RAM
typedef struct xor_frame{
    Xor_Type d;
    short pid; /* pid */
    short tid; /* tid */
    short csr; /* CSR value */
    short mark; /* If != 0 then executed*/
    unsigned long pad[23]; //Now 256 bytes, or 64 words, for 64 word alignment.
}Xor_Frame_Type;

/* Container designed to be multiples of cacheline (32 bytes).*/
typedef struct queue{
    int CircQ_Front;
    int CircQ_Length;
    int CircQ_Limit;
    unsigned long qty_marked; //Quantity marked
    unsigned long * pad[4];
    void * CircQ[QUEUESIZE]; //Channel size s/b mult of 8
}Queue_Type;

typedef struct stack{
    int Stack_Length;
    int Stack_Limit;
    int pad[6]; /* To stay on cacheline. */
    void* Stack[STACKSIZE];
}Stack_Type;

/* Structure mapping four Channels to Controller Channels */
typedef struct xordma_80314{
    //Channels
    Stack_Type FreeStack[NUM_CHANNELS]; //4 Channels
    Queue_Type Queue[NUM_CHANNELS]; //4 Channels
    //Put whatever malloced here
    void *toFreedma_80314;
}XorDma_80314_Type;

/*-----------------------------------------------------------------------------
**
** M A C R O S
**
**---------------------------------------------------------------------------*/
#define LIB_SAVE_LAST_APPEND(CH,ADDR) (chainTailDMAXOR[CH ] =(void *)ADDR)
#define LIB_GET_LAST_APPEND(CH) (chainTailDMAXOR[CH ])

/*-----------------------------------------------------------------------------
**
** P R O T O T Y P E S
**
**---------------------------------------------------------------------------*/
// Setup and wrapup
XorDma_80314_Type * lib_new_mgr(void);
int lib_buffersize(void);
int lib_init(XorDma_80314_Type * mgrt , void * desc_baseaddr);
void lib_free_mgr(XorDma_80314_Type * mgr);

// Stack Operations
void * lib_stack_pop(XorDma_80314_Type * mgr,enum CHANNEL channel);
Bool lib_stack_push(XorDma_80314_Type * mgr, void * frame,enum CHANNEL channel);
void * lib_top_of_stack(XorDma_80314_Type * mgr,enum CHANNEL channel);

//Queue Operations
int lib_reclaim(XorDma_80314_Type * mgr, enum CHANNEL channel);
void * lib_q_get(XorDma_80314_Type * mgr, enum CHANNEL channel);
int lib_q_put(XorDma_80314_Type * mgr, void * frame, enum CHANNEL channel);

//Post operations
inline int lib_postq_appnd_resume_sdram(XorDma_80314_Type * mgr, void * frame,enum PORT port,enum GCSR_OP_CMD cmd,unsigned int gcsr,enum CHANNEL channel);
inline int lib_postq_appnd_resume_sram (XorDma_80314_Type * mgr, void * frame,enum PORT port,enum GCSR_OP_CMD cmd,unsigned int gcsr,enum CHANNEL channel);

//Debugging descriptor chains
void printfDMAchain(int channel);

#endif
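The frame comments in this header say that a DMA descriptor is 44 bytes, an XOR descriptor is 156 bytes, and both frame types are padded to 256 bytes with the status fields lined up. The sketch below checks that arithmetic at compile time. It is a model, not the library's code: the type names are hypothetical, and uint32_t and int16_t stand in for the target's 32-bit unsigned long and 16-bit short so the check also holds on a 64-bit host.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for Dma_Type (11 words) and Xor_Type (39 words). */
typedef struct { uint32_t w[11]; } dma_desc_model;   /* 44 bytes  */
typedef struct { uint32_t w[39]; } xor_desc_model;   /* 156 bytes */

/* Mirrors Dma_Frame_Type: pad0 moves the status fields to offset 156. */
typedef struct {
    dma_desc_model d;
    uint32_t pad0[28];
    int16_t  pid, tid, csr, mark;
    uint32_t pad[23];
} dma_frame_model;

/* Mirrors Xor_Frame_Type: the descriptor itself already ends at offset 156. */
typedef struct {
    xor_desc_model d;
    int16_t  pid, tid, csr, mark;
    uint32_t pad[23];
} xor_frame_model;

/* Both frames occupy 256 bytes, so an array of frames keeps 256-byte alignment. */
_Static_assert(sizeof(dma_frame_model) == 256, "DMA frame must be 256 bytes");
_Static_assert(sizeof(xor_frame_model) == 256, "XOR frame must be 256 bytes");

/* The status fields sit at the same offset in either frame type. */
_Static_assert(offsetof(dma_frame_model, pid) == offsetof(xor_frame_model, pid),
               "status fields must line up");
```

Because the status fields share an offset, code that only inspects pid, tid, csr or mark can treat a frame pointer the same way for either engine, which appears to be the purpose of pad0.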


Appendix E Library Function Prototypes

E.1 Functions Included in xscale.h and xscale.c

E.1.1 void lib_flush_data_cache(void)

Item       Description
Prototype  void lib_flush_data_cache(void);
Input      None
Output     None
Purpose    Cleans and invalidates the 32 KB data cache.
Operation  Selects a 32 KB memory region reserved for this clean operation and allocates cache lines to evict dirty data to RAM. Then invalidates the full data cache.

E.1.2 int lib_memmap_malloc(int dma_desc_Mbs, int XOR_desc_Mbs, int data_Mbs)

Item       Description
Prototype  int lib_memmap_malloc(int dma_desc_Mbs, int XOR_desc_Mbs, int data_Mbs)
Input      dma_desc_Mbs: Number of megabytes to be allocated for DMA descriptors
           XOR_desc_Mbs: Number of megabytes to be allocated for XOR descriptors
           data_Mbs: Number of megabytes to be allocated for data (transaction source and destination data)
Output     SUCCESS == 0
           FAIL == non-zero
Purpose    Allocates and records memory regions on page boundaries for DMA descriptors, XOR descriptors and data area.
Operation  Function mallocs a region whose size totals the number of megabytes requested for DMA descriptors, XOR descriptors and data, plus one page. The memory regions for each are recorded to the global data structure Memmap_Type mem_map.

E.1.3 void lib_set_xcb_mem_range(unsigned int xcb, void * virt_addr_base, int size_in_bytes)

Item       Description
Prototype  void lib_set_xcb_mem_range(unsigned int xcb, void * virt_addr_base, int size_in_bytes);
Input      xcb: xcb bit values. For example, x=1, c=0, b=1 is 0x5, while x=1, c=1, b=1 is 0x7
           virt_addr_base: Virtual address of the start of the range; the xcb bits are set for all pages within the range
           size_in_bytes: Number of bytes, establishing the upper bound of the range for the cache policy change (xcb bits)
Output     None
Purpose    To set the data cache policy (xcb bits) for the memory range after cleaning and invalidating the data cache.
Operation  • Cleans and invalidates the full data cache
           • Iterates through the pages within the address range
           • For each page, sets the data cache policy per input parameter xcb
           • Invalidates the data TLBs

E.1.4 Page_Type lib_get_page_attributes(unsigned long virt_addr)

Item       Description
Prototype  Page_Type lib_get_page_attributes(unsigned long virt_addr);
Input      Virtual memory address
Output     Page_Type that provides: page_size, page_xcb = current cache policy (xcb bits), base_loc = page boundary prior to virt_addr
Purpose    Obtain attributes of the page that includes the input parameter address. Attributes include page size, lower page boundary and cache policy.
Operation  Function calls lib_set_page_xcb with an input parameter that does not change the state of the page tables.

E.1.5 Page_Type lib_set_page_xcb(unsigned long base, unsigned int xcb)

Item       Description
Prototype  Page_Type lib_set_page_xcb(unsigned long base, unsigned int xcb);
Input      base: Address on the page in which the cache policy is to be set
           xcb: Cache policy. Bit 2 == x bit, bit 1 == c bit, and bit 0 == b bit
Output     Page_Type that provides: page_size, page_xcb = current cache policy (xcb bits), base_loc = page boundary prior to virt_addr
Purpose    Sets the cache policy for the page including the input parameter address.
           • Before calling: the data cache should be cleaned and invalidated
           • After calling: the data TLB needs to be invalidated
Operation  Function sets the xcb bits in the page table using the Translation Process algorithm per the ARM Architecture Manual, page B3-6, and the bit definitions per the Intel® XScale Microarchitecture Programmer's Reference Manual.
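The xcb argument taken by lib_set_xcb_mem_range() and lib_set_page_xcb() packs three policy bits into one value, with x in bit 2, c in bit 1 and b in bit 0. A small sketch of that packing follows. The helper name make_xcb is hypothetical and is not part of xscale.h.

```c
/* Pack the X, C and B cache-policy bits into the xcb value described above:
 * bit 2 = X, bit 1 = C, bit 0 = B. Each argument is treated as a single bit. */
static unsigned int make_xcb(unsigned int x, unsigned int c, unsigned int b)
{
    return ((x & 1u) << 2) | ((c & 1u) << 1) | (b & 1u);
}
```

With this helper, make_xcb(1, 0, 1) yields 0x5 and make_xcb(1, 1, 1) yields 0x7, matching the two examples given for the xcb input of lib_set_xcb_mem_range().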


E.2 Functions Included in dmaxor_desc_mgr.h and dmaxor_desc_mgr.c

E.2.1 XorDma_80314_Type * lib_new_mgr(void)

Item       Description
Prototype  XorDma_80314_Type * lib_new_mgr(void);
Input      None
Output     None
Purpose    Allocates memory and instantiates data structure XorDma_80314_Type.
Operation  Allocates memory for a 1 Kbyte aligned XorDma_80314_Type. Note that lib_dma_init() is passed the memory for the DMA/XOR descriptor buffers and initializes the stack and queue data structures.

E.2.2 int lib_buffersize(void)

Item       Description
Prototype  int lib_buffersize(void)
Input      None
Output     Returns the number of bytes required for DMA/XOR buffers
Purpose    Specifies the number of bytes required for DMA/XOR buffers, including the amount for alignment.
Operation  return(((sizeof(Xor_Frame_Type) * STACKSIZE)+sizeof(Xor_Frame_Type)) * 2)
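The lib_buffersize() expression can be worked through with the values given in dmaxor_desc_mgr.h. The sketch below assumes sizeof(Xor_Frame_Type) is 256 bytes, as the header comment states, and STACKSIZE is 1024. FRAME_BYTES, STACK_SIZE and buffer_bytes are illustrative names, and the source does not state what the factor of two covers.

```c
#include <stddef.h>

#define FRAME_BYTES 256u   /* assumed sizeof(Xor_Frame_Type), per the header comment */
#define STACK_SIZE  1024u  /* STACKSIZE, which the header sets equal to QUEUESIZE */

/* Same expression as lib_buffersize(): STACK_SIZE frames plus one spare
 * frame (room to round the base up to a frame boundary), times two. */
static size_t buffer_bytes(void)
{
    return ((FRAME_BYTES * STACK_SIZE) + FRAME_BYTES) * 2u;
}
```

Under these assumptions the result is 524,800 bytes, which is 2 x (262,144 + 256).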


E.2.3 void lib_init(XorDma_80314_Type * mgrt, void * desc_baseaddr)

Item       Description
Prototype  void lib_init(XorDma_80314_Type * mgrt, void * desc_baseaddr);
Input      mgrt: Pointer to XorDma_80314_Type allocated by lib_new_mgr()
           desc_baseaddr: Base address of memory region to be used by DMA/XOR descriptors
Output     None
Purpose    To take the memory address allocated for DMA/XOR buffers and initialize stacks and queues. After this function call, the DMA/XOR stacks and queues are initialized and the DMA/XOR section of the Library is ready for use.
Operation  • Initialize DMA/XOR free stacks
           • Initialize channel 0, 1, 2 and 3 post queues
           • Execute dummy descriptors for four channels to enable appends. Then initialize chainHeadDMAXOR[channel] and chainTailDMAXOR[channel] to allow append/resume.

E.2.4 void lib_free_mgr(XorDma_80314_Type * mgr)

Item       Description
Prototype  void lib_free_mgr(XorDma_80314_Type * mgr);
Input      Pointer to XorDma_80314_Type data structure to free
Output     None
Purpose    Frees all memory associated with XorDma_80314_Type and sets to NULL.
Operation  Frees all memory associated with XorDma_80314_Type and sets to NULL.

E.2.5 void * lib_stack_pop(XorDma_80314_Type * mgr, enum CHANNEL channel)

Item       Description
Prototype  void * lib_stack_pop(XorDma_80314_Type * mgr, enum CHANNEL channel)
Input      mgr: Pointer to XorDma_80314_Type being used
           channel: either Channel 0, 1, 2 or 3
Output     Pointer to frame when successful or NULL when stack empty
Purpose    To provide a DMA or XOR frame from a free stack. When a cacheable memory region is used for descriptors, using a stack increases the likelihood that the top-of-stack frame is still in cache.
Operation  Gets DMA/XOR frame from the top of the channel specific free stack and changes stack state to reflect removal.


E.2.6 Bool lib_stack_push(XorDma_80314_Type * mgr, void * frame, enum CHANNEL channel)

Item       Description
Prototype  Bool lib_stack_push(XorDma_80314_Type * mgr, void * frame, enum CHANNEL channel)
Input      mgr: XorDma_80314_Type pointer to data structure being accessed
           frame: Pointer to frame being returned to the stack
           channel: either Channel 0, 1, 2, or 3
Output     Success == 0
           Fail == non-zero
Purpose    Place frame on stack for future reuse.
Operation  Place Channel 0, 1, 2 or 3 frame on top of free stack and change stack state to show a new top of stack.

E.2.7 void * lib_top_of_stack(XorDma_80314_Type * mgr, enum CHANNEL channel)

Item       Description
Prototype  void * lib_top_of_stack(XorDma_80314_Type * mgr, enum CHANNEL channel)
Input      mgr: Pointer to XorDma_80314_Type being used
           channel: either Channel 0, 1, 2 or 3
Output     Pointer to TOS frame or NULL when stack is empty
Purpose    Get a pointer to the top frame on the stack (either channel 0, 1, 2 or 3). This function does not change the state of the stack. Could be used for preload.
Operation  See purpose
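The three stack operations above (pop, push, and peek at the top of stack) can be sketched against a small array-backed stack. This is an illustration under stated assumptions, not the library's implementation: free_stack, stack_init, stack_push, stack_pop and stack_top are hypothetical names, there is one stack instead of one per channel, and the capacity is 4 rather than STACKSIZE.

```c
#include <stddef.h>

#define STACK_SIZE 4   /* small for illustration; the header uses STACKSIZE */

typedef struct {
    int   length;               /* number of frames currently held */
    int   limit;                /* capacity of slot[] */
    void *slot[STACK_SIZE];
} free_stack;

static void stack_init(free_stack *s)
{
    s->length = 0;
    s->limit = STACK_SIZE;
}

/* Returns 0 on success, non-zero when the stack is already full. */
static int stack_push(free_stack *s, void *frame)
{
    if (s->length >= s->limit)
        return 1;
    s->slot[s->length++] = frame;
    return 0;
}

/* Removes and returns the top frame, or NULL when the stack is empty. */
static void *stack_pop(free_stack *s)
{
    return (s->length > 0) ? s->slot[--s->length] : NULL;
}

/* Returns the top frame without changing the stack, or NULL when empty. */
static void *stack_top(const free_stack *s)
{
    return (s->length > 0) ? s->slot[s->length - 1] : NULL;
}
```

Because the most recently pushed frame is the next one popped, a frame that was just reclaimed is the first to be reused, which is the cache-locality argument made under E.2.5.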


E.2.8 inline int lib_postq_appnd_resume_sdram(XorDma_80314_Type * mgr, void * frame, enum PORT port, enum GCSR_OP_CMD cmd, unsigned int gcsr, enum CHANNEL channel)

Item       Description
Prototype  inline int lib_postq_appnd_resume_sdram(XorDma_80314_Type * mgr, void * frame, enum PORT port, enum GCSR_OP_CMD cmd, unsigned int gcsr, enum CHANNEL channel)
Input      XorDma_80314_Type * mgr: Pointer to XorDma_80314_Type being used
           void * frame: The frame to be appended
           enum PORT port: typedef enum {HLP = 0, PCI_1 = 1, PCI_2 = 2, CIU_SRAM = 3, SDRAM = 4, DMAXOR = 5, GIGE = 6, DIRECT = 7} PORT
           enum GCSR_OP_CMD cmd: enum {DMA_CMD, XOR_CMD} GCSR_OP_CMD
           unsigned int gcsr: Value to be loaded to the gcsr register to initiate the append
           enum CHANNEL channel: either Channel 0, 1, 2, or 3
Output     SUCCESS == 0
           FAIL == non-zero
Purpose    To post a frame to the post queue, append the frame to a channel specific chain of DMA descriptors and set channel resume to initiate the transfer.
           NOTE: When using cached memory, flush the descriptor to RAM before calling this function.
Operation  • Post to queue
           • Append to channel chain
           • Reset the chainTailDMAXOR[] pointer
           • Set channel resume

E.2.9 int lib_reclaim(XorDma_80314_Type * mgr, enum CHANNEL channel)

Item       Description
Prototype  int lib_reclaim(XorDma_80314_Type * mgr, enum CHANNEL channel)
Input      mgr: XorDma_80314_Type data structure being used
           channel: either Channel 0, 1, 2, or 3
Output     The number of descriptors returned
Purpose    Return completed frames to the channel free stack.
Operation  • Get chainHeadDMAXOR[4] value for first descriptor in chain
           • Get chainTailDMAXOR[4] for last descriptor in chain
           • Traverse DMA channel chain from head to tail, removing descriptors from the post queue and placing them on the free stack
           • Adjust chainHeadDMAXOR[4]
           • Return number of descriptors returned

E.2.10 void * lib_q_get(XorDma_80314_Type * mgr, enum CHANNEL channel)

Item       Description
Prototype  void * lib_q_get(XorDma_80314_Type * mgr, enum CHANNEL channel)
Input      mgr: The XorDma_80314_Type used
           channel: either Channel 0, 1, 2, or 3
Output     Return a pointer to the frame removed from the channel queue
Purpose    Retrieve a frame from the queue in FIFO order.
Operation  • Get frame using FIFO
           • Adjust state of queue to reflect removed frame


E.2.11 int lib_q_put(XorDma_80314_Type *mgr, void *frame, enum CHANNEL channel)

Item        Description
Prototype   int lib_q_put(XorDma_80314_Type *mgr, void *frame, enum CHANNEL channel)
Input       mgr: The XorDma_80314_Type being operated on
            frame: Frame being returned to the queue
            channel: Channel 0, 1, 2, or 3
Output      SUCCESS == 0
            FAIL == non-zero
Purpose     To post a frame to the channel queue.
Operation   • Post to queue
            • Adjust queue to reflect the state change
            • Return SUCCESS or FAIL
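The put/get pair described in E.2.10 and E.2.11 behaves like a FIFO of frame pointers. The following is a minimal host-side sketch of that behavior using a fixed-size ring buffer. The names demo_queue, demo_q_init, demo_q_put, and demo_q_get, and the depth of 4, are assumptions made for illustration; the library's actual queue implementation is not given in this paper.

```c
#include <stddef.h>

#define DEMO_Q_DEPTH 4   /* hypothetical queue depth */

/* Hypothetical FIFO of frame pointers. */
typedef struct {
    void  *slot[DEMO_Q_DEPTH];
    size_t head;    /* index of the oldest queued frame */
    size_t count;   /* number of frames currently queued */
} demo_queue;

void demo_q_init(demo_queue *q)
{
    q->head = 0;
    q->count = 0;
}

/* Post a frame. Returns 0 on success, non-zero if the queue is full,
 * mirroring the SUCCESS == 0 / FAIL == non-zero convention. */
int demo_q_put(demo_queue *q, void *frame)
{
    if (q->count == DEMO_Q_DEPTH)
        return 1;
    q->slot[(q->head + q->count) % DEMO_Q_DEPTH] = frame;
    q->count++;
    return 0;
}

/* Remove and return the oldest frame, or NULL if the queue is empty. */
void *demo_q_get(demo_queue *q)
{
    void *frame;

    if (q->count == 0)
        return NULL;
    frame = q->slot[q->head];
    q->head = (q->head + 1) % DEMO_Q_DEPTH;
    q->count--;
    return frame;
}
```

Frames come back in the order they were posted, which is the property the reclaim path relies on when it returns descriptors to a channel.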


E.3 Functions Included in chain_interrupt.h and chain_interrupt.c

E.3.1 void intHandlerDetach(void)

Item        Description
Prototype   void intHandlerDetach(void);
Input       None
Output      None
Purpose     To remove the lib_irq_handler and lib_fiq_handler handlers from the chain. Returns the interrupt vectors to their pre-callintHandlerAttach() state.
Operation   Restore pre-callintHandlerAttach() vector values.

E.3.2 void callintHandlerAttach(void)

Item        Description
Prototype   void callintHandlerAttach(void);
Input       None
Output      None
Purpose     Chains lib_fiq_handler and lib_irq_handler into the interrupt vectors.
Operation   Calls the function intHandlerAttach(lib_irq_handler, lib_fiq_handler).

E.3.3 void intHandlerAttach(void (*irq)(void), void (*fiq)(void))

Item        Description
Prototype   void intHandlerAttach(void (*irq)(void), void (*fiq)(void));
Input       irq: irq handler to be chained into the interrupt vector
            fiq: fiq handler to be chained into the interrupt vector
Output      None
Purpose     To chain functions into the interrupt vectors.
Operation   • Get the location pointed to by the interrupt vectors
            • Save the contents at that location to a global variable used to restore state
            • Record the address of the function being chained in
            • The chained-in function jumps to the prior vector if the interrupt is not XOR or DMA related
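The attach/detach behavior in E.3.1 through E.3.3 can be modeled on a host with plain function pointers. In the sketch below a single variable stands in for the IRQ vector slot, and a flag stands in for the interrupt source test; all demo_ names are hypothetical, and none of this touches real exception vectors or reproduces the library's handlers.

```c
#include <stddef.h>

typedef void (*demo_handler)(void);

static demo_handler demo_irq_vector;     /* stands in for the IRQ vector slot */
static demo_handler demo_saved_irq;      /* prior handler, kept for restore */
static int demo_source_is_dma_xor;       /* stand-in for the source check */
static int demo_lib_hits;                /* interrupts serviced by the library */
static int demo_prior_hits;              /* interrupts passed to the prior handler */

/* Handler that was installed before the library attached. */
void demo_prior_handler(void)
{
    demo_prior_hits++;
}

/* Chained-in handler: service DMA/XOR interrupts, otherwise
 * pass control to whatever handler was installed before. */
void demo_lib_irq(void)
{
    if (demo_source_is_dma_xor)
        demo_lib_hits++;
    else if (demo_saved_irq != NULL)
        demo_saved_irq();
}

/* Save the current vector contents, then install the new handler. */
void demo_attach(demo_handler irq)
{
    demo_saved_irq = demo_irq_vector;
    demo_irq_vector = irq;
}

/* Restore the vector to its pre-attach value. */
void demo_detach(void)
{
    demo_irq_vector = demo_saved_irq;
    demo_saved_irq = NULL;
}
```

The point of saving the prior vector is that unrelated interrupts keep working while the library is attached, and that detach can return the system to its earlier state.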


E.3.4 void lib_irq_handler(void) __attribute__ ((__naked__))

Item        Description
Prototype   void lib_irq_handler(void) __attribute__ ((__naked__));
Input       None
Output      None
Purpose     IRQ handler for DMA and XOR initiated interrupts. The naked attribute eliminates the function prolog and epilog.
Operation   Test for a DMA or XOR irq interrupt per IINTSRC.

E.3.5 void lib_fiq_handler(void) __attribute__ ((__naked__))

Item        Description
Prototype   void lib_fiq_handler(void) __attribute__ ((__naked__));
Input       None
Output      None
Purpose     FIQ handler for DMA and XOR error condition interrupts. The naked attribute eliminates the function prolog and epilog.
Operation   Test for a DMA or XOR fiq interrupt per IINTSRC.


Appendix F Testbench: Data Structures

F.1 bench.h

//Global Data structure to record experiment state
typedef struct experiment{
    enum test test; //XOR_TEST or DMA_TEST or UNDEF_TEST
    //Data memory map. Source and destination
    int max_buf_size;
    unsigned long lad_lower;
    unsigned long lad_upper;
    unsigned long pad_lower;
    unsigned long pad_upper;
    //Experiment characteristics
    char xcb[40]; //xcb bits
}Experiment_Type;


Appendix G Test Bench Library Function Prototypes

G.1 bench.c

G.1.1 int main(void);

Item        Description
Prototype   int main(void);
Input       None
Output      None
Purpose     Testbench initialization and control of the test cases run, based on keyboard input from the menu selection.
Operation   Initialization
            • Allocates the descriptor manager data structure
            • Allocates memory for descriptor processing and data, and records it in the global structure
            • Chains in the interrupt handlers
            • Enables and routes irq and fiq interrupts
            Menu Selection
            • Infinite while loop that calls test cases based on keyboard input from the menu

G.1.2 void print_title(enum build b)

Item        Description
Prototype   void print_title(enum build b)
Input       enum build identifies the test case.
Output      None
Purpose     To print the test case title to stdio and to the file bench.out.
Operation   Based on the input parameter, a C switch statement selects the corresponding printf statements.

G.1.3 void generate_src_dst(void);

Item        Description
Prototype   void generate_src_dst(void);
Input       None
Output      None
Purpose     Reset the state of memory before each test case is run.
Operation   Resets the descriptor and data memory regions to 0, then writes test data to the source locations.
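The reset step described for generate_src_dst can be sketched as follows. The real function takes no arguments and works on the testbench's global memory map; this version takes explicit buffers so it can run on a host. The name demo_generate_src_dst and the incrementing byte pattern are assumptions, since the paper does not state what test data is written.

```c
#include <stddef.h>
#include <string.h>

/* Clear both regions, then fill the source with a known pattern.
 * The pattern (low byte of the offset) is an assumption chosen so
 * that a misplaced or partial transfer is easy to spot. */
void demo_generate_src_dst(unsigned char *src, unsigned char *dst, size_t len)
{
    size_t i;

    memset(src, 0, len);   /* reset source region */
    memset(dst, 0, len);   /* reset destination region */

    for (i = 0; i < len; i++)
        src[i] = (unsigned char)(i & 0xff);
}
```

Clearing the destination before every run means stale data from a previous test case cannot make a failed transfer look successful.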


G.2 lib_demo_cases.c

G.2.1 void lib_demo_sdram(void)
      void lib_demo_sram(void)

Item        Description
Prototype   void lib_demo_sdram(void)
            void lib_demo_sram(void)
Input       None
Output      None
Purpose     To demonstrate the functionality of the DMA/XOR Library, including error detection using the interrupt handler. lib_demo_sdram sets up descriptors in sdram, while lib_demo_sram sets up descriptors in sram.
Operation   Demonstrated functionality includes (this is commented and handled by main()):
            • Allocate memory for the descriptor manager
            • malloc the memory map
            • Chain in the interrupt handler
            • Set global coalescing
            Called by case:
            • Initialize the DMA/XOR Library data structure inside the descriptor manager (set up descriptor Free Stack and Post Queues, initialize DMA/XOR engines for appends)
            • Set descriptor processing memory range as 001
            • Set data memory range as 000
            • Execute DMA descriptor transactions
            • Use append and reclaim on descriptors
            • Reclaim reports any DMA transfer error: an error causes an interrupt, and the interrupt handler records the csr value to the frame. For an error, the csr is non-zero, and the non-zero csr value is identified by reclaim.


Appendix H Redboot Memory Map

//     Virtual Addr  Physical Addr  Sz(MB)  Description       Cache    D  AP
//     ------------  -------------  ------  ----------------  -------  -  ---
// 1   0x00000000    0x00000000     512     SD/DDRAM          WB RWA   0  RWE
// 2   0x20000000    0x20000000     512     SD/DDRAM window   WB RWA   0  RWE
// 3   0x40000000    0x40000000     256     FLASH             WT RA    1  RE
// 4   0x50000000    0x50000000     1       SRAM              WB RWA   0  RWE
// 5   0x50100000    0x50100000     64KB    80314 Ctrl Reg    UnC UnB  2  RW
// 6   0x50110000    0x50110000     960KB   Part of Ctrl Reg
// 6a  0x50200000    0x50000000     1       SRAM alias        UnC UnB  0  RW
// 7   0x50300000    0x50300000     13      No Access         UnC UnB  4  NoA
// 8   0x51000000    0x51000000     16      Data Cache Flush  WT RA    3  R
// 8a  0x52000000    0x52000000     16      Mini-Data Cache   WB RWA   0  R
// 8b  0x53000000    0x53000000     16      Mini-Data Flush   WB RWA   0  R
// 9   0x54000000    0x54000000     194     No Access         UnC UnB  4  NoA
// 10  0x60000000    0x00000000     512     SD/DDRAM alias    UnC UnB  0  RWE
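Rows 1 and 10 of the map point at the same 512 MB of physical SD/DDRAM: row 1 is cached (WB RWA) and row 10 is an uncached, unbuffered alias. One way to read that is that a cached virtual address in the first window has an uncached twin at a fixed offset. The helper below sketches that arithmetic. The macro and function names are hypothetical, and whether the library itself uses the alias for descriptors is not stated in this paper.

```c
#include <stdint.h>

/* Values taken from rows 1 and 10 of the Redboot memory map. */
#define DEMO_SDRAM_CACHED_BASE   0x00000000u  /* row 1, WB RWA */
#define DEMO_SDRAM_UNCACHED_BASE 0x60000000u  /* row 10, UnC UnB */
#define DEMO_SDRAM_WINDOW_SIZE   0x20000000u  /* 512 MB */

/* Translate a cached SD/DDRAM virtual address to its uncached alias.
 * Returns 0 and writes *alias_va on success; returns non-zero if the
 * address lies outside the cached window of row 1. */
int demo_uncached_alias(uint32_t cached_va, uint32_t *alias_va)
{
    uint32_t offset = cached_va - DEMO_SDRAM_CACHED_BASE;

    if (offset >= DEMO_SDRAM_WINDOW_SIZE)
        return 1;

    *alias_va = DEMO_SDRAM_UNCACHED_BASE + offset;
    return 0;
}
```

Accessing a descriptor through an uncached alias is one alternative to flushing it from the data cache, at the cost of slower CPU access to that memory.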


Appendix I Example Code

#define GCSR_STATUS_BITS 0x0000f7c0 //Status bits are write clear
//Remember
//- To set descriptor LAST bit in CHx_ND_ADDR_L
//- Align DMA and AAU descriptors
//- If using cached memory, flush descriptor from cache to ram
//------------------------------------------------------------------
//XOR Four source direct mode
//------------------------------------------------------------------
*CH0_SRC_ADDR_M = (unsigned long)0x0;
*CH0_SRC_ADDR_L = (unsigned long)src[0]; //Source 0 address
*CH0_SRC01_ADDR_M = (unsigned long)0x0;
*CH0_SRC01_ADDR_L = (unsigned long)src[1]; //Source 1 address
*CH0_SRC02_ADDR_M = (unsigned long)0x0;
*CH0_SRC02_ADDR_L = (unsigned long)src[2]; //Source 2 address
*CH0_SRC03_ADDR_M = (unsigned long)0x0;
*CH0_SRC03_ADDR_L = (unsigned long)src[3]; //Source 3 address
*CH0_DST_ADDR_M = (unsigned long)0x0;
*CH0_DST_ADDR_L = (unsigned long)dst; //Destination address
//Set port for source and destination, number of blocks,
//direct fill for first block and enable write
*CH0_TCR1 = (unsigned long)((SDRAM


//------------------------------------------------------------------
desc0->src_addr_l = (unsigned long)src[0];
desc0->src_addr_m = 0x0;
desc0->dst_addr_l = (unsigned long)dst;
desc0->dst_addr_m = 0x0;
desc0->tcr1 = (unsigned long)((SDRAMnd_tcr= (unsigned long)(0x0);
desc0->src01_addr_l = (unsigned long)src[1];
desc0->src01_addr_m = 0x0;
desc0->src02_addr_l = (unsigned long)src[2];
desc0->src02_addr_m = 0x0;
desc0->src03_addr_l = (unsigned long)src[3];
desc0->src03_addr_m = 0x0;
desc0->src04_addr_l = (unsigned long)src[4];
desc0->src04_addr_m = 0x0;
desc0->src05_addr_l = (unsigned long)src[5];
desc0->src05_addr_m = 0x0;
desc0->src06_addr_l = (unsigned long)src[6];
desc0->src06_addr_m = 0x0;
desc0->src07_addr_l = (unsigned long)src[7];
desc0->src07_addr_m = 0x0;
desc0->src08_addr_l = (unsigned long)src[8];
desc0->src08_addr_m = 0x0;
desc0->src09_addr_l = (unsigned long)src[9];
desc0->src09_addr_m = 0x0;
desc0->src10_addr_l = (unsigned long)src[10];
desc0->src10_addr_m = 0x0;
desc0->src11_addr_l = (unsigned long)src[11];
desc0->src11_addr_m = 0x0;
desc0->src12_addr_l = (unsigned long)src[12];


desc0->src12_addr_m = 0x0;
desc0->src13_addr_l = (unsigned long)src[13];
desc0->src13_addr_m = 0x0;
desc0->src14_addr_l = (unsigned long)src[14];
desc0->src14_addr_m = 0x0;
desc0->src15_addr_l = (unsigned long)src[15];
desc0->src15_addr_m = 0x0;
desc1->src_addr_l = (unsigned long)src[16];
desc1->src_addr_m = 0x0;
desc1->dst_addr_l = (unsigned long)dst;
desc1->dst_addr_m = 0x0;
desc1->tcr1 = (unsigned long)((SDRAMnd_tcr= 0x0;
desc1->src01_addr_l = (unsigned long)src[17];
desc1->src01_addr_m = 0x0;
desc1->src02_addr_l = (unsigned long)src[18];
desc1->src02_addr_m = 0x0;
desc1->src03_addr_l = (unsigned long)src[19];
desc1->src03_addr_m = 0x0;
desc1->src04_addr_l = (unsigned long)src[20];
desc1->src04_addr_m = 0x0;
desc1->src05_addr_l = (unsigned long)src[21];
desc1->src05_addr_m = 0x0;
desc1->src06_addr_l = (unsigned long)src[22];
desc1->src06_addr_m = 0x0;
desc1->src07_addr_l = (unsigned long)src[23];
desc1->src07_addr_m = 0x0;
desc1->src08_addr_l = (unsigned long)src[24];
desc1->src08_addr_m = 0x0;


desc1->src09_addr_l = (unsigned long)src[25];
desc1->src09_addr_m = 0x0;
desc1->src10_addr_l = (unsigned long)src[26];
desc1->src10_addr_m = 0x0;
desc1->src11_addr_l = (unsigned long)src[27];
desc1->src11_addr_m = 0x0;
desc1->src12_addr_l = (unsigned long)src[28];
desc1->src12_addr_m = 0x0;
desc1->src13_addr_l = (unsigned long)src[29];
desc1->src13_addr_m = 0x0;
desc1->src14_addr_l = (unsigned long)src[30];
desc1->src14_addr_m = 0x0;
desc1->src15_addr_l = (unsigned long)src[31];
desc1->src15_addr_m = 0x0;
//Set descriptor port and number of source blocks
*CH0_ND_TCR = (unsigned long)((SDRAM


//------------------------------------------------------------------
//DMA direct mode
//------------------------------------------------------------------
//Load source and destination ports
*CH0_TCR1 = *CH0_TCR2 = (unsigned long)((SDRAMdst_addr_m=(unsigned long)0x0;
desc0->tcr1 = desc0->tcr2 = (unsigned long)((SDRAMdst_addr_l=(unsigned long)dst[2];
desc1->dst_addr_m = (unsigned long)0x0;
desc1->tcr1 = desc1->tcr2 = (unsigned long)((SDRAM


desc1->nd_addr_l = CNDAR_LAST;
desc1->nd_addr_m = 0x0;
desc1->nd_tcr = (SDRAMsrc_addr_m=(unsigned long)0x0;
desc2->dst_addr_l = (unsigned long)dst[3];
desc2->dst_addr_m = (unsigned long)0x0;
desc2->tcr1 = desc2->tcr2 = (unsigned long)((SDRAMsrc_addr_l=(unsigned long)src; //Source address
desc0->src_addr_m = (unsigned long)0x0;
desc0->dst_addr_l = (unsigned long)dst; //Destination address


desc0->dst_addr_m = (unsigned long)0x0;
desc0->tcr1 = (unsigned long)((SDRAMcrc_addr_m=0x0;
*CH0_ND_ADDR_M = 0x0;
*CH0_ND_ADDR_L = desc0;
*CH0_ND_TCR = (unsigned long)(SDRAM
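When checking the result of an XOR transfer such as the four-source and 32-source examples above, it can help to compute the expected destination contents in software and compare. The routine below is a plain host-side reference, not part of the library: demo_xor_blocks is a hypothetical name, and it makes no claim about how the hardware accumulates results across chained descriptors.

```c
#include <stddef.h>
#include <stdint.h>

/* Software reference: dst[i] = src[0][i] ^ src[1][i] ^ ... ^ src[nsrc-1][i].
 * Every source block and the destination must hold at least len bytes. */
void demo_xor_blocks(uint8_t *dst, const uint8_t *const src[],
                     size_t nsrc, size_t len)
{
    size_t i, s;

    for (i = 0; i < len; i++) {
        uint8_t acc = 0;

        for (s = 0; s < nsrc; s++)
            acc ^= src[s][i];
        dst[i] = acc;
    }
}
```

A test case could run the engine, compute the same XOR with this routine into a scratch buffer, and compare the two buffers with memcmp.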


Appendix J Related Documents

• Intel® 80200 Processor based on Intel® XScale Microarchitecture Developer’s Manual
  http://developer.intel.com/design/iio/manuals/273411.htm
• Intel® GW80314 I/O Processor Developer’s Manual (273517)
  http://developer.intel.com/design/iio/manuals/273517.htm
• Intel® XScale Microarchitecture Programmer’s Reference Manual
  http://developer.intel.com/design/intelxscale/273436.htm
• Data Access Performance Optimization on the Intel® GW80314 I/O Processor White Paper
  http://developer.intel.com/design/iio/papers/273872.htm
• Intel® 80321 I/O Processor DMA and AAU Library APIs and Testbench Code Files
  http://www.intel.com/design/iio/swsup/DMA_AAU_Lib.htm
• ARM Architecture Reference Manual, edited by David Seal, Addison-Wesley

Many other Application Notes and tools:
• http://www.intel.com/design/intelxscale/
