Intel® 80310 I/O Processor Chipset AAU Coding Techniques

White Paper

January 14, 2002

Document Number: 273649-001


Contents

Figures
1  Application Accelerator Unit
2  AAU State Trace Diagram
3  Interrupt Handler Functional Flow Diagram

Tables
1  Acronyms
2  AAU Control Registers
3  DC Field Description
4  AAU Registers
5  AAU Hardware Descriptor Format
6  AAU Hardware Descriptor
7  AAU Software Descriptor Structure
8  AAU Device Descriptor
9  User SGL Header
10 AAU User SGL Structure


Revision History

Date           Revision   Description
January 2002   001        Initial Release.


1.0 White Paper Purpose and Description

Increasing I/O demands are central to high-performance network and storage applications. Intel® XScale microarchitecture (ARM* architecture compliant) addresses this trend with the Intel® 80310 I/O processor chipset (80310). Features of the Intel® 80310 solution include the Application Accelerator Unit (AAU).

The purpose of this paper is to provide Intel customers a fast development ramp in using the Application Accelerator Unit (AAU) on the 80310. This is achieved by providing an implementation case study. The contents of this document are meant to be a supplement to the Intel® 80312 I/O Companion Chip Developer's Manual, Chapter 10, the Intel-referenced optimization guides, and the other extensive Intel documentation listed in Section 1.2, "Related Documents".

1.1 Document Highlights

• Section 2.0, "Application Accelerator Unit": AAU hardware overview.
• Section 1.2, "Related Documents": A listing of related documents and web links.
• Section 3.0, "Low-Level Design Document": A case study presenting a Low-Level Design Document used in a Linux implementation of the AAU hardware.
• Appendix A, "AAU Source Code": The Linux implementation source code.
• Section 4.0, "Code Commentary" and Section 5.0, "Potential Enhancements": Code commentary discussing the implementation with source code line references. The commentary identifies optimizations implemented, interrupt handling, and potential enhancements to the existing implementation.
• Appendix B, "Example Calling Source Code": Examples calling the source code APIs.
• Appendix C, "MMU Functions for Intel® XScale Microarchitecture": A listing of the MMU implementation called in the source code.

1.2 Related Documents

• Intel® 80312 I/O Companion Chip Developer's Manual (273410)
• Intel® 80200 Processor based on Intel® XScale Microarchitecture Developer's Manual (273411)
• Intel® IQ80310 Evaluation Platform Board Manual (273431)
• Intel® XScale Microarchitecture Coding Techniques White Paper (273578)

Other Application Notes and tools:

• http://www.intel.com/design/iio/docs/iop310.htm
• http://www.intel.com/design/iio/devtools/tptools.htm
• http://www.intel.com/design/intelxscale/


2.0 Application Accelerator Unit

2.0.1 Overview

The AAU provides low-latency, high-throughput data transfer capability between the AAU and the local memory of the Intel® 80200 processor based on XScale microarchitecture (ARM* architecture compliant). It executes data transfers to and from Intel® 80200 processor (80200) local memory and also provides the necessary programming interface. The Application Accelerator performs the following functions:

• Transfers data (read) from the memory controller.
• Performs an optional boolean operation (XOR) on the read data.
• Transfers data (write) to the memory controller.

The AAU features:

• A 1 KB store queue, arranged as 8 bytes x 128 deep.
  — Configurable to a 512-byte store queue, arranged as 8 bytes x 64 deep.
• Utilization of the Intel® 80312 I/O companion chip (80312) memory controller interface.
• A 2^32 addressing range on the 80200 local memory interface.
• Hardware support for unaligned data transfers on the internal bus.
• Fully programmable from the 80200.
• Support for automatic data chaining for gathering and scattering of data blocks.

Figure 1 shows a simplified connection of the Application Accelerator to the 80312 internal bus.

Figure 1. Application Accelerator Unit
[Diagram: the Application Accelerator Unit attached to the 80312 internal bus]


3.0 Low-Level Design Document

3.1 Objective

This section presents the low-level design details of the AAU API for Intel® XScale microarchitecture embedded Linux.

Table 1. Acronyms

Term   Definition
AAU    Application Accelerator Unit
API    Application Programming Interface
OS     Operating System
PCI    Peripheral Component Interconnect
SGL    Scattered Gather List

3.1.1 AAU Implementation

3.1.1.1 Overview

The 80312 contains an AAU that provides a hardware implementation of the XOR algorithm. It is capable of performing the XOR operation on multiple blocks of source data and storing the result back in 80200 local memory. Embedded Linux for Intel® XScale microarchitecture does not currently support the AAU functionality of the 80312. As a result, it is unable to take advantage of the AAU's XOR capabilities when it performs certain checksum calculations for a RAID 5 storage solution. This results in a drastic performance hit because the XOR operations are done in software. The implementation outlined here describes the details of the changes that need to be made to embedded Linux for Intel® XScale microarchitecture in order to utilize the AAU and take advantage of the hardware acceleration.

The AAU API is intended to abstract the hardware away from driver developers and provide the necessary functions for the developer to utilize the AAU. The AAU contains the following registers:

Table 2. AAU Control Registers (Sheet 1 of 2)

ACR (Accelerator Control Register): The Accelerator Control Word specifies parameters that dictate the overall operating environment, such as enabling the accelerator.
ASR (Accelerator Status Register): The Accelerator Status shows the status of the accelerator, including transfer task done and errors.
ADAR (Descriptor Address Register): The Address of Current Chain Descriptor is the address of the descriptor currently being processed.
ANDAR (Next Descriptor Address Register): The Address of Next Chain Descriptor points to the next descriptor that is linked to the current descriptor. A NULL value indicates the end of the descriptor chain.
SAR[4] (Local Source Address Registers): The Intel® 80200 processor Address of Source points to the local address of the source data.


3.1.3 Initialization

Initialization is done during kernel initialization. The AAU is initialized after the interrupt controller has been initialized during kernel setup. The AAU registers all hold their default reset values before initialization. The AAU initialization sequence is:

• Disable the accelerator by clearing the ACR register.
• Set up and initialize all resource queues and the stack.
• Set up and initialize all spinlocks.
• Allocate a number of AAU hardware descriptors.
• Align hardware descriptors to eight 32-bit word boundaries.
• Allocate a corresponding number of AAU software descriptors.
• Link each hardware descriptor to a software descriptor.
• Put the software descriptors on the free resource stack.
• Assign appropriate interrupt numbers.
• Assign proper registers.

3.1.4 AAU Data Structures

The Table 4 data structure maps directly onto the AAU registers for easy access to them.

Table 4. AAU Registers

typedef struct _aau_regs_t
{
    volatile u32 ACR;   /* Accelerator Control Register */
    volatile u32 ASR;   /* Accelerator Status Register */
    volatile u32 ADAR;  /* Descriptor Address Register */
    volatile u32 ANDAR; /* Next Desc Address Register */
    volatile u32 LSAR;  /* Local Source Address */
    volatile u32 LDAR;  /* Local Destination Address */
    volatile u32 ABCR;  /* Byte Count */
    volatile u32 ADCR;  /* Descriptor Control */
} aau_regs_t;


To start an AAU operation, an AAU hardware descriptor chain is built in local memory. The hardware descriptor is required to be aligned on an 8-word boundary and is composed of eight contiguous words (twelve when the four optional extended source addresses are used). The hardware descriptor format is illustrated in Table 5. One or more hardware descriptors form an AAU descriptor chain.

Table 5. AAU Hardware Descriptor Format

Next Descriptor Address (NDA)
Source Address (SAR[0])
Source Address (SAR[1])
Source Address (SAR[2])
Source Address (SAR[3])
Destination Address (DAR)
Byte Count (BC)
Descriptor Control (DC)
Source Address (SARE[0]) [optional]
Source Address (SARE[1]) [optional]
Source Address (SARE[2]) [optional]
Source Address (SARE[3]) [optional]

The NDA points to the next descriptor, thus forming a chain. The chain is terminated by a NULL-valued NDA. The descriptor provides pointers to four source addresses. These source addresses provide the source data for the XOR computation. The result of the XOR computation on the source data is written to the local memory location pointed to by the DAR. The BC field contains the number of bytes in the block of data at each source address. All blocks of data pointed to by the source addresses have the same amount of data; for example, when SAR[0] has 1024 bytes of data, each of the other valid source addresses shall also point to a 1024-byte data block. A bit in the DC field enables the extension of four additional source address fields for processing when more than four data sources are required for the XOR computation. The optional fields shall not be used until all four existing source fields are utilized. The DC field also contains various mode bits that allow operations to be set on a per-descriptor basis.

The hardware descriptor for the AAU is presented in Table 6. This format is required by the AAU hardware. Source addresses 5 through 8 are optional. Any source address field not used must contain the NULL value. When any source address contains the NULL value, all the following source addresses must also contain the NULL value. All the source addresses and the destination address must be 80200 local addresses. They must also be physical addresses, not virtual addresses.

Table 6. AAU Hardware Descriptor

typedef struct _aau_desc_t
{
    u32 NDA;     /* Next Descriptor Address */
    u32 SAR[4];  /* Source Addresses 0-3 */
    u32 DAR;     /* Destination Address */
    u32 BC;      /* Byte Count */
    u32 DC;      /* Descriptor Control */
    u32 SARE[4]; /* Extended Source Addresses 0-3 */
} aau_desc_t;
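
As an illustration of Table 5 and Table 6, the following sketch fills a single descriptor that XORs two 1024-byte source blocks into a destination buffer. The buffer names and the choice of block-control bits are illustrative assumptions (the DC encodings are those defined in the \include\aau.h listing in Appendix A, where direct fill is suggested for the first block); in the driver these same fields live at the start of aau_sgl_t and are filled in by the caller before aau_queue_buffer() is called.

    #include <linux/types.h>
    #include <linux/string.h>   /* memset() */
    #include <asm/io.h>         /* virt_to_phys() */
    #include "aau.h"            /* public definitions from Appendix A.1 */

    /* Hypothetical buffers; in a real driver these come from the caller. */
    static u8 src_a[1024], src_b[1024], parity[1024];

    static void fill_xor_descriptor(aau_desc_t *desc)
    {
        desc->NDA    = 0;                    /* single descriptor: end of chain */
        desc->SAR[0] = virt_to_phys(src_a);  /* physical, not virtual, addresses */
        desc->SAR[1] = virt_to_phys(src_b);
        desc->SAR[2] = 0;                    /* unused sources must be NULL */
        desc->SAR[3] = 0;
        desc->DAR    = virt_to_phys(parity);
        desc->BC     = 1024;                 /* every source block is 1024 bytes */
        /* Assumed command word: fill the store queue from block 1, XOR block 2
         * into it, and write the result back to DAR.  The Interrupt Enable bit
         * (AAU_DCR_IE) is set by the API on the last descriptor of a chain, so
         * it is omitted here. */
        desc->DC     = AAU_DCR_WRITE |
                       AAU_DCR_BLKCTRL_1_DF |
                       AAU_DCR_BLKCTRL_2_XOR;
        memset(desc->SARE, 0, sizeof(desc->SARE)); /* extended sources unused */
    }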


A software descriptor is created to encapsulate each AAU hardware descriptor. The software descriptor contains additional information and status about the hardware descriptor that is not described by the hardware descriptor itself. The software descriptor also enables the use of stack and queue data structures to keep track of and manipulate the hardware descriptors without making any format changes to the hardware descriptor. A pool of software descriptors is allocated during initialization and put on a stack. An equal number of hardware descriptors are created and encapsulated by the software descriptors. The resource pool removes the performance penalty of dynamically allocating descriptors during operation.

The Table 7 data structure describes the AAU software descriptor.

Table 7. AAU Software Descriptor Structure

typedef struct _sw_aau_t
{
    aau_desc_t aau_desc;       /* AAU HW desc */
    u32 status;                /* AAU Status */
    struct _aau_sgl *next;     /* pointer to next sgl */
    void *dest;                /* Destination */
    void *src[4];              /* Source */
    void *ext_src[4];          /* Extended Source */
    u32 total_src;             /* total src addresses */
    struct list_head link;     /* link to queue */
    u32 aau_phys;              /* AAU Physical Addr */
    u32 desc_addr;             /* HW unaligned addr */
    u32 sgl_head;              /* User SGL head Addr */
    struct _sw_aau_t *head;    /* Head of list */
    struct _sw_aau_t *tail;    /* Tail of list */
} sw_aau_t;

The AAU shall also have a global device descriptor that provides access to the accelerator registers, processing queues, queue locks, and accelerator status.


The Table 8 data structure describes the AAU device. It keeps track of all the variables related to the AAU.

Table 8. AAU Device Descriptor

typedef struct _iop310_aau_t
{
    const char *dev_id;          /* Device ID */
    list_t process_q;            /* Processing Q */
    list_t holding_q;            /* Holding Q */
    spinlock_t lock_pq;          /* PQ spinlock */
    spinlock_t lock_hq;          /* HQ spinlock */
    aau_regs_t *regs;            /* AAU registers */
    int irq;                     /* IRQ number */
    sw_aau_t *last_aau;          /* ptr to last AAU desc */
    struct tq_struct aau_task;   /* AAU task entry */
    wait_queue_head_t wait_q;    /* AAU wait queue */
    atomic_t ref_count;          /* AAU Reference count */
} iop310_aau_t;

The following structures represent the data format applications use to pass data to the AAU API. The application creates an SGL header with an SGL pointed to by the header. When no callback function is required, the callback value must be set to NULL. The status field should be zeroed out before being passed down. The end of the list is always marked by the next_sgl variable of the last SGL element pointing to NULL.

Table 9. User SGL Header

typedef struct _aau_sgl_head_t
{
    u32 total;               /* total SGLs */
    aau_sgl_t *list;         /* Pointer to list head */
    u32 status;              /* SG status */
    aau_callback_t callback; /* Callback func ptr */
} aau_sgl_head_t;
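
A minimal sketch of preparing the header described above, assuming a caller has already obtained and filled a list of SGL elements. The function and variable names are illustrative; the structure is the aau_head_t definition from the public header in Appendix A.1 (the design-document name aau_sgl_head_t refers to the same structure).

    #include "aau.h"   /* aau_head_t / aau_sgl_t from Appendix A.1 */

    /* Prepare an SGL header for a list of n_bufs already-filled elements. */
    static void prepare_sgl_header(aau_head_t *head, aau_sgl_t *list, u32 n_bufs)
    {
        head->total    = n_bufs;  /* number of SGL elements in the list */
        head->list     = list;    /* head of the user SGL list */
        head->status   = 0;       /* status must be zeroed before passing down */
        head->callback = NULL;    /* NULL when no completion callback is wanted */
    }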


The AAU descriptor data structure is filled out by the user. It maps over the first portion of the software descriptor sw_aau_t data structure. Casting eliminates a copy of values from one data structure to another. When the user passes a correct user SGL list, all the API has to do is re-cast the list into software descriptors and feed it to the processing queue. This requires slightly more knowledge of the AAU fields on the user's part, but improves the performance of the AAU operation considerably.

Table 10. AAU User SGL Structure

typedef struct _aau_sgl_t
{
    aau_desc_t aau_desc;
    u32 status;
    struct _aau_sgl_t *next_sgl; /* Pointer to next SG */
    void *dest;
    void *src[4];                /* Source group 1 */
    void *src_ext[4];            /* Source group 2 */
    u32 total_src;               /* Total number of sources passed down */
} aau_sgl_t;

3.1.5 Data Path

The following is required for an application to utilize the AAU hardware through the AAU API. The application must first request the use of the AAU by calling the aau_request() function. This function requests and registers an interrupt for the AAU. When successful, the application is allowed to use the AAU. The API also keeps track of AAU usage using a reference count. When unsuccessful, the error -EBUSY is returned to the caller.

Driver applications are required to create a scattered gather list (SGL) in the aau_sgl_t format with all the information needed for the AAU operation filled in. The driver application is responsible for allocating and keeping track of the memory that stores the AAU input data and result. The application calls the aau_queue_buffer() function to pass down the user SGL. The AAU API generates an AAU descriptor chain from the passed-down SGL using AAU software descriptors from the free AAU resource stack. When no free software descriptors are available, the API sleeps for a short period of time and retries up to ten times before giving up and returning the -ENOMEM error. The function sets the Interrupt Enable bit in the DC field of the last hardware descriptor in the chain to indicate the end of the chain. The AAU chain is queued onto the processing queue by the function, which then calls aau_start() for the application. The aau_start() function checks whether the AAU is active. If it is not active, this is a new operation, which requires setting the appropriate bits and links and starting the AAU. If it is active, this is an ongoing operation, which requires appending to the existing chain and setting the chain resume bit. At this point aau_queue_buffer() returns control to the application while the AAU is doing its work.

The application has two choices for handling the result of AAU completion:

1. Sleep on the AAU's wait queue until notified by the bottom half interrupt handler when the operation is complete.
2. Continue, and be notified by a callback function via the SGL passed down when the operation on the chain is complete.

The AAU meanwhile processes the chain and triggers an interrupt when it encounters the Interrupt Enable bit set in a descriptor being processed or when an error condition is encountered. A sketch of this calling sequence from the application's point of view is shown below.
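
The following sketch shows one plausible calling sequence using the public API from Appendix A.1 (aau_request(), aau_get_buffer(), aau_queue_buffer(), aau_return_buffer(), aau_free()). The callback body, buffer names, DC encoding, and error handling are illustrative assumptions rather than code from the released driver; Appendix B contains the actual calling examples.

    #include <linux/errno.h>
    #include "aau.h"    /* public API from Appendix A.1 */

    static void xor_done(void *buf_id)
    {
        /* Called by the bottom half when the chain completes; buf_id
         * identifies the SGL that finished.  A real driver would inspect
         * the SGL status here. */
    }

    static int do_xor(u32 *src_phys, u32 dest_phys, u32 nbytes)
    {
        u32 aau;                 /* AAU context handle written by aau_request() */
        aau_sgl_t *sgl;
        aau_head_t head;
        int err;

        err = aau_request(&aau, "xor-example");   /* register for the AAU IRQ */
        if (err)
            return err;

        sgl = aau_get_buffer(aau, 1);             /* one SGL element */
        if (!sgl) {
            aau_free(aau);
            return -ENOMEM;
        }

        /* Fill the embedded hardware descriptor with *physical* addresses,
         * as in the earlier descriptor sketch (two-source XOR assumed). */
        sgl->aau_desc.SAR[0] = src_phys[0];
        sgl->aau_desc.SAR[1] = src_phys[1];
        sgl->aau_desc.DAR    = dest_phys;
        sgl->aau_desc.BC     = nbytes;
        sgl->aau_desc.DC     = AAU_DCR_WRITE | AAU_DCR_BLKCTRL_1_DF |
                               AAU_DCR_BLKCTRL_2_XOR;
        sgl->next      = NULL;                    /* end of list (next_sgl in Table 10) */
        sgl->status    = 0;
        sgl->total_src = 2;

        head.total    = 1;
        head.list     = sgl;
        head.status   = 0;
        head.callback = xor_done;                 /* or NULL to sleep instead */

        err = aau_queue_buffer(aau, &head);       /* build chain, start AAU */

        /* ... once notified, hand the SGL back and release the unit ... */
        aau_return_buffer(aau, sgl);
        aau_free(aau);
        return err;
    }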


When an AAU interrupt is asserted, the interrupt handler function aau_irq_handler() is called. Clearing an interrupt requires clearing it at the source; in this case the source is the Accelerator Status Register. When the Accelerator Status Register has been cleared in the interrupt handler, the interrupt assert bit is cleared by the hardware as a consequence. As long as the AAU interrupt remains asserted due to new AAU interrupts, the interrupt handler continues to remove descriptors from the channel processing queue and put them in the channel holding queue until the ADAR value equals the address of the descriptor or the queue is empty. When the ADAR equals the descriptor address and the ASR indicates that the channel is active, that descriptor is not removed. Once the interrupt handler no longer sees an AAU interrupt being asserted, it schedules a bottom half handler in the immediate task queue to process the holding queue and notify the application of the progress of the AAU operation.

The application calls the function aau_free() when it no longer needs the AAU and wants to release it. Depending on the reference count, the IRQ requested for the AAU may be freed. When there are any errors in the AAU unit, the AAU registers are cleared, all resources are returned, and the reference count is reset to 0.

Figure 2 shows the state trace diagram for a normal operation of the AAU. The diagram demonstrates all the necessary function calls that are performed during a normal, simple AAU execution path. The section explaining the APIs in detail follows.

Figure 2. AAU State Trace Diagram
[Sequence diagram across the User, AAU, INTC, and System columns showing: aau_init(), aau_get_buffer(), aau_queue_buffer(), aau_start(), AAU complete, aau_irq_handler(), sleep on wait queue or proceed on callback, aau_process(), aau_task(), callback or wake if sleeping, aau_return_buffer(), aau_free()]


3.1.6 API Functions

The following functions shall be implemented to support the AAU API for Intel® XScale microarchitecture embedded Linux.

3.1.6.1 API Listing

3.1.6.1.1 AAU Public

int         aau_request(u32 *aau_context);
int         aau_suspend(u32 aau_context);
int         aau_resume(u32 aau_context);
int         aau_queue_buffer(u32 aau_context, aau_sgl_t *sgl);
int         aau_free(u32 aau_context);
aau_sgl_t  *aau_get_buffer(u32 aau_context, u32 num_buff);
void        aau_return_buffer(u32 aau_context, sgl_list_t *list);
int         aau_memcpy(void *, void *, u32);

3.1.6.1.2 AAU Private (Static)

static int __init  aau_init(void);
static int         aau_start(iop310_aau_t *aau_chain);
static int         aau_flush_all(u32 aau_context);
static void        aau_irq_handler(int irq, void *dev_id, struct pt_regs *regs);
static void        aau_process(iop310_aau_t *aau);
static void        aau_result_handler(void *aau);


3.1.6.2 Selected API Descriptions

3.1.6.2.1 static int __init aau_init(void);

Input:    N/A
Output:   Success -- OK
          Error -- -ENOMEM

Purpose:  This function initializes the AAU during kernel init. The function initializes all the variables to the ready state and allocates memory for the resource pools. The AAU is in the post-reset state at this point. After initialization the AAU should be in the idle state.

Operation:
• Initialize free resource stack
• Initialize stack lock
• Allocate memory for software descriptors
  — Return error if this fails
• Align memory on 8-byte boundary (a sketch of the allocation and alignment follows this list)
  — Return error if this fails
• Push software descriptors onto free resource stack
• Set register addresses for AAU
• Initialize AAU queues and locks
• Initialize wait queue
• Assign interrupt number
• Initialize AAU reference count
• Initialize interrupt bottom handler for immediate process queue
• Zero out ACR
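
The allocation and alignment steps above can be illustrated with the following sketch. The hardware requires each descriptor to start on an 8-word (32-byte) boundary (Section 3.1.4 and Table 5), so a common approach, assumed here, is to over-allocate and round the address up; the desc_addr field of sw_aau_t keeps the original unaligned pointer so it can be freed later. Names and sizes are illustrative, not taken from the released aau_init().

    #include <linux/slab.h>     /* kmalloc() / kfree() */
    #include "aau.h"

    #define AAU_DESC_ALIGN 32   /* eight 32-bit words */

    /* Allocate one hardware descriptor aligned to a 32-byte boundary. */
    static aau_desc_t *alloc_aligned_desc(u32 *unaligned_addr)
    {
        void *raw = kmalloc(sizeof(aau_desc_t) + AAU_DESC_ALIGN - 1, GFP_KERNEL);
        u32 aligned;

        if (!raw)
            return NULL;                          /* caller returns -ENOMEM */

        *unaligned_addr = (u32)raw;               /* remember for kfree() */
        aligned = ((u32)raw + AAU_DESC_ALIGN - 1) & ~(AAU_DESC_ALIGN - 1);
        return (aau_desc_t *)aligned;
    }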


3.1.6.2.2 static int aau_start(iop310_aau_t *aau, sw_aau_t *aau_chain);

Input:    aau – pointer to AAU device descriptor
          aau_chain – pointer to AAU descriptor chain to be sent to the AAU
Output:   Success/Error condition

Purpose:  This function starts the AAU, or appends an AAU chain and resumes the operation when a chain is already being processed.

Operation:
• If AAU not active
  — Write AAU descriptor address to ANDAR
  — Set enable accelerator bit in ACR
• Else
  — Link chain to last AAU list tail NDA
  — Flush cache for range of tail descriptor NDA
  — If channel no longer active
    • Set chain resume bit in ACR
• Set last descriptor pointer in AAU device descriptor

3.1.6.2.3 int aau_request(u32 *aau_context);

Input:    aau_context – AAU context passed by reference. Written back by the function.
Output:   Success -- OK
          Failed -- -EINVAL

Purpose:  This function requests an interrupt for the AAU from the kernel and returns the AAU descriptor to the driver application.

Operation:
• Register IRQ with kernel
• Increment reference count of AAU
• Return AAU device descriptor to user

3.1.6.2.4 int aau_suspend(u32 aau_context);

Input:    aau_context – AAU device context
Output:   Success/Error condition

Purpose:  This function suspends the AAU operation. It calls aau_stop() to perform the operation.

Operation:
• Unset the bit in ACR that enables AAU operation


3.1.6.2.5 int aau_resume(u32 aau_context);

Input:    aau_context – AAU device context
Output:   Success/Error condition

Purpose:  This function resumes the AAU operation.

Operation:
• If ASR contains errors
  — Clear errors
  — Flush AAU pipeline
  — Return with error
• Set enable bit in ACR

3.1.6.2.6 int aau_queue_buffer(u32 aau_context, aau_sgl_t *sgl);

Input:    aau_context – AAU device context
          sgl – user SGL for the AAU API to transform into an AAU descriptor chain
Output:   Success/Error condition

Purpose:  This function converts the user SGL to an AAU descriptor chain. The function then puts the chain in the processing queue and starts the AAU.

Operation:
• For all elements in SGL
  — Get AAU software descriptor from free resource stack
  — Convert to AAU descriptor
  — Initialize appropriate variables in AAU software descriptor
  — Flush cache in appropriate regions
  — Link up AAU chain
• Call aau_start() and pass AAU chain


3.1.6.2.9 static void aau_irq_handler(int irq, void *dev_id, struct pt_regs *regs);

Input:    irq – IRQ number
          dev_id – device descriptor
          regs – CPU registers (not used but required)
Output:   N/A

Purpose:  This is the interrupt handler for AAU interrupts. It handles any error interrupts or chain complete interrupts, depending on the status in the ASR. A bottom half handler queued in the immediate task queue by this function begins to process everything in the holding queue when this function exits and the kernel leaves interrupt space.

Operation:
• If not an AAU interrupt
  — Exit
• If AAU error
  — Call aau_flush_all()
• While AAU complete interrupts are pending
  — Clear ASR
  — Call aau_process()
• Register bottom half handler


Figure 3. Interrupt Handler Functional Flow Diagram
[Flow chart: enter the interrupt handler; if the interrupt is not for the AAU, exit; if the ASR reports errors, flush the AAU and clear the errors; otherwise move descriptors from the processing queue to the holding queue until done; schedule the interrupt bottom half handler and exit]


3.1.6.2.10 static void aau_process(iop310_aau_t *aau);

Input:    aau – AAU device descriptor
Output:   N/A

Purpose:  This function removes the completed descriptors from the processing queue and puts them in the holding queue to be processed later by the bottom half handler. This function is only called by the interrupt handler. A sketch of this queue movement appears after Section 3.1.6.2.11.

Operation:
• Do while descriptor address != ADAR and queue not empty
  — Remove from processing queue
  — Put on holding queue
  — If IE bit set in ADCR, set AAU_DONE on chain head descriptor

3.1.6.2.11 static void aau_result_handler(void *aau);

Input:    *aau – AAU device descriptor
Output:   N/A

Purpose:  This function is scheduled by the interrupt handler to finish processing AAU descriptors after the interrupt handler is done and has exited interrupt space. It notifies the driver using the AAU, either by waking the driver when it is sleeping or by using a callback function provided by the driver.

Operation:
• Do while descriptor status == AAU_DONE
  — Remove descriptor from holding queue
  — Set status on user SGL
  — Return descriptor to free stack
  — If callback function exists
    • Call callback
  — Else if sleeping on wait queue
    • Wake up sleeping process
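
A sketch of the queue movement performed by aau_process() (Section 3.1.6.2.10), using the Linux list primitives and the field names from the private header in Appendix A.2. The IOP310_AAUADAR register #define is assumed here by analogy with the other register #defines quoted in Section 4.1.3, and the stopping condition and IE test are paraphrased from the operation list above; the released implementation is in Appendix A.3.

    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include "../src/aau.h"   /* sw_aau_t, iop310_aau_t, SW_ENTRY (Appendix A.2) */

    static void aau_process_sketch(iop310_aau_t *aau)
    {
        u32 adar = *(IOP310_AAUADAR);              /* descriptor being processed */

        spin_lock(&aau->process_lock);
        while (!list_empty(&aau->process_q)) {
            sw_aau_t *sw = SW_ENTRY(aau->process_q.next);

            if (sw->aau_phys == adar && (*(IOP310_AAUASR) & AAU_ASR_ACTIVE))
                break;                             /* still in flight: leave it */

            list_del(&sw->link);                   /* off the processing queue */
            spin_lock(&aau->hold_lock);
            list_add_tail(&sw->link, &aau->hold_q);/* onto the holding queue */
            spin_unlock(&aau->hold_lock);

            if (sw->aau_desc.DC & AAU_DCR_IE)      /* last descriptor of a chain */
                sw->head->status |= AAU_DESC_DONE;
        }
        spin_unlock(&aau->process_lock);
    }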


3.1.6.2.12 aau_sgl_t *aau_get_buffer(u32 aau_context, u32 num_buf);

Input:    aau_context – AAU device context
          num_buf – number of buffers to acquire
Output:   aau_sgl_t * – chain of AAU buffers acquired, NULL if failed

Purpose:  This function is used to acquire a chain of user SGL buffers. After obtaining the list, the user needs to fill it out, link it to an SGL head and pass it to the aau_queue_buffer() function.

Operation:
• While free stack not empty
  — Acquire buffer
  — If failed
    • Retry
    • If retry fails
      — Return all acquired buffers
      — Return NULL
  — Fill out necessary fields
  — Link buffer to list
• Return list

3.1.6.2.13 void aau_return_buffer(u32 aau_context, sgl_list_t *list);

Input:    aau_context – AAU device context
          *list – SGL list to be returned
Output:   N/A

Purpose:  This function takes the SGL list passed in by the user and returns it to the free stack.

Operation:
• While not end of list
  — Put SGL element on free stack


4.0 Code Commentary

4.1 Section Objectives

Primary Objective: To identify and describe aspects of the implementation that relate to 80310 hardware and standard operating system issues.

Secondary Objective: To provide additional background on the Linux APIs to facilitate reading the code and understanding the implementation.

This code was written to be integrated into the Linux kernel. Therefore Linux data structures and APIs defined and optimized by the Linux community are used.

The recommended approach to understanding the code is to begin with aau_init() and follow the function call sequence in Figure 2, "AAU State Trace Diagram". Also see the sections provided for additional implementation support:

• Appendix B, "Example Calling Source Code"
• Appendix C, "MMU Functions for Intel® XScale Microarchitecture"

4.1.1 File Organization Overview

There are three files included in Appendix A:

• \include\aau.h
• \src\aau.h
• \src\aau.c

File \include\aau.h includes the public definitions and function APIs. Note that the public data structure definition of struct aau_sgl_t is cast to the private definition struct sw_aau_t.

Files \src\aau.h and \src\aau.c include private definitions, APIs and function calls. Note that APIs that are static are private and local to the file, and those that are not static are public calls. The static modifier localizes the functions to the C file so that the symbol is not exported.

4.1.1.1 Key Data Structure and Use of Casting

The primary data structure used by an application to initiate an AAU transaction is struct aau_sgl_t (see code line 72). When the application is filling out the source and destination addresses in the descriptor, physical addresses, not virtual addresses, are required. The aau_sgl_t is cast to the data structure sw_aau_t for processing (line 444). Note the descriptors are chained together within the function aau_queue_buffer(), line 463.
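
The cast works because aau_sgl_t (Appendix A.1, line 72) and sw_aau_t (Appendix A.2, line 139) deliberately share the same leading members, and the buffers handed out by aau_get_buffer() are really sw_aau_t objects from the pre-allocated pool, so the API can view the returned list as software descriptors again without copying a single field. A minimal sketch of the idea; the function name and the printed field are illustrative only.

    #include <linux/kernel.h>   /* printk() */
    #include "aau.h"            /* public aau_sgl_t (Appendix A.1) */
    #include "../src/aau.h"     /* private sw_aau_t  (Appendix A.2) */

    static void recast_example(aau_sgl_t *user_sgl)
    {
        sw_aau_t *sw = (sw_aau_t *)user_sgl;   /* same memory, private view */

        /* The private trailing fields (link, aau_phys, head, tail, ...) are
         * now visible to the driver; the hardware descriptor the user filled
         * in sits unchanged at the start of the object. */
        printk(KERN_DEBUG "HW descriptor physical address: 0x%08x\n",
               sw->aau_phys);
    }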


4.1.2 Cache Memory

The following function is required for AAU descriptors, since cache memory and RAM coherence are required to be managed by the programmer. Remember that the AAU engine reads the AAU descriptors from RAM. Therefore values in the cache are required to be flushed to RAM by the programmer (see Appendix C in this document for the implementation).

• cpu_xscale_dcache_clean_range(start, end)
  For the specified virtual address range, ensure that all caches contain clean data, such that peripheral accesses to the physical RAM fetch correct data.
  start: virtual start address
  end: virtual end address

4.1.3 Other AAU Hardware

The AAU hardware is described in the Intel® 80312 I/O Companion Chip Developer's Manual, pages 10-1 through 10-33. For register definitions see pages 10-23 through 10-31.

In the Appendix A code, see the Descriptor Control Register (DC) bit definitions at lines 40 through 54. For the Accelerator Control Register (ACR) and Accelerator Status Register (ASR), see the bit definitions at lines 124 through 136.

The addresses for referencing the memory-mapped registers are referenced using #defines. See examples in code lines 301, 305 and 320.

• IOP310_AAUANDAR - address of the Accelerator Next Descriptor Address Register
• IOP310_AAUACR - address of the Accelerator Control Register
• IOP310_AAUASR - address of the Accelerator Status Register

4.1.4 Virtual to Physical Memory

Cache flush/invalidate and memory-mapped register accesses operate with virtual memory addresses. AAU descriptor operations operate from physical memory and require physical addresses. For an example see Appendix A.3, line 895.
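
The two points above come together whenever a descriptor is handed to the AAU: the CPU writes the descriptor through its data cache using a virtual address, cleans that range to RAM, and then gives the hardware the corresponding physical address. A minimal sketch, assuming the cpu_xscale_dcache_clean_range() routine from Appendix C and the standard virt_to_phys() conversion; it also assumes the accelerator is idle (aau_start() in Appendix A.3 handles the active and chain-resume cases).

    #include <asm/io.h>        /* virt_to_phys() */
    #include "aau.h"

    /* Declared in Appendix C: clean the D-cache over a virtual address range. */
    extern void cpu_xscale_dcache_clean_range(u32 start, u32 end);

    static void publish_descriptor(aau_desc_t *desc)
    {
        /* Push the CPU's cached copy of the descriptor out to RAM so that the
         * AAU, which reads RAM directly, sees the values just written. */
        cpu_xscale_dcache_clean_range((u32)desc, (u32)desc + sizeof(*desc));

        /* The hardware is then given the physical, not virtual, address. */
        *(IOP310_AAUANDAR) = virt_to_phys(desc);
    }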


4.1.5 Interrupt Handling

Linux interrupt handling is split between top and bottom halves. The top half interrupt handler is called when the hardware interrupt is invoked and performs only minimal critical tasks, including scheduling the bottom half handlers. Bottom half handlers are scheduled by marking the handler for future execution.

Three status registers are involved in interrupt handling. Clearing an interrupt requires clearing the interrupt at the source, which in this case is the Accelerator Status Register. Clearing the interrupt requires writing a 1 to the bit to be cleared (see Intel® 80312 I/O Companion Chip Developer's Manual, page 1-7, section 1.4.2). The three registers are:

• FIQ1 Interrupt Status Register (IOP310_FIQ1ISR). Appendix A, lines 637 and 666.
  This register is used to determine the cause of the interrupt. If bit 5 is set there is an Application Accelerator interrupt pending (see Intel® 80312 I/O Companion Chip Developer's Manual, page 2-12).
• Accelerator Status Register (IOP310_AAUASR). Appendix A, lines 648 and 669.
  This register contains the AAU status flags. The interrupt is cleared by writing 1s to the set bits.
• IRQ Interrupt Status Register (not used in this code).
  Bit 10 indicates an Application Accelerator Unit error (see Intel® 80312 I/O Companion Chip Developer's Manual, page 2-15).

4.1.5.1 Top Half Interrupt Handler: aau_irq_handler()

See Appendix A, line 629. The following statuses are obtained:

• FIQ1 Interrupt Status Register (IOP310_FIQ1ISR). Appendix A, lines 637 and 666.
• Accelerator Status Register (IOP310_AAUASR). Appendix A, lines 648 and 669.

The AAU interrupt is cleared (Appendix A, line 657).

When the End of Transfer or End of Chain interrupt is set, the function aau_process() is called. The purpose of aau_process() is to move all the AAU descriptors in the processing queue that are considered done to the holding queue.

The bottom half handler is marked scheduled (Appendix A, line 672). A sketch of this top-half/bottom-half scheduling idiom is shown after Section 4.1.6.

4.1.5.2 Bottom Half Interrupt Handler: aau_task()

See Appendix A, line 758. This function processes all the completed AAU chain descriptors in the holding queue, wakes up the user and frees the resources.

4.1.6 Linux Kernel APIs

This code contains numerous calls to Linux kernel macros or APIs. Primarily these are Linux calls used for declaring and handling queue and stack data structures and controlling variable access. When developing custom applications, users of this document will call their operating system's equivalent APIs and macros.
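
The following sketch shows the Linux 2.4 idiom the sections above describe: the top half does only the minimum work, clears the interrupt at its source (the ASR), and marks the bottom half for execution through the immediate task queue. The handler body is reduced to its scheduling skeleton and is an illustration, not the code from Appendix A.3 (lines 629 through 672).

    #include <linux/interrupt.h>   /* mark_bh(), IMMEDIATE_BH */
    #include <linux/tqueue.h>      /* struct tq_struct, queue_task(), tq_immediate */
    #include "../src/aau.h"        /* iop310_aau_t, AAU_ASR_MASK (Appendix A.2) */

    /* Top half: acknowledge the AAU and defer the real work. */
    static void aau_irq_sketch(int irq, void *dev_id, struct pt_regs *regs)
    {
        iop310_aau_t *aau = (iop310_aau_t *)dev_id;
        u32 status = aau->regs->ASR;

        if (!(status & AAU_ASR_MASK))
            return;                         /* not an AAU condition we handle */

        aau->regs->ASR = status;            /* write 1s to clear at the source */

        /* Mark the bottom half (aau->aau_task points at aau_task()) so it
         * runs once the kernel leaves interrupt context. */
        queue_task(&aau->aau_task, &tq_immediate);
        mark_bh(IMMEDIATE_BH);
    }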


4.2 Optimization Related

4.2.1 Stack versus Queue

These are built and controlled with Linux kernel data structures and APIs. Using a stack for free descriptors increases the likelihood that requested descriptors are still in the cache. A queue for the chain is required since chaining demands FIFO (first in, first out) sequence. As previously stated, the Linux kernel data structures have been optimized by the Linux community.

4.2.2 Chaining and Resume

Chaining allows the application to build a list of transfers which may not require the use of the Intel® 80200 processor until all transfers are complete (see Intel® 80312 I/O Companion Chip Developer's Manual, page 10-9). In addition, while the AAU is executing an existing chain, an incremental descriptor or chain of descriptors can be appended concurrently by using the Chain Resume feature (see Intel® 80312 I/O Companion Chip Developer's Manual, page 10-16). The expanded chain then executes as a single uninterrupted set of transactions.

See Appendix A, lines 297 through 326 for the implementation.

4.2.3 Requiring the Application to Supply Physical Addresses in the AAU Descriptor (versus virtual addresses)

This requirement minimizes the time between hand-off from the application and AAU processing software.

4.2.4 Allocation of Memory for AAU Descriptors During Initialization

Preallocating memory for AAU descriptors eliminates costly runtime memory allocations.

4.2.5 Using the AAU for Local Memory to Local Memory Copy: aau_memcpy()

The advantages of using the AAU for local memory to local memory copying are:

• In absolute terms it is faster for non-trivial copies.
• It happens in parallel with other core processing.

When calling aau_memcpy(), use the exact same syntax as memcpy(). See the Intel® 80312 I/O Companion Chip Developer's Manual, pages 10-31 through 10-33 for a full description.

Appendix A, lines 1108 and 1109
• Convert the virtual to physical addresses and write the physical addresses to the AAU descriptor.

Appendix A, line 1112
• AAU_DCR_WRITE
  — Sets bit 31. Description of operation specified: Write Enable.
• AAU_DCR_BLKCTRL_1_DF
  — Sets all of bits 03:01 for Block 1 Command Control. Description of operation specified: Direct Fill.

Appendix A, line 1115
• Sets Interrupt Enable for this descriptor.
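
A brief illustration of the drop-in memcpy()-style usage described in Section 4.2.5. The buffer sizes and names are made up for the example; aau_memcpy() is the public routine declared in Appendix A.1, line 101, and is assumed here to return once the copy has completed.

    #include <linux/errno.h>
    #include <linux/slab.h>
    #include <linux/string.h>
    #include "aau.h"

    static int copy_example(void)
    {
        void *dst = kmalloc(4096, GFP_KERNEL);
        void *src = kmalloc(4096, GFP_KERNEL);
        int err = -ENOMEM;

        if (dst && src) {
            memset(src, 0xA5, 4096);

            /* Same argument order as memcpy(dest, src, length); the AAU moves
             * the data while the 80200 core remains free for other work. */
            err = aau_memcpy(dst, src, 4096);
        }

        if (dst)
            kfree(dst);
        if (src)
            kfree(src);
        return err;
    }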


5.0 Potential Enhancements

5.1 Error Handling

When the Application Accelerator Unit generates an error during the execution of an AAU descriptor, an interrupt is triggered and IRQ Interrupt Status Register bit 10 is set. Bit 10 being set indicates an Application Accelerator Unit error (see Intel® 80312 I/O Companion Chip Developer's Manual, page 2-5). After identifying the source of the interrupt as the AAU, the application should test the Accelerator Status Register (ASR) bits (see Intel® 80312 I/O Companion Chip Developer's Manual, page 10-25):

• Bit 10 is clear: The Accelerator Active flag being clear indicates the channel is idle. This bit may be cleared as a result of a bus error.
• Bit 5 is set: The Master-Abort bit is set when a master abort occurs during a transaction in which the AAU is the master on the internal bus.

Before clearing the interrupt, the application can use the Accelerator Descriptor Address Register (ADAR) to identify the currently executing descriptor. The descriptor can be marked as having failed prior to the interrupt being cleared and processing continuing. One approach is to write the contents of the ASR to a status variable attached to the descriptor (see Intel® 80312 I/O Companion Chip Developer's Manual, sections 10.8 and 10.9 for interrupt states and error conditions).

5.2 Lookaside Cache Scheme (Linux specific)

When implementing in Linux, device driver developers should consider using the lookaside cache scheme instead of allocating memory using kmalloc when creating hardware descriptors. A lookaside cache provides memory address alignment and other features that allow efficient use of Linux memory management for device driver development. A sketch of this scheme follows Section 5.3.

5.3 Extensive Intel Optimization Related Documentation

Intel provides extensive optimization related documentation. As part of the application development process it is recommended to review the Intel® XScale Microarchitecture Coding Techniques White Paper and the Intel® 80200 Processor based on Intel® XScale Microarchitecture Developer's Manual, Appendix B, for optimization opportunities.
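
As a supplement to Section 5.2, the following is a minimal sketch of the lookaside cache scheme using the Linux 2.4 slab cache interface. The cache name and error handling are illustrative assumptions; SLAB_HWCACHE_ALIGN aligns objects to the hardware cache line (32 bytes on the 80200), which should also satisfy the descriptor's 8-word alignment requirement, but that property would need to be confirmed for the target kernel.

    #include <linux/errno.h>
    #include <linux/slab.h>
    #include "aau.h"

    static kmem_cache_t *aau_desc_cache;

    /* Create the lookaside cache once, e.g. from aau_init(). */
    static int create_desc_cache(void)
    {
        aau_desc_cache = kmem_cache_create("aau_desc", sizeof(aau_desc_t),
                                           0, SLAB_HWCACHE_ALIGN, NULL, NULL);
        return aau_desc_cache ? 0 : -ENOMEM;
    }

    static aau_desc_t *desc_alloc(void)
    {
        return kmem_cache_alloc(aau_desc_cache, GFP_KERNEL);
    }

    static void desc_free(aau_desc_t *desc)
    {
        kmem_cache_free(aau_desc_cache, desc);
    }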


6.0 Conclusion

As discussed, increasing I/O demands are central to high-performance network and storage applications. Intel® XScale microarchitecture addresses this trend with the 80310, and the features of the Intel® 80310 solution include the AAU.

This paper and the accompanying source code have presented an AAU implementation, including the low-level design, the coded implementation and code commentary, to give software developers a template that speeds the ramp of developing AAU applications. For their unique applications, developers can design and build their own custom solutions using this template along with the Intel optimization literature.


Appendix A  AAU Source Code

A.1 Public Definitions for Intel® 80310 I/O Processor Chipset AAU: \include\aau.h

1   /*
2    * Definitions for IOP310 AAU
3    *
4    * Author: Dave Jiang (dave.jiang@intel.com)
5    * Copyright (C) 2001 Intel Corporation
6    *
7    * This program is free software; you can redistribute it and/or modify
8    * it under the terms of the GNU General Public License version 2 as
9    * published by the Free Software Foundation.
10   *
11   */
12
13  #ifndef _IOP310_AAU_H_
14  #define _IOP310_AAU_H_
15
16
17  #define DEFAULT_AAU_IRQ_THRESH 10
18
19  #define MAX_AAU_DESC 1024 /* 64 */
20  #define AAU_SAR_GROUP 4
21
22
23  #define AAU_DESC_DONE   0x0010
24  #define AAU_INCOMPLETE  0x0020
25  #define AAU_HOLD        0x0040
26  #define AAU_END_CHAIN   0x0080
27  #define AAU_COMPLETE    0x0100
28  #define AAU_NOTIFY      0x0200
29  #define AAU_NEW_HEAD    0x0400


30
31  #define AAU_USER_MASK (AAU_NOTIFY | AAU_INCOMPLETE | \
32                         AAU_HOLD | AAU_COMPLETE)
33
34  #define DESC_HEAD 0x0010
35  #define DESC_TAIL 0x0020
36
37  /* result writeback */
38  #define AAU_DCR_WRITE 0x80000000
39  /* source block extension */
40  #define AAU_DCR_BLK_EXT 0x02000000
41  #define AAU_DCR_BLKCTRL_8_XOR 0x00400000
42  #define AAU_DCR_BLKCTRL_7_XOR 0x00080000
43  #define AAU_DCR_BLKCTRL_6_XOR 0x00010000
44  #define AAU_DCR_BLKCTRL_5_XOR 0x00002000
45  #define AAU_DCR_BLKCTRL_4_XOR 0x00000400
46  #define AAU_DCR_BLKCTRL_3_XOR 0x00000080
47  #define AAU_DCR_BLKCTRL_2_XOR 0x00000010
48  #define AAU_DCR_BLKCTRL_1_XOR 0x00000002
49  /* first block direct fill instead of XOR to buffer */
50  #define AAU_DCR_BLKCTRL_1_DF 0x0000000E
51  /* interrupt enable */
52  #define AAU_DCR_IE 0x00000001
53
54  #define DCR_BLKCTRL_OFFSET 3
55
56
57  /* AAU callback */
58  typedef void (*aau_callback_t) (void *buf_id);
59
60  /* hardware descriptor */
61  typedef struct _aau_desc
62  {
63      u32 NDA; /* next descriptor address */


64      u32 SAR[AAU_SAR_GROUP];  /* src addrs */
65      u32 DAR;                 /* destination addr */
66      u32 BC;                  /* byte count */
67      u32 DC;                  /* descriptor control */
68      u32 SARE[AAU_SAR_GROUP]; /* extended src addrs */
69  } aau_desc_t;
70
71  /* user SGL format */
72  typedef struct _aau_sgl
73  {
74      aau_sgl_t aau_desc is declared first; see reconstruction below
74      aau_desc_t aau_desc;          /* AAU HW Desc */
75      u32 status;
76      struct _aau_sgl *next;        /* pointer to next SG */
77      void *dest;                   /* destination addr */
78      void *src[AAU_SAR_GROUP];     /* source addr[4] */
79      void *ext_src[AAU_SAR_GROUP]; /* ext src addr[4] */
80      u32 total_src;                /* total number of source */
81  } aau_sgl_t;
82
83  /* header for user SGL */
84  typedef struct _aau_head
85  {
86      u32 total;
87      u32 status;              /* SGL status */
88      aau_sgl_t *list;         /* ptr to head of list */
89      aau_callback_t callback; /* callback func ptr */
90  } aau_head_t;
91
92  /* prototypes */
93  int aau_request(u32 *, const char *);
94  int aau_queue_buffer(u32, aau_head_t *);
95  void aau_suspend(u32);
96  void aau_resume(u32);
97  void aau_free(u32);


98  void aau_set_irq_threshold(u32, int);
99  void aau_return_buffer(u32, aau_sgl_t *);
100 aau_sgl_t *aau_get_buffer(u32, int);
101 int aau_memcpy(void *, void *, u32);
102
103 #endif
104 /* EOF */


A.2 Private Definitions for Intel® XScale Microarchitecture AAU: \src\aau.h

105 /*
106  * Private Definitions for Intel® XScale microarchitecture AAU
107  *
108  * Author: Dave Jiang (dave.jiang@intel.com)
109  * Copyright (C) 2001 Intel Corporation
110  *
111  * This program is free software; you can redistribute it and/or modify
112  * it under the terms of the GNU General Public License version 2 as
113  * published by the Free Software Foundation.
114  *
115  */
116
117 #ifndef _AAU_PRIVATE_H_
118 #define _AAU_PRIVATE_H_
119
120 #define SLEEP_TIME 50
121 #define AAU_DESC_SIZE 48
122 #define AAU_INT_MASK 0x0020
123
124 #define AAU_ACR_CLEAR        0x00000000
125 #define AAU_ACR_ENABLE       0x00000001
126 #define AAU_ACR_CHAIN_RESUME 0x00000002
127 #define AAU_ACR_512_BUFFER   0x00000004
128
129 #define AAU_ASR_CLEAR        0x00000320
130 #define AAU_ASR_MA_ABORT     0x00000020
131 #define AAU_ASR_ERROR_MASK   AAU_ASR_MA_ABORT
132 #define AAU_ASR_DONE_EOT     0x00000200
133 #define AAU_ASR_DONE_EOC     0x00000100
134 #define AAU_ASR_DONE_MASK    (AAU_ASR_DONE_EOT | AAU_ASR_DONE_EOC)
135 #define AAU_ASR_ACTIVE       0x00000400


136 #define AAU_ASR_MASK (AAU_ASR_ERROR_MASK | AAU_ASR_DONE_MASK)
137
138 /* software descriptor */
139 typedef struct _sw_aau
140 {
141     aau_desc_t aau_desc;          /* AAU HW Desc */
142     u32 status;
143     struct _aau_sgl *next;        /* pointer to next SG */
144     void *dest;                   /* destination addr */
145     void *src[AAU_SAR_GROUP];     /* source addr[4] */
146     void *ext_src[AAU_SAR_GROUP]; /* ext src addr[4] */
147     u32 total_src;                /* total number of source */
148     struct list_head link;        /* Link to queue */
149     u32 aau_phys;                 /* AAU Phys Addr (aligned) */
150     u32 desc_addr;                /* unaligned HWDESC virtual addr */
151     u32 sgl_head;
152     struct _sw_aau *head;         /* head of list */
153     struct _sw_aau *tail;         /* tail of list */
154 } sw_aau_t;
155
156 /* AAU registers */
157 typedef struct _aau_regs_t
158 {
159     volatile u32 ACR;                  /* Accelerator Control Register */
160     volatile u32 ASR;                  /* Accelerator Status Register */
161     volatile u32 ADAR;                 /* Descriptor Address Register */
162     volatile u32 ANDAR;                /* Next Desc Address Register */
163     volatile u32 LSAR[AAU_SAR_GROUP];  /* source addrs */
164     volatile u32 LDAR;                 /* local destination address register */
165     volatile u32 ABCR;                 /* byte count */
166     volatile u32 ADCR;                 /* Descriptor Control */
167     volatile u32 LSARE[AAU_SAR_GROUP]; /* extended src addrs */
168 } aau_regs_t;
169


170
171 /* device descriptor */
172 typedef struct _iop310_aau_t
173 {
174     const char *dev_id;         /* Device ID */
175     struct list_head process_q; /* Process Q */
176     struct list_head hold_q;    /* Holding Q */
177     spinlock_t process_lock;    /* PQ spinlock */
178     spinlock_t hold_lock;       /* HQ spinlock */
179     aau_regs_t *regs;           /* AAU registers */
180     int irq;                    /* IRQ number */
181     sw_aau_t *last_aau;         /* ptr to last AAU desc */
182     struct tq_struct aau_task;  /* AAU task entry */
183     wait_queue_head_t wait_q;   /* AAU wait queue */
184     atomic_t ref_count;         /* AAU ref count */
185     atomic_t irq_thresh;        /* IRQ threshold */
186 } iop310_aau_t;
187
188 #define SW_ENTRY(list) list_entry((list), sw_aau_t, link)
189
190 #endif


A.3 Support Functions for the Intel® 80310 I/O Processor Chipset AAU: \src\aau.c

191 /**************************************************************************
192  * arch/arm/mach-iop310/aau.c
193  *
194  * Support functions for the Intel 80310 AAU.
195  * (see also Documentation/arm/XScale/IOP310/aau.txt)
196  *
197  * Author: Dave Jiang (dave.jiang@intel.com)
198  * Copyright (C) 2001 Intel Corporation
199  *
200  * This program is free software; you can redistribute it and/or modify
201  * it under the terms of the GNU General Public License version 2 as
202  * published by the Free Software Foundation.
203  *
204  * Todos: Thorough Error handling
205  *        Do zero-size AAU transfer/channel at init
206  *        so all we have to do is chaining
207  *
208  *
209  * History: (07/18/2001, DJ) Initial Creation
210  *          (08/22/2001, DJ) Changed spinlock calls to no save flags
211  *          (08/27/2001, DJ) Added irq threshold handling
212  *          (09/11/2001, DJ) Changed AAU to list data structure,
213  *          modified the user interface with embedded descriptors.
214  *
215  *************************************************************************/
216
217 #include
218 #include
219 #include
220 #include
221 #include


256
257 /* static prototypes */
258 static int __init aau_init(void);
259 static int aau_start(iop310_aau_t *, sw_aau_t *);
260 static int aau_flush_all(u32);
261 static void aau_process(iop310_aau_t *);
262 static void aau_task(void *);
263 static void aau_irq_handler(int, void *, struct pt_regs *);
264
265 /*=======================================================================*/
266 /* Procedure: aau_start()                                                */
267 /*                                                                       */
268 /* Description: This function starts the AAU. If the AAU                 */
269 /*              has already started then chain resume is done            */
270 /*                                                                       */
271 /* Parameters:  aau: AAU device                                          */
272 /*              aau_chain: AAU data chain to pass to the AAU             */
273 /*                                                                       */
274 /* Returns:     int -- success: OK                                       */
275 /*                     failure: -EBUSY                                   */
276 /*                                                                       */
277 /* Notes/Assumptions:                                                    */
278 /*                                                                       */
279 /* History:     Dave Jiang 07/18/01 Initial Creation                     */
280 /*=======================================================================*/
281 static int aau_start(iop310_aau_t * aau, sw_aau_t * aau_chain)
282 {
283     u32 status;
284
285     /* get accelerator status */
286     status = *(IOP310_AAUASR);
287
288     /* check accelerator status error */
289     if(status & AAU_ASR_ERROR_MASK)


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code290 {291 DPRINTK("start: Accelerator Error %x\n", status);292 /* should clean the accelerator up then, or let int handle it? */293 return -EBUSY;294 }295296 /* if first time */297 if(!(status & <strong>AAU</strong>_ASR_ACTIVE))298 {299 /* set the next descriptor address register */300301 *(IOP310_<strong>AAU</strong>ANDAR) = aau_chain->aau_phys;302303 DPRINTK("Enabling accelerator now\n");304 /* enable the accelerator */305 *(IOP310_<strong>AAU</strong>ACR) |= <strong>AAU</strong>_ACR_ENABLE;306 }307 else308 {309 DPRINTK("Resuming chain\n");310 /* if active, chain up to last <strong>AAU</strong> chain */311312 aau->last_aau->aau_desc.NDA = aau_chain->aau_phys;313314 /* flush cache since we changed the field */315 /* 32bit word long */316 cpu_dcache_clean_range((u32)&aau->last_aau->aau_desc.NDA,317 (u32)(&aau->last_aau->aau_desc.NDA));318319 /* resume the chain */320 *(IOP310_<strong>AAU</strong>ACR) |= <strong>AAU</strong>_ACR_CHAIN_RESUME;321 }322323 /* set the last accelerator descriptor to last descriptor in chain */42 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code324 aau->last_aau = aau_chain->tail;325326 return 0;327 }328329330 /*=======================================================================*/331 /* Procedure: aau_request() */332 /* */333 /* Description: This function requests the <strong>AAU</strong> */334 /* */335 /* Parameters: aau_context: aau context */336 /* device_id -- unique device name */337 /* */338 /* Returns: 0 - ok */339 /* NULL -- failed */340 /* */341 /* Notes/Assumptions: */342 /* */343 /* History: Dave Jiang 07/18/01 Initial Creation */344 /*=======================================================================*/345 int aau_request(u32 * aau_context, const char *device_id)346 {347 iop310_aau_t *aau = &aau_dev;348349 DPRINTK("Entering <strong>AAU</strong> request\n");350 /* increment reference count */351 atomic_inc(&aau->ref_count);352353 /* get interrupt if ref count is less than or equal to 1 */354 if(atomic_read(&aau->ref_count) dev_id = device_id;White Paper 43


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code358 }359360 DPRINTK("Assigning <strong>AAU</strong>\n");361 *aau_context = (u32) aau;362363 return 0;364 }365366 /*=======================================================================*/367 /* Procedure: aau_suspend() */368 /* */369 /* Description: This function suspends the <strong>AAU</strong> at the earliest */370 /* instant it is capable of. */371 /* */372 /* Parameters: aau: <strong>AAU</strong> device context */373 /* */374 /* Returns: N/A */375 /* */376 /* Notes/Assumptions: */377 /* */378 /* History: Dave Jiang 07/18/01 Initial Creation */379 /*=======================================================================*/380 void aau_suspend(u32 aau_context)381 {382 iop310_aau_t *aau = (iop310_aau_t *) aau_context;383 *(IOP310_<strong>AAU</strong>ACR) &= ~<strong>AAU</strong>_ACR_ENABLE;384 }385386 /*=======================================================================*/387 /* Procedure: aau_resume() */388 /* */389 /* Description: This function resumes the <strong>AAU</strong> operations */390 /* */391 /* Parameters: aau: <strong>AAU</strong> device context */44 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code392 /* */393 /* Returns: N/A */394 /* */395 /* Notes/Assumptions: */396 /* */397 /* History: Dave Jiang 07/18/01 Initial Creation */398 /*=======================================================================*/399 void aau_resume(u32 aau_context)400 {401 iop310_aau_t *aau = (iop310_aau_t *) aau_context;402 u32 status;403404 status = *(IOP310_<strong>AAU</strong>ASR);405406 /* if it's already active */407 if(status & <strong>AAU</strong>_ASR_ACTIVE)408 {409 DPRINTK("Accelerator already active\n");410 return;411 }412 else if(status & <strong>AAU</strong>_ASR_ERROR_MASK)413 {414 printk("<strong>80310</strong> <strong>AAU</strong> in error state! Cannot resume\n");415 return;416 }417 else418 {419 *(IOP310_<strong>AAU</strong>ACR) |= <strong>AAU</strong>_ACR_ENABLE;420 }421 }422423 /*=======================================================================*/424 /* Procedure: aau_queue_buffer() */425 /* */White Paper 45


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code426 /* Description: This function creates an <strong>AAU</strong> buffer chain from the */427 /* user supplied SGL chain. It also puts the <strong>AAU</strong> chain */428 /* onto the processing queue. This then starts the <strong>AAU</strong> */429 /* */430 /* Parameters: aau: <strong>AAU</strong> device context */431 /* listhead: User SGL */432 /* */433 /* Returns: int: success -- OK */434 /* failed: -ENOMEM */435 /* */436 /* Notes/Assumptions: User SGL must point to kernel memory, not user */437 /* */438 /* History: Dave Jiang 07/18/01 Initial Creation */439 /* Dave Jiang 07/20/01 Removed some junk code not suppose */440 /* to be there that causes infinite loop */441 /*=======================================================================*/442 int aau_queue_buffer(u32 aau_context, aau_head_t * listhead)443 {444 sw_aau_t *sw_desc = (sw_aau_t *) listhead->list;445 sw_aau_t *prev_desc = NULL;446 sw_aau_t *head = NULL;447 aau_head_t *sgl_head = listhead;448 int err = 0;449 int i;450 iop310_aau_t *aau = (iop310_aau_t *) aau_context;451 DECLARE_WAIT_QUEUE_HEAD(wait_q);452453 DPRINTK("Entering aau_queue_buffer()\n");454455 /* scan through entire user SGL */456 while(sw_desc)457 {458 sw_desc->sgl_head = (u32) listhead;45946 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code460 /* we clean the cache for previous descriptor in chain */461 if(prev_desc)462 {463 prev_desc->aau_desc.NDA = sw_desc->aau_phys;464 cpu_dcache_clean_range((u32)&prev_desc->aau_desc,465 (u32)&prev_desc->aau_desc + <strong>AAU</strong>_DESC_SIZE);466 }467 else468 {469 /* no previous descriptor, so we set this to be head */470 head = sw_desc;471 }472473 sw_desc->head = head;474 /* set previous to current */475 prev_desc = sw_desc;476477 /* put descriptor on process */478 spin_lock_irq(&aau->process_lock);479 list_add_tail(&sw_desc->link, &aau->process_q);480 spin_unlock_irq(&aau->process_lock);481482 sw_desc = (sw_aau_t *)sw_desc->next;483 }484 DPRINTK("Done converting SGL to <strong>AAU</strong> Chain List\n");485486 /* if our tail exists */487 if(prev_desc)488 {489 /* set the head pointer on tail */490 prev_desc->head = head;491 /* set the header pointer's tail to tail */492 head->tail = prev_desc;493 prev_desc->tail = prev_desc;White Paper 47


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code494495 /* clean cache for tail */496 cpu_dcache_clean_range((u32)&prev_desc->aau_desc,497 (u32)&prev_desc->aau_desc + <strong>AAU</strong>_DESC_SIZE);498499 DPRINTK("Starting <strong>AAU</strong> accelerator\n");500 /* start the <strong>AAU</strong> */501 DPRINTK("Starting at chain: 0x%x\n", (u32)head);502 if((err = aau_start(aau, head)) >= 0)503 {504 DPRINTK("ASR: %#x\n", *IOP310_<strong>AAU</strong>ASR);505 if(!sgl_head->callback)506 {507 wait_event_interruptible(aau->wait_q,508 (sgl_head->status & <strong>AAU</strong>_COMPLETE));509 }510 return 0;511 }512 else513 {514 DPRINTK("<strong>AAU</strong> start failed!\n");515 return err;516 }517 }518519 return -EINVAL;520 }521522 /*=======================================================================*/523 /* Procedure: aau_flush_all() */524 /* */525 /* Description: This function flushes the entire process queue for */526 /* the <strong>AAU</strong>. It also clears the <strong>AAU</strong>. */527 /* */48 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code528 /* Parameters: aau: <strong>AAU</strong> device context */529 /* */530 /* Returns: int: success -- OK */531 /* */532 /* Notes/Assumptions: */533 /* */534 /* History: Dave Jiang 07/19/01 Initial Creation */535 /*=======================================================================*/536 static int aau_flush_all(u32 aau_context)537 {538 iop310_aau_t *aau = (iop310_aau_t *) aau_context;539 int flags;540 sw_aau_t *sw_desc;541542 DPRINTK("Flushall is being called\n");543544 /* clear ACR */545 /* read clear ASR */546 *(IOP310_<strong>AAU</strong>ACR) = <strong>AAU</strong>_ACR_CLEAR;547 *(IOP310_<strong>AAU</strong>ASR) |= <strong>AAU</strong>_ASR_CLEAR;548549 /* clean up processing Q */550 while(!list_empty(&aau->hold_q))551 {552 spin_lock_irqsave(&aau->process_lock, flags);553 sw_desc = SW_ENTRY(aau->process_q.next);554 list_del(aau->process_q.next);555 spin_unlock_irqrestore(&aau->process_lock, flags);556557 /* set status to be incomplete */558 sw_desc->status |= <strong>AAU</strong>_INCOMPLETE;559 /* put descriptor on holding queue */560 spin_lock_irqsave(&aau->hold_lock, flags);561 list_add_tail(&sw_desc->link, &aau->hold_q);White Paper 49


562 spin_unlock_irqrestore(&aau->hold_lock, flags);563 }564565 return 0;566 }567568 /*=======================================================================*/569 /* Procedure: aau_free() */570 /* */571 /* Description: This function frees the <strong>AAU</strong> from usage. */572 /* */573 /* Parameters: aau -- <strong>AAU</strong> device context */574 /* */575 /* Returns: int: success -- OK */576 /* */577 /* Notes/Assumptions: */578 /* */579 /* History: Dave Jiang 07/19/01 Initial Creation */580 /*=======================================================================*/581 void aau_free(u32 aau_context)582 {583 iop310_aau_t *aau = (iop310_aau_t *) aau_context;584585 atomic_dec(&aau->ref_count);586587 /* if ref count is 1 or less, you are the last owner */588 if(atomic_read(&aau->ref_count) last_aau)


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code596 {597 aau->last_aau = NULL;598 }599600 DPRINTK("Freeing IRQ %d\n", aau->irq);601 /* free the IRQ */602 free_irq(aau->irq, (void *)aau);603 }604605 DPRINTK("freed\n");606 }607608 /*=======================================================================*/609 /* Procedure: aau_irq_handler() */610 /* */611 /* Description: This function is the int handler for the <strong>AAU</strong> */612 /* driver. It removes the done <strong>AAU</strong> descriptors from the */613 /* process queue and put them on the holding Q. it */614 /* continues to process until process queue empty or */615 /* the current <strong>AAU</strong> desc on the accelerator is the one */616 /* we are inspecting */617 /* */618 /* Parameters: irq: IRQ activated */619 /* dev_id: device */620 /* regs: registers */621 /* */622 /* Returns: NONE */623 /* */624 /* Notes/Assumptions: Interrupt is masked */625 /* */626 /* History: Dave Jiang 07/19/01 Initial Creation */627 /* Dave Jiang 07/20/01 Check FIQ1 instead of ASR for INTs */628 /*=======================================================================*/629 static void aau_irq_handler(int irq, void *dev_id, struct pt_regs *regs)White Paper 51


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code630 {631 iop310_aau_t *aau = (iop310_aau_t *) dev_id;632 u32 int_status = 0;633 u32 status = 0;634 u32 thresh;635636 /* get FIQ1 status */637 int_status = *(IOP310_FIQ1ISR);638639 DPRINTK("IRQ: irq=%d status=%#x\n", irq, status);640641 /* this is not our interrupt */642 if(!(int_status & <strong>AAU</strong>_INT_MASK))643 {644 return;645 }646647 /* get accelerator status */648 status = *(IOP310_<strong>AAU</strong>ASR);649650 /* get threshold */651 thresh = atomic_read(&aau->irq_thresh);652653 /* process while we have INT */654 while((int_status & <strong>AAU</strong>_INT_MASK) && thresh--)655 {656 /* clear ASR */657 *(IOP310_<strong>AAU</strong>ASR) &= <strong>AAU</strong>_ASR_MASK;658659 /* */660 if(status & <strong>AAU</strong>_ASR_DONE_MASK)661 {662 aau_process(aau);663 }52 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code664665 /* read accelerator status */666 status = *(IOP310_<strong>AAU</strong>ASR);667668 /* get interrupt status */669 int_status = *(IOP310_FIQ1ISR);670 }671672 /* schedule bottom half */673 aau->aau_task.data = (void *)aau;674 /* task goes to the immediate task queue */675 queue_task(&aau->aau_task, &tq_immediate);676 /* mark IMMEDIATE BH for execute */677 mark_bh(IMMEDIATE_BH);678 }679680681 /*=======================================================================*/682 /* Procedure: aau_process() */683 /* */684 /* Description: This function processes moves all the <strong>AAU</strong> desc in */685 /* the processing queue that are considered done to the */686 /* holding queue. It is called by the int when the */687 /* done INTs are asserted. It continues until */688 /* either the process Q is empty or current <strong>AAU</strong> desc */689 /* equals to the one in the ADAR */690 /* */691 /* Parameters: aau: <strong>AAU</strong> device as parameter */692 /* */693 /* Returns: NONE */694 /* */695 /* Notes/Assumptions: Interrupt is masked */696 /* */697 /* History: Dave Jiang 07/19/01 Initial Creation */White Paper 53


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code698 /*=======================================================================*/699 static void aau_process(iop310_aau_t * aau)700 {701 sw_aau_t *sw_desc;702 u8 same_addr = 0;703704 DPRINTK("Entering aau_process()\n");705706 while(!same_addr && !list_empty(&aau->process_q))707 {708 spin_lock(&aau->process_lock);709 sw_desc = SW_ENTRY(aau->process_q.next);710 list_del(aau->process_q.next);711 spin_unlock(&aau->process_lock);712713 if(sw_desc->head->tail->status & <strong>AAU</strong>_NEW_HEAD)714 {715 DPRINTK("Found new head\n");716 sw_desc->tail->head = sw_desc;717 sw_desc->head = sw_desc;718 sw_desc->tail->status &= ~<strong>AAU</strong>_NEW_HEAD;719 }720721 sw_desc->status |= <strong>AAU</strong>_DESC_DONE;722723 /* if we see end of chain, we set head status to DONE */724 if(sw_desc->aau_desc.DC & <strong>AAU</strong>_DCR_IE)725 {726 if(sw_desc->status & <strong>AAU</strong>_END_CHAIN)727 {728 sw_desc->tail->status |= <strong>AAU</strong>_COMPLETE;729 }730 else731 {54 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code732 sw_desc->head->tail = sw_desc;733 sw_desc->tail = sw_desc;734 sw_desc->tail->status |= <strong>AAU</strong>_NEW_HEAD;735 }736 sw_desc->tail->status |= <strong>AAU</strong>_NOTIFY;737 }738739 /* if descriptor equal same being processed, put it back */740 if(((u32) sw_desc == *(IOP310_<strong>AAU</strong>ADAR)741 ) && ( *(IOP310_<strong>AAU</strong>ASR) & <strong>AAU</strong>_ASR_ACTIVE))742 {743 spin_lock(&aau->process_lock);744 list_add(&sw_desc->link, &aau->process_q);745 spin_unlock(&aau->process_lock);746 same_addr = 1;747 }748 else749 {750 spin_lock(&aau->hold_lock);751 list_add_tail(&sw_desc->link, &aau->hold_q);752 spin_unlock(&aau->hold_lock);753 }754 }755 DPRINTK("Exit aau_process()\n");756 }757758 /*=======================================================================*/759 /* Procedure: aau_task() */760 /* */761 /* Description: This func is the bottom half handler of the <strong>AAU</strong> INT */762 /* handler. It is queued as an imm task on the imm */763 /* task Q. It process all the complete <strong>AAU</strong> chain in the */764 /* holding Q and wakes up the user and frees the */765 /* resource. */White Paper 55


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code766 /* */767 /* Parameters: aau_dev: <strong>AAU</strong> device as parameter */768 /* */769 /* Returns: NONE */770 /* */771 /* Notes/Assumptions: */772 /* */773 /* History: Dave Jiang 07/19/01 Initial Creation */774 /*=======================================================================*/775 static void aau_task(void *aau_dev)776 {777 iop310_aau_t *aau = (iop310_aau_t *) aau_dev;778 u8 end_chain = 0;779 sw_aau_t *sw_desc = NULL;780 aau_head_t *listhead = NULL;/* user list */781782 DPRINTK("Entering bottom half\n");783784 if(!list_empty(&aau->hold_q))785 {786 sw_desc = SW_ENTRY(aau->hold_q.next);787 listhead = (aau_head_t *) sw_desc->sgl_head;788 }789 else790 return;791792 /* process while <strong>AAU</strong> chain is complete */793 while(sw_desc && (sw_desc->tail->status & (<strong>AAU</strong>_NOTIFY | <strong>AAU</strong>_INCOMPLETE)))794 {795 /* clean up until end of <strong>AAU</strong> chain */796 while(!end_chain)797 {798 /* IE flag indicate end of chain */799 if(sw_desc->aau_desc.DC & <strong>AAU</strong>_DCR_IE)56 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code800 {801 end_chain = 1;802 listhead->status |=803 sw_desc->tail->status & <strong>AAU</strong>_USER_MASK;804805 sw_desc->status |= <strong>AAU</strong>_NOTIFY;806807 if(sw_desc->status & <strong>AAU</strong>_END_CHAIN)808 listhead->status |= <strong>AAU</strong>_COMPLETE;809 }810811 spin_lock_irq(&aau->hold_lock);812 /* remove from holding queue */813 list_del(&sw_desc->link);814 spin_unlock_irq(&aau->hold_lock);815816 cpu_dcache_invalidate_range((u32)&sw_desc->aau_desc,817 (u32)&sw_desc->aau_desc + <strong>AAU</strong>_DESC_SIZE);818819 if(!list_empty(&aau->hold_q))820 {821 sw_desc = SW_ENTRY(aau->hold_q.next);822 listhead = (aau_head_t *) sw_desc->sgl_head;823 }824 else825 sw_desc = NULL;826 }827828 /* reset end of chain flag */829 end_chain = 0;830831 /* wake up user function waiting for return */832 /* or use callback if exist */833 if(listhead->callback)White Paper 57


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code834 {835 DPRINTK("Calling callback\n");836 listhead->callback((void *)listhead);837 }838 else if(listhead->status & <strong>AAU</strong>_COMPLETE)839 /* if(waitqueue_active(&aau->wait_q)) */840 {841 DPRINTK("Waking up waiting process\n");842 wake_up_interruptible(&aau->wait_q);843 }844 } /* end while */845 DPRINTK("Exiting bottom task\n");846 }847848 /*=======================================================================*/849 /* Procedure: aau_init() */850 /* */851 /* Description: This function initializes the <strong>AAU</strong>. */852 /* */853 /* Parameters: NONE */854 /* */855 /* Returns: int: success -- OK */856 /* */857 /* Notes/Assumptions: */858 /* */859 /* History: Dave Jiang 07/18/01 Initial Creation */860 /*=======================================================================*/861 static int __init aau_init(void)862 {863 int i;864 sw_aau_t *sw_desc;865 int err;866 void *desc = NULL;86758 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code868 printk("<strong>Intel</strong> <strong>80310</strong> <strong>AAU</strong> Copyright(c) 2001 <strong>Intel</strong> Corporation\n");869 DPRINTK("Initializing...");870871 /* set the IRQ */872 aau_dev.irq = IRQ_IOP310_<strong>AAU</strong>;873874 err = request_irq(aau_dev.irq, aau_irq_handler, SA_INTERRUPT,875 NULL, (void *)&aau_dev);876 if(err < 0)877 {878 printk(KERN_ERR "unable to request IRQ %d for <strong>AAU</strong>: %d\n",879 aau_dev.irq, err);880 return err;881 }882883 /* init free stack */884 INIT_LIST_HEAD(&free_stack);885 /* init free stack spinlock */886 spin_lock_init(&free_lock);887888889 /* pre-alloc <strong>AAU</strong> descriptors */890 for(i = 0; i < MAX_<strong>AAU</strong>_DESC; i++)891 {892 desc = kmalloc((sizeof(sw_aau_t) + 0x20), GFP_KERNEL);893 memset(desc, 0, sizeof(sw_aau_t));894 sw_desc = (sw_aau_t *) (((u32) desc & 0xffffffe0) + 0x20);895 sw_desc->aau_phys = virt_to_phys((void *)sw_desc);896 /* we keep track of original address before alignment adjust */897 /* so we can free it later */898 sw_desc->desc_addr = (u32) desc;899900 spin_lock_irq(&free_lock);901 /* put the descriptors on the free stack */White Paper 59


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code902 list_add_tail(&sw_desc->link, &free_stack);903 spin_unlock_irq(&free_lock);904 }905906 /* set the register data structure to the mapped memory regs <strong>AAU</strong> */907 aau_dev.regs = (aau_regs_t *) IOP310_<strong>AAU</strong>ACR;908909 atomic_set(&aau_dev.ref_count, 0);910911 /* init process Q */912 INIT_LIST_HEAD(&aau_dev.process_q);913 /* init holding Q */914 INIT_LIST_HEAD(&aau_dev.hold_q);915 /* init locks for Qs */916 spin_lock_init(&aau_dev.hold_lock);917 spin_lock_init(&aau_dev.process_lock);918919 aau_dev.last_aau = NULL;920921 /* initialize BH task */922 aau_dev.aau_task.sync = 0;923 aau_dev.aau_task.routine = (void *)aau_task;924925 /* initialize wait Q */926 init_waitqueue_head(&aau_dev.wait_q);927928 /* clear <strong>AAU</strong> channel control register */929 *(IOP310_<strong>AAU</strong>ACR) = <strong>AAU</strong>_ACR_CLEAR;930 *(IOP310_<strong>AAU</strong>ASR) = <strong>AAU</strong>_ASR_CLEAR;931 *(IOP310_<strong>AAU</strong>ANDAR) = 0;932933 /* set default irq threshold */934 atomic_set(&aau_dev.irq_thresh, DEFAULT_<strong>AAU</strong>_IRQ_THRESH);935 DPRINTK("Done!\n");60 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code936937 return 0;938 }939940 /*=======================================================================*/941 /* Procedure: aau_set_irq_threshold() */942 /* */943 /* Description: This function readjust the threshold for the irq. */944 /* */945 /* Parameters: aau: pointer to aau device descriptor */946 /* value: value of new irq threshold */947 /* */948 /* Returns: N/A */949 /* */950 /* Notes/Assumptions: default is set at 10 */951 /* */952 /* History: Dave Jiang 08/27/01 Initial Creation */953 /*=======================================================================*/954 void aau_set_irq_threshold(u32 aau_context, int value)955 {956 iop310_aau_t *aau = (iop310_aau_t *) aau_context;957 atomic_set(&aau->irq_thresh, value);958 } /* End of aau_set_irq_threshold() */959960961 /*=======================================================================*/962 /* Procedure: aau_get_buffer() */963 /* */964 /* Description: This function acquires an SGL element for the user */965 /* and returns that. It retries multiple times if no */966 /* descriptor is available. */967 /* */968 /* Parameters: aau_context: <strong>AAU</strong> context */969 /* num_buf: number of descriptors */White Paper 61


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code970 /* */971 /* Returns: N/A */972 /* */973 /* Notes/Assumptions: */974 /* */975 /* History: Dave Jiang 9/11/01 Initial Creation */976 /* Dave Jiang 10/04/01Fixed list linking problem */977 /*=======================================================================*/978 aau_sgl_t *aau_get_buffer(u32 aau_context, int num_buf)979 {980 sw_aau_t *sw_desc = NULL;981 sw_aau_t *sw_head = NULL;982 sw_aau_t *sw_prev = NULL;983984 int retry = 10;985 int i;986 DECLARE_WAIT_QUEUE_HEAD(wait_q);987988 if((num_buf > MAX_<strong>AAU</strong>_DESC) || (num_buf 0; i--)995 {996 spin_lock_irq(&free_lock);997 if(!list_empty(&free_stack))998 {999 sw_desc = SW_ENTRY(free_stack.next);1000 list_del(free_stack.next);1001 spin_unlock_irq(&free_lock);1002 }1003 else62 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code1004 {1005 while(retry-- && !sw_desc)1006 {1007 spin_unlock_irq(&free_lock);1008 interruptible_sleep_on_timeout(&wait_q, SLEEP_TIME);1009 spin_lock_irq(&free_lock);1010 if(!list_empty(&free_stack))1011 {1012 sw_desc = SW_ENTRY(free_stack.next);1013 list_del(free_stack.next);1014 }1015 spin_unlock_irq(&free_lock);1016 }10171018 sw_desc = sw_head;1019 spin_lock_irq(&free_lock);1020 while(sw_desc)1021 {1022 sw_desc->status = 0;1023 sw_desc->head = NULL;1024 sw_desc->tail = NULL;1025 list_add(&sw_desc->link, &free_stack);1026 sw_desc = (sw_aau_t *) sw_desc->next;1027 } /* end while */1028 spin_unlock_irq(&dma_free_lock);1029 return NULL;1030 } /* end else */10311032 if(sw_prev)1033 {1034 sw_prev->next = (aau_sgl_t *) sw_desc;1035 sw_prev->aau_desc.NDA = sw_desc->aau_phys;1036 }1037 elseWhite Paper 63


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code1038 {1039 sw_head = sw_desc;1040 }10411042 sw_prev = sw_desc;1043 } /* end for */10441045 sw_desc->aau_desc.NDA = 0;1046 sw_desc->next = NULL;1047 sw_desc->status = 0;1048 return (aau_sgl_t *) sw_head;1049 }105010511052 /*=======================================================================*/1053 /* Procedure: aau_return_buffer() */1054 /* */1055 /* Description: This function takes a list of SGL and return it to */1056 /* the free stack. */1057 /* */1058 /* Parameters: aau_context: <strong>AAU</strong> context */1059 /* list: SGL list to return to free stack */1060 /* */1061 /* Returns: N/A */1062 /* */1063 /* Notes/Assumptions: */1064 /* */1065 /* History: Dave Jiang 9/11/01 Initial Creation */1066 /*=======================================================================*/1067 void aau_return_buffer(u32 aau_context, aau_sgl_t * list)1068 {1069 sw_aau_t *sw_desc = (sw_aau_t *) list;10701071 spin_lock_irq(&free_lock);64 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code1072 while(sw_desc)1073 if(sw_desc)1074 {1075 list_add(&sw_desc->link, &free_stack);1076 sw_desc = (sw_aau_t *) sw_desc->next;1077 }1078 spin_unlock_irq(&free_lock);1079 }10801081 int aau_memcpy(void *dest, void *src, u32 size)1082 {10831084 iop310_aau_t *aau = &aau_dev; /* Global variable */1085 aau_head_t head;1086 aau_sgl_t *list;1087 int err;10881089 head.total = size;1090 head.status = 0;1091 head.callback = NULL;10921093 list = aau_get_buffer((u32) aau, 1);1094 if(list)1095 {1096 head.list = list;1097 }1098 else1099 {1100 return -ENOMEM;1101 }1102110311041105 while(list)White Paper 65


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong><strong>AAU</strong> Source Code1106 {1107 list->status = 0;1108 list->src[0] = src;1109 list->aau_desc.SAR[0] = (u32) virt_to_phys(src);1110 list->dest = dest;1111 list->aau_desc.DAR = (u32) virt_to_phys(dest);1112 list->aau_desc.BC = size;1113 list->aau_desc.DC = <strong>AAU</strong>_DCR_WRITE | <strong>AAU</strong>_DCR_BLKCTRL_1_DF;1114 if(!list->next)1115 {1116 list->aau_desc.DC |= <strong>AAU</strong>_DCR_IE;1117 list->status |= <strong>AAU</strong>_END_CHAIN;1118 break;1119 }1120 list = list->next;1121 }1122 err = aau_queue_buffer((u32) aau, &head);1123 aau_return_buffer((u32) aau, head.list);1124 return err;1125 }11261127 EXPORT_SYMBOL_NOVERS(aau_request);1128 EXPORT_SYMBOL_NOVERS(aau_queue_buffer);1129 EXPORT_SYMBOL_NOVERS(aau_suspend);1130 EXPORT_SYMBOL_NOVERS(aau_resume);1131 EXPORT_SYMBOL_NOVERS(aau_free);1132 EXPORT_SYMBOL_NOVERS(aau_set_irq_threshold);1133 EXPORT_SYMBOL_NOVERS(aau_get_buffer);1134 EXPORT_SYMBOL_NOVERS(aau_return_buffer);1135 EXPORT_SYMBOL_NOVERS(aau_memcpy);11361137 module_init(aau_init);66 White Paper


Appendix B Example Calling Source Code

B.1 Standard Calls

Support functions for the 80310 AAU
===========================================
Dave Jiang, last updated 09/18/2001

The Intel® 80312 I/O companion chip in the 80310 chipset contains an AAU. The
AAU can process up to 8 data block sources and perform XOR operations on them.
This unit is typically used to accelerate the XOR operations used by RAID
storage device drivers, such as RAID 5. This API provides a set of functions
for taking advantage of the AAU. The AAU can also be used to transfer data
blocks, acting as a memory copier; it moves memory faster than a CPU copy, so
it is recommended for memory copies.
------------------

int aau_request(u32 *aau_context, const char *device_id);

This function allows the user to acquire control of the AAU. It returns an AAU
context to the user and allocates an interrupt for the AAU. The user must pass
the context as a parameter to the other AAU API calls.

int aau_queue_buffer(u32 aau_context, aau_head_t *listhead);

This function starts the AAU operation. The user must create an SGL header
with an SGL attached; the format is presented below. The SGL is built from
kernel memory.


/* hardware descriptor */
typedef struct _aau_desc
{
	u32 NDA;                      /* next descriptor address [READONLY] */
	u32 SAR[AAU_SAR_GROUP];       /* src addrs */
	u32 DAR;                      /* destination addr */
	u32 BC;                       /* byte count */
	u32 DC;                       /* descriptor control */
	u32 SARE[AAU_SAR_GROUP];      /* extended src addrs */
} aau_desc_t;

/* user SGL format */
typedef struct _aau_sgl
{
	aau_desc_t aau_desc;          /* AAU HW Desc */
	u32 status;                   /* status of SGL [READONLY] */
	struct _aau_sgl *next;        /* pointer to next SG [READONLY] */
	void *dest;                   /* destination addr */
	void *src[AAU_SAR_GROUP];     /* source addr[4] */
	void *ext_src[AAU_SAR_GROUP]; /* ext src addr[4] */
	u32 total_src;                /* total number of source */
} aau_sgl_t;

/* header for user SGL */
typedef struct _aau_head
{
	u32 total;                    /* total descriptors allocated */
	u32 status;                   /* SGL status */
	aau_sgl_t *list;              /* ptr to head of list */
	aau_callback_t callback;      /* callback func ptr */
} aau_head_t;

The function calls aau_start() to start the AAU after it queues the SGL onto
the processing queue. The function then either:

a. sleeps on the wait queue aau->wait_q if no callback has been provided, or
b. continues and calls the provided callback function when the AAU interrupt
   has been triggered.
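The test function later in this appendix exercises only the blocking path, where no callback is supplied. For the callback path, a minimal sketch follows, assuming aau_callback_t takes a single void * argument, as suggested by the driver's call listhead->callback((void *)listhead); the names my_aau_done and queue_async are placeholders, not part of the API.

/* Hypothetical completion callback: the driver hands back the aau_head_t
 * that was queued, so its status word can be checked here. */
static void my_aau_done(void *arg)
{
	aau_head_t *head = (aau_head_t *) arg;

	if(head->status & AAU_COMPLETE)
		printk("AAU chain completed\n");
	else
		printk("AAU chain returned status %#x\n", head->status);
}

/* Queue an already-built SGL without blocking the caller.
 * head->total is assumed to have been filled in by the caller. */
static int queue_async(u32 aau, aau_head_t *head, aau_sgl_t *list)
{
	head->list = list;
	head->status = 0;
	head->callback = my_aau_done;	/* non-NULL, so aau_queue_buffer()
					 * returns without sleeping on wait_q */
	return aau_queue_buffer(aau, head);
}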


int aau_suspend(u32 aau_context);

Stops/suspends the AAU operation.

int aau_free(u32 aau_context);

Frees ownership of the AAU. Called when the AAU service is no longer needed.

aau_sgl_t *aau_get_buffer(u32 aau_context, int num_buf);

This function obtains an AAU SGL for the user. The user must specify the
number of descriptors to be allocated in the chain that is returned.

void aau_return_buffer(u32 aau_context, aau_sgl_t *list);

This function returns the SGL to the API after the user is done with it.

int aau_memcpy(void *dest, void *src, u32 size);

This function is a shortcut that lets the user perform a memory copy with the
AAU, which handles large blocks better than a CPU copy. It is used much like a
typical memcpy() call.

* The user is responsible for the source address(es) and the destination
  address. The source and destination should all be cached memory.
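As a quick illustration of the aau_memcpy() shortcut described above, here is a minimal sketch, assuming the AAU driver module has already been initialized; the buffer size and the helper name copy_with_aau are arbitrary.

static int copy_with_aau(void)
{
	void *src, *dst;
	int err;

	src = kmalloc(4096, GFP_KERNEL);	/* cached kernel memory */
	dst = kmalloc(4096, GFP_KERNEL);
	if(!src || !dst)
	{
		if(src)
			kfree(src);
		if(dst)
			kfree(dst);
		return -ENOMEM;
	}

	memset(src, 0xa5, 4096);

	/* no callback is involved, so this blocks until the AAU finishes */
	err = aau_memcpy(dst, src, 4096);

	kfree(src);
	kfree(dst);
	return err;
}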


void aau_test()
{
	u32 aau;
	char dev_id[] = "AAU";
	int size = 2;
	int err = 0;
	aau_head_t *head;
	aau_sgl_t *list;
	u32 i;
	u32 result = 0;
	void *src, *dest;

	printk("Starting AAU test\n");
	if((err = aau_request(&aau, dev_id)) < 0)
		return;

	head = kmalloc(sizeof(aau_head_t), GFP_KERNEL);
	head->total = size;
	head->status = 0;
	head->callback = NULL;

	list = aau_get_buffer(aau, size);
	if(!list)
	{
		printk("Can't get buffers\n");
		return;
	}
	head->list = list;

	src = kmalloc(1024, GFP_KERNEL);
	dest = kmalloc(1024, GFP_KERNEL);

	while(list)
	{
		list->status = 0;
		list->aau_desc.SAR[0] = (u32)src;


		list->aau_desc.DAR = (u32)dest;
		list->aau_desc.BC = 1024;

		/* see iop310-aau.h for more DCR commands */
		list->aau_desc.DC = AAU_DCR_WRITE | AAU_DCR_BLKCTRL_1_DF;
		if(!list->next)
		{
			list->aau_desc.DC |= AAU_DCR_IE;
			break;
		}
		list = list->next;
	}

	printk("test- Queueing buffer for AAU operation\n");
	err = aau_queue_buffer(aau, head);
	if(err >= 0)
	{
		printk("AAU Queue Buffer is done...\n");
	}
	else
	{
		printk("AAU Queue Buffer failed...: %d\n", err);
	}

#if 1
	printk("freeing the AAU\n");
	aau_return_buffer(aau, head->list);
	aau_free(aau);
	kfree(src);
	kfree(dest);
	kfree((void *)head);
#endif


}

All disclaimers apply. Use this at your own discretion. Neither Intel nor I
will be responsible if anything goes wrong. =)

TODO
____
* Testing
* Do zero-size AAU transfer/channel at init so all we have to do is chaining
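The example above uses the AAU only as a memory copier. Its stated purpose, however, is multi-source XOR for RAID-5 parity, so a hypothetical sketch of a two-source XOR call is added here for completeness. AAU_DCR_BLKCTRL_2_XOR is an assumed name for the descriptor-control bits that XOR the second source block into the result; the actual DCR encodings are defined in iop310-aau.h and in Chapter 10 of the Intel® 80312 I/O Companion Chip Developer's Manual.

/* Sketch: XOR two equal-sized, physically contiguous source blocks into
 * dest using a single AAU descriptor. AAU_DCR_BLKCTRL_2_XOR is a
 * placeholder for the real block-2 XOR command bits. */
static int xor_two_blocks(u32 aau, void *dest, void *src1, void *src2, u32 size)
{
	aau_head_t head;
	aau_sgl_t *list;
	int err;

	list = aau_get_buffer(aau, 1);
	if(!list)
		return -ENOMEM;

	head.total = 1;			/* one descriptor in this chain */
	head.status = 0;
	head.callback = NULL;		/* block until the chain completes */
	head.list = list;

	list->status = 0;
	list->src[0] = src1;
	list->src[1] = src2;
	list->dest = dest;
	list->total_src = 2;
	list->aau_desc.SAR[0] = (u32) virt_to_phys(src1);
	list->aau_desc.SAR[1] = (u32) virt_to_phys(src2);
	list->aau_desc.DAR = (u32) virt_to_phys(dest);
	list->aau_desc.BC = size;
	list->aau_desc.DC = AAU_DCR_WRITE | AAU_DCR_BLKCTRL_1_DF |
			    AAU_DCR_BLKCTRL_2_XOR | AAU_DCR_IE;
	list->status |= AAU_END_CHAIN;

	err = aau_queue_buffer(aau, &head);
	aau_return_buffer(aau, head.list);
	return err;
}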


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale MicroarchitectureAppendix C MMU Functions for <strong>Intel</strong> ® XScale Microarchitecture/** linux/arch/arm/mm/proc-xscale.S** Author:Nicolas Pitre* Created:November 2000* Copyright:(C) 2000, 2001 MontaVista Software Inc.** This program is free software; you can redistribute it and/or modify* it under the terms of the GNU General Public License version 2 as* published by the Free Software Foundation.** MMU functions for the <strong>Intel</strong> ® XScale microarchitecture** 2001 Aug 21:* some contributions by Brett Gaines * Copyright 2001 by <strong>Intel</strong> Corp.** 2001 Sep 08:* Completely revisited, many important fixes* Nicolas Pitre */#include #include #include #include #include /** This is the maximum size of an area which will be flushed. If the area* is larger than this, then we flush the whole cacheWhite Paper 73


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitecture*/#define MAX_AREA_SIZE32768/** the cache line size of the I and D cache*/#define CACHELINESIZE32/** the size of the data cache*/#define CACHESIZE32768/** and the page size*/#define PAGESIZE4096/** Virtual address used to allocate the cache when flushed** This must be an address range which is _never_ used. It should* apparently have a mapping in the corresponding page table for* compatibility with future CPUs that _could_ require it. For instance we* don't care.** This must be aligned on a 2*CACHESIZE boundary. The code selects one of* the 2 areas alternating each time the clean_d_cache macro is used.* Without this the <strong>Intel</strong> ® XScale core exhibits cache eviction problems and no one* knows why.** Reminder: the vector table is located at 0xffff0000-0xffff0fff.*/#define CLEAN_ADDR0xfffe000074 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitecture/** This macro is used to wait for a CP15 write and is needed* when we have to ensure that the last operation to the co-pro* was completed before continuing with operation.*/.macrocpwait, rdmrc p15, 0, \rd, c2, c0, 0@ arbitrary read of cp15mov \rd, \rd@ wait for completionsub pc, pc, #4 @ flush instruction pipeline.endm.macrocpwait_ret, lr, rdmrc p15, 0, \rd, c2, c0, 0@ arbitrary read of cp15sub pc, \lr, \rd, LSR #32@ wait for completion and@ flush instruction pipeline.endm/** This macro cleans the entire dcache using line allocate.* The main loop has been unrolled to reduce loop overhead.* rd and rs are two scratch registers.*/.macroclean_d_cache, rd, rsldr \rs, =clean_addrldr \rd, [\rs]eor \rd, \rd, #CACHESIZEstr \rd, [\rs]add \rs, \rd, #CACHESIZE1: mcr p15, 0, \rd, c7, c2, 5@ allocate D cache lineadd \rd, \rd, #CACHELINESIZEmcr p15, 0, \rd, c7, c2, 5@ allocate D cache lineadd \rd, \rd, #CACHELINESIZEmcr p15, 0, \rd, c7, c2, 5@ allocate D cache lineWhite Paper 75


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitectureadd \rd, \rd, #CACHELINESIZEmcr p15, 0, \rd, c7, c2, 5@ allocate D cache lineadd \rd, \rd, #CACHELINESIZEteq \rd, \rsbne 1b.endm.dataclean_addr:.wordCLEAN_ADDR.text/** cpu_xscale_data_abort()** obtain information about current aborted instruction** r0 = address of aborted instruction** Returns:* r0 = address of abort* r1 != 0 if writing* r3 = FSR*/.align5ENTRY(cpu_xscale_data_abort)mov r2, r0mrc p15, 0, r0, c6, c0, 0@ get FARmrc p15, 0, r3, c5, c0, 0@ get FSRldr r1, [r2]@ read aborted instructiontst r1, r1, lsr #21@ C = bit 20sbc r1, r1, r1 @ r1 = C - 1and r3, r3, #255mov pc, lr76 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitecture/** cpu_xscale_check_bugs()*/ENTRY(cpu_xscale_check_bugs)mrs ip, cpsrbic ip, ip, #F_BITmsr cpsr, ipmov pc, lr/** cpu_xscale_proc_init()** Nothing too exciting at the moment*/ENTRY(cpu_xscale_proc_init)mov pc, lr/** cpu_xscale_proc_fin()*/ENTRY(cpu_xscale_proc_fin)str lr, [sp, #-4]!mov r0, #F_BIT|I_BIT|SVC_MODEmsr cpsr_c, r0mrc p15, 0, r0, c1, c0, 0@ ctrl registerbic r0, r0, #0x1800@ ...IZ...........bic r0, r0, #0x0006@ .............CA.mcr p15, 0, r0, c1, c0, 0@ disable cachesblcpu_xscale_cache_clean_invalidate_all@ clean cachesldr pc, [sp], #4/** cpu_xscale_reset(loc)White Paper 77


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitecture** Perform a soft reset of the system. Put the CPU into the* same state as it would be if it had been reset, and branch* to what would be the reset vector.** loc: location to jump to for soft reset*/.align5ENTRY(cpu_xscale_reset)mov r1, #F_BIT|I_BIT|SVC_MODEmsr cpsr_c, r1@ reset CPSRmrc p15, 0, r1, c1, c0, 0@ ctrl registerbic r1, r1, #0x0086@ ........B....CA.bic r1, r1, #0x1900@ ...IZ..S........mcr p15, 0, r1, c1, c0, 0@ ctrl registermcr p15, 0, ip, c7, c7, 0@ invalidate I,D caches & BTBbic r1, r1, #0x0001@ ...............Mmcr p15, 0, r1, c1, c0, 0@ ctrl register@ CAUTION: MMU turned off from this point. We count on the pipeline@ already containing those two last instructions to survive.mcr p15, 0, ip, c8, c7, 0@ invalidate I & D TLBsmov pc, r0/** cpu_xscale_do_idle(type)** Cause the processor to idle** type:* 0 = slow idle* 1 = fast idle* 2 = switch to slow processor clock* 3 = switch to fast processor clock*78 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitecture* For now we do nothing but go to idle mode for every case** <strong>Intel</strong> ® XScale microarchitecture supports clock switching, but using idle mode* support allows external hardware to react to system state changes.*/.align5ENTRY(cpu_xscale_do_idle)mov r0, #1mcr p14, 0, r0, c7, c0, 0@ Go to IDLEmov pc, lr/* ================================= CACHE ================================ *//** cpu_xscale_cache_clean_invalidate_all (void)** clean and invalidate all cache lines** Note:* 1. We should preserve r0 at all times.* 2. Even if this function implies cache "invalidation" by its name,* we don't need to actually use explicit invalidation operations* since the goal is to discard all valid references from the cache* and the cleaning of it already has that effect.* 3. Because of 2 above and the fact that kernel space memory is always* coherent across task switches there is no need to worry about* inconsistencies due to interrupts, hence no irq disabling.*/.align5ENTRY(cpu_xscale_cache_clean_invalidate_all)mov r2, #1cpu_xscale_cache_clean_invalidate_all_r2:clean_d_cache r0, r1White Paper 79


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitectureteq r2, #0mcrnep15, 0, ip, c7, c5, 0@ Invalidate I cache & BTBmcr p15, 0, ip, c7, c10, 4@ Drain Write (& Fill) Buffermov pc, lr/** cpu_xscale_cache_clean_invalidate_range(start, end, flags)** clean and invalidate all cache lines associated with this area of memory** start: Area start address* end: Area end address* flags: nonzero for I cache as well*/.align5ENTRY(cpu_xscale_cache_clean_invalidate_range)bic r0, r0, #CACHELINESIZE - 1@ round down to cache linesub r3, r1, r0cmp r3, #MAX_AREA_SIZEbhi cpu_xscale_cache_clean_invalidate_all_r21: mcr p15, 0, r0, c7, c10, 1@ Clean D cache linemcr p15, 0, r0, c7, c6, 1@ Invalidate D cache lineadd r0, r0, #CACHELINESIZEcmp r0, r1blo 1bteq r2, #0mcr p15, 0, ip, c7, c10, 4@ Drain Write (& Fill) Buffermoveqpc, lrsub r0, r0, r31: mcr p15, 0, r0, c7, c5, 1@ Invalidate I cache lineadd r0, r0, #CACHELINESIZEcmp r0, r1blo 1bmcr p15, 0, ip, c7, c5, 6@ Invalidate BTB80 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitecturemov pc, lr/** cpu_xscale_flush_ram_page(page)** clean all cache lines associated with this memory page** page: page to clean*/.align5ENTRY(cpu_xscale_flush_ram_page)mov r1, #PAGESIZE1: mcr p15, 0, r0, c7, c10, 1@ Clean D cache lineadd r0, r0, #CACHELINESIZEmcr p15, 0, r0, c7, c10, 1@ Clean D cache lineadd r0, r0, #CACHELINESIZEsubsr1, r1, #2 * CACHELINESIZEbne 1bmcr p15, 0, ip, c7, c10, 4@ Drain Write (& Fill) Buffermov pc, lr/* ================================ D-CACHE =============================== *//** cpu_xscale_dcache_invalidate_range(start, end)** throw away all D-cached data in specified region without an obligation* to write them back. Note however that on <strong>Intel</strong> ® XScale microarchitecture we* must clean all entries also due to hardware errata (80200 A0 & A1 only).** start: virtual start address* end: virtual end address*/.align5White Paper 81


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale MicroarchitectureENTRY(cpu_xscale_dcache_invalidate_range)mrc p15, 0, r2, c0, c0, 0@ Read part no.eor r2, r2, #0x69000000eor r2, r2, #0x00052000@ 80200 XX part no.bicsr2, r2, #0x1@ Clear LSB in revision fieldmoveqr2, #0beq cpu_xscale_cache_clean_invalidate_range@ An 80200 A0 or A1tst r0, #CACHELINESIZE - 1mcrnep15, 0, r0, c7, c10, 1@ Clean D cache linetst r1, #CACHELINESIZE - 1mcrnep15, 0, r1, c7, c10, 1@ Clean D cache linebic r0, r0, #CACHELINESIZE - 1@ round down to cache line1: mcr p15, 0, r0, c7, c6, 1@ Invalidate D cache lineadd r0, r0, #CACHELINESIZEcmp r0, r1blo 1bmov pc, lr/** cpu_xscale_dcache_clean_range(start, end)** For the specified virtual address range, ensure that all caches contain* clean data, such that peripheral accesses to the physical RAM fetch* correct data.** start: virtual start address* end: virtual end address*/.align5ENTRY(cpu_xscale_dcache_clean_range)bic r0, r0, #CACHELINESIZE - 1sub r2, r1, r0cmp r2, #MAX_AREA_SIZE82 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitecturemovhir2, #0bhi cpu_xscale_cache_clean_invalidate_all_r21: mcr p15, 0, r0, c7, c10, 1@ Clean D cache lineadd r0, r0, #CACHELINESIZEmcr p15, 0, r0, c7, c10, 1@ Clean D cache lineadd r0, r0, #CACHELINESIZEcmp r0, r1blo 1bmcr p15, 0, ip, c7, c10, 4@ Drain Write (& Fill) Buffermov pc, lr/** cpu_xscale_clean_dcache_page(page)** Cleans a single page of dcache so that if we have any future aliased* mappings, they will be consistent at the time that they are created.** Note:* 1. we don't need to flush the write buffer in this case.* 2. we don't invalidate the entries since when we write the page* out to disk, the entries may get reloaded into the cache.*/.align5ENTRY(cpu_xscale_dcache_clean_page)mov r1, #PAGESIZE1: mcr p15, 0, r0, c7, c10, 1@ Clean D cache lineadd r0, r0, #CACHELINESIZEmcr p15, 0, r0, c7, c10, 1@ Clean D cache lineadd r0, r0, #CACHELINESIZEmcr p15, 0, r0, c7, c10, 1@ Clean D cache lineadd r0, r0, #CACHELINESIZEmcr p15, 0, r0, c7, c10, 1@ Clean D cache lineadd r0, r0, #CACHELINESIZEWhite Paper 83


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitecturesubsr1, r1, #4 * CACHELINESIZEbne 1bmcr p15, 0, ip, c7, c10, 4@ Drain Write (& Fill) Buffermov pc, lr/** cpu_xscale_dcache_clean_entry(addr)** Clean the specified entry of any caches such that the MMU* translation fetches will obtain correct data.** addr: cache-unaligned virtual address*/.align5ENTRY(cpu_xscale_dcache_clean_entry)mcr p15, 0, r0, c7, c10, 1@ Clean D cache linemcr p15, 0, ip, c7, c10, 4@ Drain Write (& Fill) Buffermov pc, lr/* ================================ I-CACHE =============================== *//** cpu_xscale_icache_invalidate_range(start, end)** invalidate a range of virtual addresses from the Icache** start: virtual start address* end: virtual end address** Note: This is vaguely defined as supposed to bring the dcache and the* icache in sync by the way this function is used.*/.align5ENTRY(cpu_xscale_icache_invalidate_range)84 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitecturebic r0, r0, #CACHELINESIZE - 11: mcr p15, 0, r0, c7, c10, 1@ Clean D cache linemcr p15, 0, r0, c7, c5, 1@ Invalidate I cache lineadd r0, r0, #CACHELINESIZEcmp r0, r1blo 1bmcr p15, 0, ip, c7, c5, 6@ Invalidate BTBmcr p15, 0, ip, c7, c10, 4@ Drain Write (& Fill) Buffermov pc, lr/** cpu_xscale_icache_invalidate_page(page)** invalidate all Icache lines associated with this area of memory** page: page to invalidate*/.align5ENTRY(cpu_xscale_icache_invalidate_page)mov r1, #PAGESIZE1: mcr p15, 0, r0, c7, c5, 1@ Invalidate I cache lineadd r0, r0, #CACHELINESIZEmcr p15, 0, r0, c7, c5, 1@ Invalidate I cache lineadd r0, r0, #CACHELINESIZEmcr p15, 0, r0, c7, c5, 1@ Invalidate I cache lineadd r0, r0, #CACHELINESIZEmcr p15, 0, r0, c7, c5, 1@ Invalidate I cache lineadd r0, r0, #CACHELINESIZEsubsr1, r1, #4 * CACHELINESIZEbne 1bmcr p15, 0, r0, c7, c5, 6@ Invalidate BTBmov pc, lr/* ================================ CACHE LOCKING============================White Paper 85


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitecture** The <strong>Intel</strong> ® XScale microarchitecture implements support for locking entries into* the data and instruction cache. The following functions implement the core* low level instructions needed to accomplish the locking. The developer's* manual states that the code that performs the locking must be in non-cached* memory. To accomplish this, the code in xscale-cache-lock.c copies the* following functions from the cache into a non-cached memory region that* is allocated through consistent_alloc().**/.align5/** xscale_icache_lock** r0: starting address to lock* r1: end address to lock*/ENTRY(xscale_icache_lock)iLockLoop:bic r0, r0, #CACHELINESIZE - 1mcr p15, 0, r0, c9, c1, 0@ lock into cachecmp r0, r1@ are we done?add r0, r0, #CACHELINESIZE@ advance to next cache linebls iLockLoopmov pc, lr/** xscale_icache_unlock*/ENTRY(xscale_icache_unlock)mcr p15, 0, r0, c9, c1, 1@ Unlock icachemov pc, lr86 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitecture/** xscale_dcache_lock** r0: starting address to lock* r1: end address to lock*/ENTRY(xscale_dcache_lock)mcr p15, 0, ip, c7, c10, 4@ Drain Write (& Fill) Buffermov r2, #1mcr p15, 0, r2, c9, c2, 0@ Put dcache in lock modecpwaitip@ Wait for completionmrs r2, cpsrorr r3, r2, #F_BIT | I_BITdLockLoop:msr cpsr_c, r3mcr p15, 0, r0, c7, c10, 1@ Write back line if it is dirtymcr p15, 0, r0, c7, c6, 1@ Flush/invalidate linemsr cpsr_c, r2ldr ip, [r0], #CACHELINESIZE @ Preload 32 bytes into cache from@ location [r0]. Post-increment@ r3 to next cache linecmp r0, r1@ Are we done?bls dLockLoopmcr p15, 0, ip, c7, c10, 4@ Drain Write (& Fill) Buffermov r2, #0mcr p15, 0, r2, c9, c2, 0@ Get out of lock modecpwait_ret lr, ip/** xscale_dcache_unlock*/ENTRY(xscale_dcache_unlock)White Paper 87


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitecturemcr p15, 0, ip, c7, c10, 4@ Drain Write (& Fill) Buffermcr p15, 0, ip, c9, c2, 1@ Unlock cachemov pc, lr/** Needed to determine the length of the code that needs to be copied.*/.align5ENTRY(xscale_cache_dummy)mov pc, lr/* ================================== TLB ================================= *//** cpu_xscale_tlb_invalidate_all()** Invalidate all TLB entries*/.align5ENTRY(cpu_xscale_tlb_invalidate_all)mcr p15, 0, ip, c7, c10, 4@ Drain Write (& Fill) Buffermcr p15, 0, ip, c8, c7, 0@ invalidate I & D TLBscpwait_ret lr, ip/** cpu_xscale_tlb_invalidate_range(start, end)** invalidate TLB entries covering the specified range** start: range start address* end: range end address*/.align5ENTRY(cpu_xscale_tlb_invalidate_range)88 White Paper


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitecturebic r0, r0, #(PAGESIZE - 1) & 0x00ffbic r0, r0, #(PAGESIZE - 1) & 0xff00mcr p15, 0, ip, c7, c10, 4@ Drain Write (& Fill) Buffer1: mcr p15, 0, r0, c8, c6, 1@ invalidate D TLB entrymcr p15, 0, r0, c8, c5, 1@ invalidate I TLB entryadd r0, r0, #PAGESIZEcmp r0, r1blo 1bcpwait_ret lr, ip/** cpu_xscale_tlb_invalidate_page(page, flags)** invalidate the TLB entries for the specified page.** page: page to invalidate* flags: non-zero if we include the I TLB*/.align5ENTRY(cpu_xscale_tlb_invalidate_page)mcr p15, 0, ip, c7, c10, 4@ Drain Write (& Fill) Bufferteq r1, #0mcr p15, 0, r0, c8, c6, 1@ invalidate D TLB entrymcrnep15, 0, r3, c8, c5, 1@ invalidate I TLB entrycpwait_ret lr, ip/* ================================ TLB LOCKING==============================** The <strong>Intel</strong> ® XScale microarchitecture implements support for locking entries into* the Instruction and Data TLBs. The following functions provide the* low level support for supporting these under Linux. xscale-lock.c* implements some higher level management code. Most of the following* is taken straight out of the Developer's Manual.*/White Paper 89


<strong>Intel</strong> ® <strong>80310</strong> I/O <strong>Processor</strong> <strong>Chipset</strong> <strong>AAU</strong> <strong>Coding</strong> <strong>Techniques</strong>MMU Functions for <strong>Intel</strong> ® XScale Microarchitecture/** Lock I-TLB entry** r0: Virtual address to translate and lock*/.align5ENTRY(xscale_itlb_lock)mrs r2, cpsrorr r3, r2, #F_BIT | I_BITmsr cpsr_c, r3@ Disable interruptsmcr p15, 0, r0, c8, c5, 1@ Invalidate I-TLB entrymcr p15, 0, r0, c10, c4, 0@ Translate and lockmsr cpsr_c, r2@ Restore interruptscpwait_ret lr, ip/** Lock D-TLB entry** r0: Virtual address to translate and lock*/.align5ENTRY(xscale_dtlb_lock)mrs r2, cpsrorr r3, r2, #F_BIT | I_BITmsr cpsr_c, r3@ Disable interruptsmcr p15, 0, r0, c8, c6, 1@ Invalidate D-TLB entrymcr p15, 0, r0, c10, c8, 0@ Translate and lockmsr cpsr_c, r2@ Restore interruptscpwait_ret lr, ip/** Unlock all I-TLB entries*/90 White Paper


        .align  5
ENTRY(xscale_itlb_unlock)
        mcr     p15, 0, ip, c10, c4, 1          @ Unlock I-TLB
        mcr     p15, 0, ip, c8, c5, 0           @ Invalidate I-TLB
        cpwait_ret lr, ip

/*
 * Unlock all D-TLB entries
 */
ENTRY(xscale_dtlb_unlock)
        mcr     p15, 0, ip, c10, c8, 1          @ Unlock D-TLB
        mcr     p15, 0, ip, c8, c6, 0           @ Invalidate D-TLB
        cpwait_ret lr, ip

/* =============================== Page Table ============================== */

#define USER_CACHE_WRITE_ALLOCATE 1
#define KERN_CACHE_WRITE_ALLOCATE 1

#define PMD_TYPE_MASK           0x0003
#define PMD_TYPE_SECT           0x0002
#define PMD_SECT_BUFFERABLE     0x0004
#define PMD_SECT_CACHEABLE      0x0008
#define PMD_SECT_TEX_X          0x1000

#define HPTE_TYPE_SMALLEXT      0x0003
#define HPTE_SMALLEXT_TEX_X     0x0040

/*
 * cpu_xscale_set_pgd(pgd)
 *
 * Set the translation base pointer to be as described by pgd.
 *
 * pgd: new page tables
 */
        .align  5
ENTRY(cpu_xscale_set_pgd)
        clean_d_cache r1, r2
        mcr     p15, 0, ip, c7, c5, 0           @ Invalidate I cache & BTB
        mcr     p15, 0, ip, c7, c10, 4          @ Drain Write (& Fill) Buffer
        mcr     p15, 0, r0, c2, c0, 0           @ load page table pointer
        mcr     p15, 0, ip, c8, c7, 0           @ invalidate I & D TLBs
        cpwait_ret lr, ip

/*
 * cpu_xscale_set_pmd(pmdp, pmd)
 *
 * Set a level 1 translation table entry, and clean it out of
 * any caches such that the MMUs can load it correctly.
 *
 * pmdp: pointer to PMD entry
 * pmd: PMD value to store
 */
        .align  5
ENTRY(cpu_xscale_set_pmd)
#if KERN_CACHE_WRITE_ALLOCATE
        and     r2, r1, #PMD_TYPE_MASK|PMD_SECT_CACHEABLE|PMD_SECT_BUFFERABLE
        cmp     r2, #PMD_TYPE_SECT|PMD_SECT_CACHEABLE|PMD_SECT_BUFFERABLE
        orreq   r1, r1, #PMD_SECT_TEX_X
#endif
        str     r1, [r0]
        mcr     p15, 0, r0, c7, c10, 1          @ Clean D cache line
        mcr     p15, 0, ip, c7, c10, 4          @ Drain Write (& Fill) Buffer
        mov     pc, lr

/*
 * cpu_xscale_set_pte(ptep, pte)
 *
 * Set a PTE and flush it out
 */
        .align  5
ENTRY(cpu_xscale_set_pte)
        str     r1, [r0], #-1024                @ linux version

        bic     r2, r1, #0xff0
        bic     r2, r2, #3
        eor     r1, r1, #LPTE_PRESENT | LPTE_YOUNG | LPTE_WRITE | LPTE_DIRTY | LPTE_BUFFERABLE | LPTE_CACHEABLE

        tst     r1, #LPTE_USER | LPTE_EXEC      @ User or Exec?
        orrne   r2, r2, #HPTE_AP_READ

        tst     r1, #LPTE_WRITE | LPTE_DIRTY    @ Write and Dirty?
        orreq   r2, r2, #HPTE_AP_WRITE

#if USER_CACHE_WRITE_ALLOCATE
        tst     r1, #LPTE_CACHEABLE | LPTE_BUFFERABLE   @ B and C
        orrne   r2, r2, #HPTE_TYPE_SMALL
        biceq   r2, r2, #0x0fc0                 @ clear non-exist AP[1-3]
        orreq   r2, r2, #HPTE_TYPE_SMALLEXT | HPTE_SMALLEXT_TEX_X
#else
        orr     r2, r2, #HPTE_TYPE_SMALL
#endif
        tst     r1, #LPTE_PRESENT | LPTE_YOUNG  @ Present and Young?
        movne   r2, #0

        str     r2, [r0]                        @ hardware version
        mov     r0, r0
        mcr     p15, 0, r0, c7, c10, 1          @ Clean D cache line
        mcr     p15, 0, ip, c7, c10, 4          @ Drain Write (& Fill) Buffer
        mov     pc, lr

        .ltorg

cpu_manu_name:
        .asciz  "Intel"

cpu_80200_name:
        .asciz  "XScale-80200"

cpu_cotulla_name:
        .asciz  "XScale-Cotulla"

        .align

        .section ".text.init", #alloc, #execinstr

__xscale_setup:
        mov     r0, #F_BIT|I_BIT|SVC_MODE
        msr     cpsr_c, r0
        mcr     p15, 0, ip, c7, c7, 0           @ invalidate I, D caches & BTB
        mcr     p15, 0, ip, c7, c10, 4          @ Drain Write (& Fill) Buffer
        mcr     p15, 0, ip, c8, c7, 0           @ invalidate I, D TLBs
        mcr     p15, 0, r4, c2, c0, 0           @ load page table pointer
        mov     r0, #0x1f                       @ Domains 0, 1 = client
        mcr     p15, 0, r0, c3, c0, 0           @ load domain access register
        mrc     p15, 0, r0, c1, c0, 0           @ get control register
        bic     r0, r0, #0x0200                 @ ......R.........
        bic     r0, r0, #0x0082                 @ ........B.....A.
        orr     r0, r0, #0x0005                 @ .............C.M
        orr     r0, r0, #0x3900                 @ ..VIZ..S........
        mov     pc, lr
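The bic/orr sequence at the end of __xscale_setup is easier to read with the CP15 control register masks spelled out. The C fragment below is purely illustrative and only restates the same arithmetic; the real update is performed by the mrc/mcr instructions above, and xscale_ctrl_value is an assumed name, not part of the listing.

        #include <stdint.h>

        /* Restates the control-register update done by __xscale_setup.
         * Bit positions: M=0 (MMU), A=1 (alignment faults), C=2 (D-cache),
         * B=7 (big-endian), S=8 (system protection), R=9 (ROM protection),
         * Z=11 (BTB enable), I=12 (I-cache), V=13 (high vectors). */
        static uint32_t xscale_ctrl_value(uint32_t ctrl)
        {
                ctrl &= ~0x0200u;       /* clear R          */
                ctrl &= ~0x0082u;       /* clear B and A    */
                ctrl |=  0x0005u;       /* set C and M      */
                ctrl |=  0x3900u;       /* set V, I, Z, S   */
                return ctrl;
        }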


        .text

/*
 * Purpose : Function pointers used to access above functions - all calls
 *           come through these
 */
        .type   xscale_processor_functions, #object
ENTRY(xscale_processor_functions)
        .word   cpu_xscale_data_abort
        .word   cpu_xscale_check_bugs
        .word   cpu_xscale_proc_init
        .word   cpu_xscale_proc_fin
        .word   cpu_xscale_reset
        .word   cpu_xscale_do_idle

        /* cache */
        .word   cpu_xscale_cache_clean_invalidate_all
        .word   cpu_xscale_cache_clean_invalidate_range
        .word   cpu_xscale_flush_ram_page

        /* dcache */
        .word   cpu_xscale_dcache_invalidate_range
        .word   cpu_xscale_dcache_clean_range
        .word   cpu_xscale_dcache_clean_page
        .word   cpu_xscale_dcache_clean_entry

        /* icache */
        .word   cpu_xscale_icache_invalidate_range
        .word   cpu_xscale_icache_invalidate_page

        /* tlb */
        .word   cpu_xscale_tlb_invalidate_all
        .word   cpu_xscale_tlb_invalidate_range
        .word   cpu_xscale_tlb_invalidate_page

        /* pgtable */
        .word   cpu_xscale_set_pgd
        .word   cpu_xscale_set_pmd
        .word   cpu_xscale_set_pte

        .size   xscale_processor_functions, . - xscale_processor_functions

        .type   cpu_80200_info, #object
cpu_80200_info:
        .long   cpu_manu_name
        .long   cpu_80200_name
        .size   cpu_80200_info, . - cpu_80200_info

        .type   cpu_cotulla_info, #object
cpu_cotulla_info:
        .long   cpu_manu_name
        .long   cpu_cotulla_name
        .size   cpu_cotulla_info, . - cpu_cotulla_info

        .type   cpu_arch_name, #object
cpu_arch_name:
        .asciz  "armv5"
        .size   cpu_arch_name, . - cpu_arch_name

        .type   cpu_elf_name, #object
cpu_elf_name:
        .asciz  "v5"
        .size   cpu_elf_name, . - cpu_elf_name

        .align

        .section ".proc.info", #alloc, #execinstr
        .type   __80200_proc_info, #object
__80200_proc_info:
        .long   0x69052000
        .long   0xfffffff0
        .long   0x00000c0e
        b       __xscale_setup
        .long   cpu_arch_name
        .long   cpu_elf_name
        .long   HWCAP_SWP|HWCAP_HALF|HWCAP_THUMB|HWCAP_FAST_MULT|HWCAP_EDSP
        .long   cpu_80200_info
        .long   xscale_processor_functions
        .size   __80200_proc_info, . - __80200_proc_info

        .type   __cotulla_proc_info, #object
__cotulla_proc_info:
        .long   0x69052100
        .long   0xfffffff0
        .long   0x00000c0e
        b       __xscale_setup
        .long   cpu_arch_name
        .long   cpu_elf_name
        .long   HWCAP_SWP|HWCAP_HALF|HWCAP_THUMB|HWCAP_FAST_MULT|HWCAP_EDSP
        .long   cpu_cotulla_info
        .long   xscale_processor_functions
        .size   __cotulla_proc_info, . - __cotulla_proc_info
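Each ".proc.info" record above begins with a CPU ID value and mask (0x69052000 and 0xfffffff0 for the 80200 entry). As a rough C sketch, assuming a struct that mirrors only those first two words (the names below are illustrative, not the kernel's own), boot code selects a record by masking the processor ID read from CP15 register 0:

        /* Illustrative view of a ".proc.info" record; only the leading
         * value/mask pair is modelled here. */
        struct proc_info_head {
                unsigned long cpu_val;   /* e.g. 0x69052000 for the 80200 entry */
                unsigned long cpu_mask;  /* e.g. 0xfffffff0 */
        };

        /* Returns non-zero when the CPU ID matches this record. */
        static int proc_info_matches(const struct proc_info_head *p, unsigned long cpu_id)
        {
                return (cpu_id & p->cpu_mask) == p->cpu_val;
        }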



Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!