3.25 Memory Management

3.25.2.7 CUresult cuMemAllocHost (void ** pp, unsigned int bytesize)

Allocates bytesize bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cuMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of memory with cuMemAllocHost() may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.

Parameters:
    pp - Returned host pointer to page-locked memory
    bytesize - Requested allocation size in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
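As an illustration (not part of the original reference), the following is a minimal sketch of using cuMemAllocHost() to allocate a page-locked staging area for a host-to-device transfer. It assumes a CUDA context is already current on the calling thread; the function name and buffer sizes are illustrative only, and error handling is abbreviated.

#include <cuda.h>
#include <string.h>

/* Copies `bytes` bytes of host data to the device through a page-locked
   staging buffer. Assumes a current CUDA context. */
CUresult copy_through_staging(CUdeviceptr dst, const void *src, unsigned int bytes)
{
    void *staging = 0;

    /* Page-locked host memory transfers at higher bandwidth than malloc()ed memory. */
    CUresult status = cuMemAllocHost(&staging, bytes);
    if (status != CUDA_SUCCESS)
        return status;

    memcpy(staging, src, bytes);                 /* fill the staging area on the host */
    status = cuMemcpyHtoD(dst, staging, bytes);  /* device reads the page-locked memory directly */

    cuMemFreeHost(staging);                      /* release pinned memory promptly */
    return status;
}

Freeing the staging buffer as soon as the transfer completes keeps the amount of page-locked memory small, in line with the guidance above about system paging.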
3.25.2.8 CUresult cuMemAllocPitch (CUdeviceptr * dptr, unsigned int * pPitch, unsigned int WidthInBytes, unsigned int Height, unsigned int ElementSizeBytes)

Allocates at least WidthInBytes * Height bytes of linear memory on the device and returns in *dptr a pointer to the allocated memory. The function may pad the allocation to ensure that corresponding pointers in any given row will continue to meet the alignment requirements for coalescing as the address is updated from row to row. ElementSizeBytes specifies the size of the largest reads and writes that will be performed on the memory range. ElementSizeBytes may be 4, 8 or 16 (since coalesced memory transactions are not possible on other data sizes). If ElementSizeBytes is smaller than the actual read/write size of a kernel, the kernel will run correctly, but possibly at reduced speed. The pitch returned in *pPitch by cuMemAllocPitch() is the width in bytes of the allocation. The intended usage of pitch is as a separate parameter of the allocation, used to compute addresses within the 2D array. Given the row and column of an array element of type T, the address is computed as:

T* pElement = (T*)((char*)BaseAddress + Row * Pitch) + Column;

The pitch returned by cuMemAllocPitch() is guaranteed to work with cuMemcpy2D() under all circumstances. For allocations of 2D arrays, it is recommended that programmers consider performing pitch allocations using cuMemAllocPitch(). Due to alignment restrictions in the hardware, this is especially true if the application will be performing 2D memory copies between different regions of device memory (whether linear memory or CUDA arrays).
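As an illustration (not part of the original reference), the following sketch allocates a pitched 2D array of floats with cuMemAllocPitch() and uploads a densely packed host array into it with cuMemcpy2D(), which the text above guarantees accepts the returned pitch. It assumes a current CUDA context; the function name and dimensions are illustrative, and error handling is abbreviated.

#include <cuda.h>
#include <string.h>

/* Allocates a height x width array of floats on the device and copies a
   densely packed host array into it, honoring the returned pitch. */
CUresult upload_pitched(CUdeviceptr *dptr, unsigned int *pitch,
                        const float *host, unsigned int width, unsigned int height)
{
    /* ElementSizeBytes = sizeof(float) because the largest access is a 4-byte float. */
    CUresult status = cuMemAllocPitch(dptr, pitch,
                                      width * sizeof(float), height, sizeof(float));
    if (status != CUDA_SUCCESS)
        return status;

    CUDA_MEMCPY2D copy;
    memset(&copy, 0, sizeof(copy));
    copy.srcMemoryType = CU_MEMORYTYPE_HOST;
    copy.srcHost       = host;
    copy.srcPitch      = width * sizeof(float);   /* host rows are densely packed */
    copy.dstMemoryType = CU_MEMORYTYPE_DEVICE;
    copy.dstDevice     = *dptr;
    copy.dstPitch      = *pitch;                  /* pitch returned by cuMemAllocPitch() */
    copy.WidthInBytes  = width * sizeof(float);
    copy.Height        = height;
    return cuMemcpy2D(&copy);
}

/* Inside a kernel, an element at (Row, Column) of this allocation would be
   addressed with the formula given above:
       float *pElement = (float *)((char *)BaseAddress + Row * Pitch) + Column;  */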
