NVIDIA CUDA Compute Unified Device Architecture

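    ... size, stream[i]);
    for (int i = 0; i < 2; ++i) {
        cuFuncSetBlockShape(cuFunction, 512, 1, 1);
        int offset = 0;
        // each stream's kernel processes its own half of the data
        cuParamSeti(cuFunction, offset, outputDevPtr + i * size);
        offset += sizeof(outputDevPtr);
        cuParamSeti(cuFunction, offset, inputDevPtr + i * size);
        offset += sizeof(inputDevPtr);
        cuParamSeti(cuFunction, offset, size);
        offset += sizeof(size);
        cuParamSetSize(cuFunction, offset);
        cuLaunchGridAsync(cuFunction, 100, 1, stream[i]);
    }
    for (int i = 0; i < 2; ++i)
        cuMemcpyDtoHAsync(hostPtr + i * size, outputDevPtr + i * size,
                          size, stream[i]);
    cuEventRecord(stop, 0);
    cuEventSynchronize(stop);
    float elapsedTime;
    cuEventElapsedTime(&elapsedTime, start, stop);
    cuEventDestroy(start);
    cuEventDestroy(stop);

4.5.3.9 Texture Reference Management

Before a kernel can use a texture reference to read from texture memory, the texture reference must be bound to a texture using cuTexRefSetAddress() or cuTexRefSetArray().

If a module cuModule contains a texture reference texRef declared as

    texture<Type, Dim, ReadMode> texRef;

the following code sample retrieves the handle of texRef:

    CUtexref cuTexRef;
    cuModuleGetTexRef(&cuTexRef, cuModule, "texRef");

The following code sample binds texRef to the linear memory pointed to by devPtr:

    cuTexRefSetAddress(NULL, cuTexRef, devPtr, size);

The following code sample binds texRef to the CUDA array cuArray:

    cuTexRefSetArray(cuTexRef, cuArray, CU_TRSA_OVERRIDE_FORMAT);

The reference manual lists the functions used to set the addressing mode, the filter mode, and the other flags of a texture reference. The format specified when binding a texture to a texture reference must match the parameters specified when the texture reference was declared; otherwise, the results of texture fetches are undefined.

4.5.3.10 OpenGL Interoperability

OpenGL interoperability must be initialized using cuGLInit().

A buffer object must be registered with CUDA before it can be mapped. Registration is done with cuGLRegisterBufferObject():

    GLuint bufferObj;
    cuGLRegisterBufferObject(bufferObj);

Once registered, a buffer object can be read from or written to by kernels using the device memory address returned by cuGLMapBufferObject():

    GLuint bufferObj;
    CUdeviceptr devPtr;
    int size;
    cuGLMapBufferObject(&devPtr, &size, bufferObj);

Unmapping is done with cuGLUnmapBufferObject() and unregistering with cuGLUnregisterBufferObject().

CUDA Programming Guide Version 2.0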
4.5.3.11 Direct3D Interoperability

Interoperability with Direct3D requires that the Direct3D device be specified when the CUDA context is created. This is done by creating the CUDA context with cuD3D9CtxCreate() instead of cuCtxCreate().

Direct3D resources are registered with CUDA using cuD3D9RegisterResource():

    LPDIRECT3DVERTEXBUFFER9 buffer;
    cuD3D9RegisterResource(buffer, CU_D3D9_REGISTER_FLAGS_NONE);
    LPDIRECT3DSURFACE9 surface;
    cuD3D9RegisterResource(surface, CU_D3D9_REGISTER_FLAGS_NONE);

cuD3D9RegisterResource() has a potentially high overhead and is typically called only once per resource. A resource is unregistered with cuD3D9UnregisterResource().

Once a resource is registered with CUDA, it can be mapped and unmapped as many times as necessary using cuD3D9MapResources() and cuD3D9UnmapResources(). Kernels read from and write to a mapped resource using the device memory address returned by cuD3D9ResourceGetMappedPointer() together with the size and pitch information returned by cuD3D9ResourceGetMappedSize(), cuD3D9ResourceGetMappedPitch(), and cuD3D9ResourceGetMappedPitchSlice(). Accessing a mapped resource through Direct3D produces undefined results.

The following code sample fills a buffer with zeros:

    CUdeviceptr devPtr;
    cuD3D9ResourceGetMappedPointer(&devPtr, buffer);
    size_t size;
    cuD3D9ResourceGetMappedSize(&size, buffer);
    cuMemsetD8(devPtr, 0, size);   // set every byte of the buffer to 0

In the following code sample, each thread accesses one pixel of a 2D surface of size (width, height) with pixel format float4:

    // host code
    CUdeviceptr devPtr;
    cuD3D9ResourceGetMappedPointer(&devPtr, surface);
    size_t pitch;
    cuD3D9ResourceGetMappedPitch(&pitch, surface);
    cuModuleGetFunction(&cuFunction, cuModule, "myKernel");
    cuFuncSetBlockShape(cuFunction, 16, 16, 1);
    int offset = 0;
    cuParamSeti(cuFunction, offset, devPtr);
    offset += sizeof(devPtr);
    cuParamSeti(cuFunction, offset, width);
    offset += sizeof(width);
    cuParamSeti(cuFunction, offset, height);
    offset += sizeof(height);
    cuParamSeti(cuFunction, offset, pitch);
    offset += sizeof(pitch);
    cuParamSetSize(cuFunction, offset);
    // round the grid size up so that 16 x 16 blocks cover every pixel
    cuLaunchGrid(cuFunction, (width + 16 - 1) / 16, (height + 16 - 1) / 16);

    // device code
    __global__ void myKernel(unsigned char* surface,
                             int width, int height, size_t pitch)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        // threads that fall outside the surface do nothing
        if (x >= width || y >= height) return;
        // rows are pitch bytes apart; each float4 pixel spans 4 floats
        float* pixel = (float*)(surface + y * pitch) + 4 * x;
    }