NVIDIA CUDA Compute Unified Device Architecture


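CUDA Programming Guide, Version 2.0

B.2.1 Single-Precision Floating-Point Functions

__fadd_rn() and __fmul_rn() map to addition and multiplication operations that the compiler never merges into FMADs. By contrast, additions and multiplications generated from the "*" and "+" operators will generally be merged into FMADs.

__fdividef(x, y) has the same accuracy as regular floating-point division, but for 2^126 < y < 2^128, __fdividef(x, y) delivers a result of zero, whereas regular division delivers the correct result to within the accuracy stated in Table B-1. Likewise, for 2^126 < y < 2^128, if x is infinity, __fdividef(x, y) delivers NaN (the result of multiplying infinity by zero), whereas regular division returns infinity.

__[u]mul24(x, y) computes the product of the 24 least significant bits of the integer parameters x and y and delivers the 32 least significant bits of the result. The 8 most significant bits of x and y are ignored.

__[u]mulhi(x, y) computes the product of the integer parameters x and y and delivers the 32 most significant bits of the 64-bit result.

__[u]mul64hi(x, y) computes the product of the 64-bit integer parameters x and y and delivers the 64 most significant bits of the 128-bit result.

__saturate(x) returns 0 if x is less than 0, 1 if x is greater than 1, and x otherwise.

__[u]sad(x, y, z) (sum of absolute difference) returns the sum of integer parameter z and the absolute value of the difference between integer parameters x and y.

__clz(x) returns the number, between 0 and 32 inclusive, of consecutive zero bits starting at the most significant bit (bit 31) of integer parameter x.

__clzll(x) returns the number, between 0 and 64 inclusive, of consecutive zero bits starting at the most significant bit (bit 63) of 64-bit integer parameter x.

__ffs(x) returns the position of the first (least significant) bit set in integer parameter x. The least significant bit is position 1. If x is 0, __ffs() returns 0. Note that this function is identical to the Linux function ffs.

__ffsll(x) returns the position of the first (least significant) bit set in 64-bit integer parameter x. The least significant bit is position 1. If x is 0, __ffsll() returns 0. Note that this function is identical to the Linux function ffsll.

Table B-3. Single-precision floating-point intrinsic functions supported by the CUDA runtime library, with their error bounds

Function                          Error bounds
__fadd_[rn,rz](x,y)               IEEE-compliant.
__fmul_[rn,rz](x,y)               IEEE-compliant.
__fdividef(x,y)                   For y in [2^-126, 2^126], the maximum ulp error is 2.
__expf(x)                         The maximum ulp error is 2 + floor(abs(1.16 * x)).
__exp10f(x)                       The maximum ulp error is 2 + floor(abs(2.95 * x)).
__logf(x)                         For x in [0.5, 2], the maximum absolute error is 2^-21.41; otherwise, the maximum ulp error is 3.
__log2f(x)                        For x in [0.5, 2], the maximum absolute error is 2^-22; otherwise, the maximum ulp error is 2.
__log10f(x)                       For x in [0.5, 2], the maximum absolute error is 2^-24; otherwise, the maximum ulp error is 3.
__sinf(x)                         For x in [-π, π], the maximum absolute error is 2^-21.41, and larger otherwise.
__cosf(x)                         For x in [-π, π], the maximum absolute error is 2^-21.19, and larger otherwise.
__sincosf(x,sptr,cptr)            Same as sinf(x) and cosf(x).
__tanf(x)                         Derived from its implementation as __sinf(x) * (1 / __cosf(x)).
__powf(x, y)                      Derived from its implementation as exp2f(y * __log2f(x)).
__mul24(x,y), __umul24(x,y)       N/A
__mulhi(x,y), __umulhi(x,y)       N/A
__int_as_float(x)                 N/A
__float_as_int(x)                 N/A
__saturate(x)                     N/A
__sad(x,y,z), __usad(x,y,z)       N/A
__clz(x)                          N/A
__ffs(x)                          N/A
__float2int_[rn,rz,ru,rd]         N/A
__float2uint_[rn,rz,ru,rd]        N/A
__int2float_[rn,rz,ru,rd]         N/A
__uint2float_[rn,rz,ru,rd]        N/A

B.2.2 Double-Precision Floating-Point Functions

__dadd_rn() and __dmul_rn() map to addition and multiplication operations that the compiler never merges into FMADs. By contrast, additions and multiplications generated from the "*" and "+" operators will generally be merged into FMADs.

Table B-4. Double-precision floating-point intrinsic functions supported by the CUDA runtime library, with their error bounds

Function                          Error bounds
__dadd_[rn,rz,ru,rd](x,y)         IEEE-compliant.
__dmul_[rn,rz,ru,rd](x,y)         IEEE-compliant.
__fma_[rn,rz,ru,rd](x,y,z)        IEEE-compliant.
__double2float_[rn,rz](x)         N/A
__double2int_[rn,rz,ru,rd](x)     N/A
__double2uint_[rn,rz,ru,rd](x)    N/A
__double2ll_[rn,rz,ru,rd](x)      N/A
__double2ull_[rn,rz,ru,rd](x)     N/A
__int2double_rn(x)                N/A
__uint2double_rn(x)               N/A
__ll2double_[rn,rz,ru,rd](x)      N/A
__ull2double_[rn,rz,ru,rd](x)     N/A
__double_as_longlong(x)           N/A
__longlong_as_double(x)           N/A
__double2hiint(x)                 N/A
__double2loint(x)                 N/A
__hiloint2double(x, ys)           N/A

B.2.3 Integer Functions

__popc(x) returns the number of bits set to 1 in the binary representation of 32-bit integer parameter x.

__popcll(x) returns the number of bits set to 1 in the binary representation of 64-bit integer parameter x.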
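The integer and bit-manipulation intrinsics described in B.2.1 and B.2.3 can be easier to follow with a small reference model. The sketch below is plain host-side C++, not device code and not NVIDIA's implementation: it restates the documented behavior of the unsigned variants so it can be compiled and checked without a GPU. All ref_* names are hypothetical and exist only for this illustration. ref_saturate follows the text literally, so its handling of NaN is not meant to describe the hardware.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side models of the documented intrinsic semantics.

// Models __umul24: multiply the 24 least significant bits of x and y,
// keep the 32 least significant bits of the product.
inline uint32_t ref_umul24(uint32_t x, uint32_t y) {
    uint64_t p = static_cast<uint64_t>(x & 0xFFFFFFu) * (y & 0xFFFFFFu);
    return static_cast<uint32_t>(p);
}

// Models __umulhi: 32 most significant bits of the 64-bit product.
inline uint32_t ref_umulhi(uint32_t x, uint32_t y) {
    return static_cast<uint32_t>((static_cast<uint64_t>(x) * y) >> 32);
}

// Models __saturate: 0 below 0, 1 above 1, x otherwise.
inline float ref_saturate(float x) {
    if (x < 0.0f) return 0.0f;
    if (x > 1.0f) return 1.0f;
    return x;
}

// Models __usad: z plus the absolute difference of x and y.
inline uint32_t ref_usad(uint32_t x, uint32_t y, uint32_t z) {
    return z + (x > y ? x - y : y - x);
}

// Models __clz: consecutive zero bits counted down from bit 31 (32 for x == 0).
inline int ref_clz(uint32_t x) {
    int n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++n;
    }
    return n;
}

// Models __ffs: 1-based position of the least significant set bit (0 for x == 0).
inline int ref_ffs(uint32_t x) {
    if (x == 0) return 0;
    int pos = 1;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++pos;
    }
    return pos;
}

// Models __popc: number of bits set to 1.
inline int ref_popc(uint32_t x) {
    int n = 0;
    for (; x != 0; x &= x - 1) {  // each step clears the lowest set bit
        ++n;
    }
    return n;
}

// Small self-check of the edge cases called out in the text.
inline void ref_self_check() {
    assert(ref_clz(0u) == 32);
    assert(ref_ffs(0u) == 0);
    assert(ref_ffs(1u) == 1);
}
```

For example, ref_umul24(0xFF000003u, 0x01000005u) discards the top 8 bits of each operand and multiplies 3 by 5, which illustrates why the 24-bit multiply is only safe when both operands fit in 24 bits.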
