Prefetch in CUDA (through C code)


Question



I am working on data prefetching in CUDA (Fermi GPU) through C code. The CUDA reference manual describes prefetching at the PTX level, but not at the C level.

Can anyone point me to documentation or examples on prefetching from CUDA code (a .cu file)? Any help would be appreciated.

Solution

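According to the PTX manual, prefetching is exposed at the PTX level through the prefetch instruction, for example prefetch.global.L1 [a]; where a is the address to prefetch.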

You can embed the PTX instructions into the CUDA kernel. Here is a tiny sample from NVIDIA's documentation:

__device__ int cube (int x)
{
  int y;
  asm("{\n\t"                       // use braces for local scope
      " .reg .u32 t1;\n\t"           // temp reg t1,
      " mul.lo.u32 t1, %1, %1;\n\t" // t1 = x * x
      " mul.lo.u32 %0, t1, %1;\n\t" // y = t1 * x
      "}"
      : "=r"(y) : "r" (x));
  return y;
}
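
To make the operand binding in the sample concrete (%0 is the output y, %1 is the input x), here is a minimal sketch of how cube might be exercised from a kernel. The names cube_kernel and cube_on_device are hypothetical helpers introduced only for this illustration; they are not part of NVIDIA's sample.

```cuda
#include <cuda_runtime.h>

// Inline-PTX cube from the sample above: %0 binds to y, %1 binds to x.
__device__ int cube(int x)
{
  int y;
  asm("{\n\t"                       // braces give t1 a local scope
      " .reg .u32 t1;\n\t"           // temp reg t1
      " mul.lo.u32 t1, %1, %1;\n\t" // t1 = x * x
      " mul.lo.u32 %0, t1, %1;\n\t" // y = t1 * x
      "}"
      : "=r"(y) : "r"(x));
  return y;
}

// Hypothetical kernel: a single thread stores cube(x) in *out.
__global__ void cube_kernel(int x, int *out)
{
  *out = cube(x);
}

// Hypothetical host helper: computes cube(x) on the device.
// Returns false if any CUDA call fails.
bool cube_on_device(int x, int *result)
{
  int *d_out = nullptr;
  if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return false;
  cube_kernel<<<1, 1>>>(x, d_out);
  // cudaMemcpy waits for the kernel and also reports launch errors.
  cudaError_t err = cudaMemcpy(result, d_out, sizeof(int),
                               cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  return err == cudaSuccess;
}
```

Calling cube_on_device(3, &r) from main should leave 27 in r on a working device.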

Following the same pattern, you can arrive at a prefetch function in C like this:

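__device__ void prefetch_l1 (const void *addr)
{
  // Prefetch is only a hint and produces no result, so there is no output
  // operand; "l" binds the 64-bit global address to %0.
  asm volatile(" prefetch.global.L1 [ %0 ];" : : "l"(addr));
}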

NOTE: Prefetch requires a GPU of Compute Capability 2.0 or higher. Pass the matching compile flag, e.g. -arch=sm_20 for Fermi.
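
As a rough illustration of where such a helper could be called, the sketch below prefetches the element a thread will read on its next loop iteration. Everything here is an assumption made for the example: the pointer-taking prefetch_l1 variant (using the "l" constraint, which presumes a 64-bit build), the strided_sum kernel, and the run_prefetch_demo host helper are hypothetical names. Prefetch is only a hint, so whether it improves performance depends on the hardware and the access pattern and should be measured.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Assumed variant of the helper: addr must point to global memory.
__device__ void prefetch_l1(const void *addr)
{
  asm volatile("prefetch.global.L1 [%0];" : : "l"(addr));
}

// Hypothetical kernel: each thread sums a grid-strided slice of `in`,
// hinting the next element it will read before using the current one.
__global__ void strided_sum(const int *in, int *out, int n)
{
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = gridDim.x * blockDim.x;
  int acc = 0;
  for (int i = tid; i < n; i += stride) {
    if (i + stride < n) prefetch_l1(&in[i + stride]);
    acc += in[i];
  }
  out[tid] = acc;
}

// Hypothetical host helper: fills the input with ones, runs the kernel,
// and checks that the partial sums add up to n.
// Returns 0 on success, 1 on a wrong sum, -1 on a CUDA error.
int run_prefetch_demo(int n)
{
  const int blocks = 4, threads = 64, total = blocks * threads;
  std::vector<int> h_in(n, 1), h_out(total, 0);
  int *d_in = nullptr, *d_out = nullptr;
  if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return -1;
  if (cudaMalloc(&d_out, total * sizeof(int)) != cudaSuccess) {
    cudaFree(d_in);
    return -1;
  }
  cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
  strided_sum<<<blocks, threads>>>(d_in, d_out, n);
  // cudaMemcpy waits for the kernel and also reports launch errors.
  cudaError_t err = cudaMemcpy(h_out.data(), d_out, total * sizeof(int),
                               cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return -1;
  long long sum = 0;
  for (int v : h_out) sum += v;
  return sum == n ? 0 : 1;
}
```

The prefetch does not change the result, so the check only confirms that the kernel with the inline PTX compiles and runs correctly; any speedup would have to be verified with a profiler.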
