CUDA compilation of examples


Problem description

I have an NVIDIA GeForce 8500 GT on Windows 7 Pro 32-bit, and I have a problem with my CUDA C project. I have installed all the packages and VS2012 Pro. I create a new project from the CUDA 6.5 template, compile it, and get "invalid device function". In the Getting Started guide for Windows (PDF) I read that I can check CUDA with deviceQuery.exe, so I ran it:

deviceQuery.exe Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 8500 GT"
  CUDA Driver Version / Runtime Version          6.5 / 6.5
  CUDA Capability Major/Minor version number:    1.1
  Total amount of global memory:                 512 MBytes (536870912 bytes)
  ( 2) Multiprocessors, (  8) CUDA Cores/MP:     16 CUDA Cores
  GPU Clock rate:                                1570 MHz (1.57 GHz)
  Memory Clock rate:                             400 Mhz
  Memory Bus Width:                              128-bit
  Maximum Texture Dimension Size (x,y,z)         1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(8192), 512 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(8192, 8192), 512 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  768
  Maximum number of threads per block:           512
  Max dimension size of a thread block (x,y,z): (512, 512, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 1)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce 8500 GT
Result = PASS

So it passes. So what's wrong? Next I ran bandwidthTest:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce 8500 GT
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         1346.5

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         1556.9

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         5857.4

Result = PASS

So can anybody help me?

Recommended answer

"Invalid device function" usually means that the code was compiled for an architecture higher than that of the GPU you are trying to run it on.

The GPU architecture (its compute capability) is contained in your printout:

  CUDA Capability Major/Minor version number:    1.1

CUDA 6.5 compiles for a cc 2.0 architecture by default. If you want to compile for a cc 1.1 architecture, you will need to pass specific switches to your nvcc compile command.
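To make the mismatch concrete, here is a minimal C++ sketch of the compatibility rule as I understand it from the CUDA documentation: binary code built for sm_XY runs only on devices with the same major version and an equal or higher minor version, while PTX built for compute_XY can be JIT-compiled for any device of equal or higher compute capability. The struct and function names are hypothetical and are not part of any CUDA API.

```cpp
// Hypothetical helper types; nothing here calls the CUDA runtime.

// Compute capability as a pair, e.g. {1, 1} for the GeForce 8500 GT.
struct ComputeCapability {
    int ccMajor;
    int ccMinor;
};

// Binary (SASS) code for sm_XY: assumed to need the same major version
// and a device minor version that is equal or higher.
bool binaryCanRun(ComputeCapability target, ComputeCapability device) {
    return target.ccMajor == device.ccMajor &&
           target.ccMinor <= device.ccMinor;
}

// PTX for compute_XY: assumed to be JIT-compilable for any device whose
// compute capability is equal or higher.
bool ptxCanRun(ComputeCapability target, ComputeCapability device) {
    if (target.ccMajor != device.ccMajor) {
        return target.ccMajor < device.ccMajor;
    }
    return target.ccMinor <= device.ccMinor;
}

// A kernel is loadable if either form of the embedded code fits the device.
bool kernelCanLoad(ComputeCapability target, ComputeCapability device) {
    return binaryCanRun(target, device) || ptxCanRun(target, device);
}
```

With the default cc 2.0 target and the cc 1.1 device from the printout, `kernelCanLoad({2, 0}, {1, 1})` is false, which matches the "invalid device function" symptom; `kernelCanLoad({1, 1}, {1, 1})` is true.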

This usually means adding something like compute_11,sm_11 in the Visual Studio device configuration tab of your project properties.

When you do so, you will get warnings (under CUDA 6.5) that device architecture 1.1 is deprecated. However, you can still compile for and target this architecture.

Even though this question pertains to Windows, the same necessity exists on Linux. If you use CUDA 6.5 on Linux, the default compile target is cc 2.0. To compile for an earlier device, you need to add something like -arch=sm_11 to the compile command line.
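Both settings follow the same naming pattern, derived from the major/minor compute capability that deviceQuery reports. As an illustration only, this small C++ sketch builds the two strings mentioned above from those numbers; the function names are hypothetical and not part of any toolkit.

```cpp
#include <string>

// Hypothetical helper: builds the "compute_XY,sm_XY" pair used in the
// Visual Studio project properties, e.g. "compute_11,sm_11" for cc 1.1.
std::string vsCodeGeneration(int ccMajor, int ccMinor) {
    const std::string cc = std::to_string(ccMajor) + std::to_string(ccMinor);
    return "compute_" + cc + ",sm_" + cc;
}

// Hypothetical helper: builds the nvcc command-line switch,
// e.g. "-arch=sm_11" for cc 1.1.
std::string nvccArchFlag(int ccMajor, int ccMinor) {
    return "-arch=sm_" + std::to_string(ccMajor) + std::to_string(ccMinor);
}
```

For the GeForce 8500 GT (cc 1.1), `vsCodeGeneration(1, 1)` gives `compute_11,sm_11` and `nvccArchFlag(1, 1)` gives `-arch=sm_11`.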
