Multi GPU usage with CUDA Thrust


Question

I want to use my two graphics cards for calculations with CUDA Thrust.

I have two graphics cards. Running on a single card works well for either card, even when I store two device_vectors in a std::vector.

If I use both cards at the same time, the first iteration of the loop works and causes no error. Every later iteration causes an error, probably because the device pointer is not valid.

I am not sure what the exact problem is, or how to use both cards for calculation.

Minimal code sample:

std::vector<thrust::device_vector<float> > TEST() {
    std::vector<thrust::device_vector<float> > vRes;

    unsigned int iDeviceCount = GetCudaDeviceCount();
    for(unsigned int i = 0; i < iDeviceCount; i++) {
        checkCudaErrors(cudaSetDevice(i));
        thrust::host_vector<float> hvConscience(1024);

        // first run works, runs afterwards cause errors ..
        vRes.push_back(hvConscience); // this push_back causes the error on exec

    }
    return vRes;
}

Error message on execution:

terminate called after throwing an instance of 'thrust::system::system_error'
what():  invalid argument

Solution

The problem here is that you are trying to perform a device-to-device copy of data between a pair of device_vectors which reside in different GPU contexts (because of the cudaSetDevice call). What you have perhaps overlooked is that this sequence of operations:

thrust::host_vector<float> hvConscience(1024);
vRes.push_back(hvConscience);

is performing a copy from hvConscience at each loop iteration: vRes holds device_vector elements, so every push_back builds a device_vector from the host data. The Thrust backend expects the source and destination memory to lie in the same GPU context. In this case they do not, hence the error.

What you probably want to do is work with a vector of pointers to device_vector instead, so something like:

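typedef thrust::device_vector< float > vec;
typedef vec *p_vec;
std::vector< p_vec > vRes; // holds host pointers, not the device_vectors themselves

unsigned int iDeviceCount = GetCudaDeviceCount();
for(unsigned int i = 0; i < iDeviceCount; i++) {
    cudaSetDevice(i);
    // allocated once, on device i
    p_vec hvConscience = new vec(1024);
    // copies only the pointer; no device memory is touched
    vRes.push_back(hvConscience);
}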

[disclaimer: code written in browser, neither compiled nor tested, use at own risk]

This way each vector is created only once, in the correct GPU context, and what gets copied into the std::vector is a host pointer, which doesn't trigger any device-side copies across memory spaces. Vectors allocated with new must eventually be released with delete.
