我想把我的代码从CPP更改为CUDA,任何想法? [英] I want change my code from CPP to CUDA, any idea?

查看:264
本文介绍了我想把我的代码从CPP更改为CUDA,任何想法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个问题,我无法解决。

I have a problem that i can't solve.

问题如下。

const int dataSize = 65535;
const int category = 10;
float data[dataSize][category];
const float threshold = 0.5f;

int cnt = 0;

// data array contains any values

for(int i=0;i<dataSize;i++)
{
    if( data[i][9] > threshold )
    {
        data[cnt][0] = data[i][0];
        data[cnt][1] = data[i][1];
        data[cnt][2] = data[i][2];
        data[cnt][3] = data[i][3];
        data[cnt][4] = data[i][4];
        data[cnt][5] = data[i][5];
        data[cnt][6] = data[i][6];
        data[cnt][7] = data[i][7];
        data[cnt][8] = data[i][8];
        data[cnt][9] = data[i][9];
        cnt++;
    }
}



使用此代码,我希望'data'元素是通过阈值收集的(没有超过阈值的元素对我来说不重要,重要的是只是超过阈值。)

By using this code, I expect 'data' array's element is collected over threshold value.(The element that is not over the threshold is not important to me. Important thing is just over threshold value.)

我想代码。

所以我试着这样做。

__global__ void checkOverThreshold(float *data, float threshold, int *nCount)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;

    if( data[idx*10+9] > threshold )
    {
        data[nCount+0] = data[idx*10+0];
        data[nCount+1] = data[idx*10+1];
        data[nCount+2] = data[idx*10+2];
        data[nCount+3] = data[idx*10+3];
        data[nCount+4] = data[idx*10+4];
        data[nCount+5] = data[idx*10+5];
        data[nCount+6] = data[idx*10+6];
        data[nCount+7] = data[idx*10+7];
        data[nCount+8] = data[idx*10+8];
        data[nCount+9] = data[idx*10+9];
        atomicAdd( nCount, 1);
    }
}

....

// kernel function call
checkOverThreshold<<< dataSize / 128, 128 >>>(d_data, treshold, d_count);

但是CUDA代码的结果不是我预期的结果。

But the result of CUDA code is not that I expected.

它包含大量的垃圾值,甚至结果与CPP不一样。

It contains lots of trash value and even the result is not the same as CPP's.

我认为nCount变量的同步问题会导致这种情况。

I think that the nCount variable's synchronization problem makes this situation.

但我不知道要解决这个问题。

But, I have no idea to solve this problem.

请帮助我的代码。提前感谢。

Please help my code. Thank you in advance.

推荐答案

此代码已损坏:

    data[nCount+0] = data[idx*10+0];
    data[nCount+1] = data[idx*10+1];
    data[nCount+2] = data[idx*10+2];
    data[nCount+3] = data[idx*10+3];
    data[nCount+4] = data[idx*10+4];
    data[nCount+5] = data[idx*10+5];
    data[nCount+6] = data[idx*10+6];
    data[nCount+7] = data[idx*10+7];
    data[nCount+8] = data[idx*10+8];
    data[nCount+9] = data[idx*10+9];
    atomicAdd( nCount, 1);

如果 nCount ,会产生废话。应为

If nCount is modified during all those assignments, nonsense will result. It should be

    int d = atomicAdd(nCount, 1);
    data[d+0] = data[idx*10+0];
    data[d+1] = data[idx*10+1];
    data[d+2] = data[idx*10+2];
    data[d+3] = data[idx*10+3];
    data[d+4] = data[idx*10+4];
    data[d+5] = data[idx*10+5];
    data[d+6] = data[idx*10+6];
    data[d+7] = data[idx*10+7];
    data[d+8] = data[idx*10+8];
    data[d+9] = data[idx*10+9];

这篇关于我想把我的代码从CPP更改为CUDA,任何想法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆