Beginner CUDA - Simple var increment not working
Question
I am working on a project with CUDA. To get the hang of it, I have the following code.
#include <iostream>
using namespace std;
__global__ void inc(int *foo) {
++(*foo);
}
int main() {
int count = 0, *cuda_count;
cudaMalloc((void**)&cuda_count, sizeof(int));
cudaMemcpy(cuda_count, &count, sizeof(int), cudaMemcpyHostToDevice);
cout << "count: " << count << '\n';
inc <<< 100, 25 >>> (&count);
cudaMemcpy(&count, cuda_count, sizeof(int), cudaMemcpyDeviceToHost);
cudaFree(cuda_count);
cout << "count: " << count << '\n';
return 0;
}
The output is:
count: 0
count: 0
What is the problem?
Thanks in advance!
Recommended answer
I found the solution. I had to use an atomic function, i.e., a function whose read-modify-write runs without interference from other threads. In other words, no other thread can modify that address until the operation is complete. The kernel also has to receive the device pointer cuda_count rather than the host address &count.
#include <iostream>
using namespace std;
__global__ void inc(int *foo) {
atomicAdd(foo, 1);
}
int main() {
int count = 0, *cuda_count;
cudaMalloc((void**)&cuda_count, sizeof(int));
cudaMemcpy(cuda_count, &count, sizeof(int), cudaMemcpyHostToDevice);
cout << "count: " << count << '\n';
inc <<< 100, 25 >>> (cuda_count);
cudaMemcpy(&count, cuda_count, sizeof(int), cudaMemcpyDeviceToHost);
cudaFree(cuda_count);
cout << "count: " << count << '\n';
return 0;
}
Output:
count: 0
count: 2500
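To make the effect of the atomic operation visible, here is a minimal sketch that runs the plain increment and the atomic increment on a device counter with the same launch shape. The helper run_counter and the kernel names inc_plain and inc_atomic are made up for this illustration and do not come from the original post. The atomic version should report 2500; the plain version has a data race, so its result is unspecified and usually far lower.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Plain increment: each thread performs a separate load, add, and store,
// so concurrent threads can overwrite each other's updates.
__global__ void inc_plain(int *counter) {
    ++(*counter);
}

// Atomic increment: the read-modify-write is one indivisible operation.
__global__ void inc_atomic(int *counter) {
    atomicAdd(counter, 1);
}

// Hypothetical helper (not from the original post): zero a device counter,
// launch one of the two kernels, and return the final value to the host.
int run_counter(bool use_atomic, int blocks, int threads) {
    int host_count = 0;
    int *dev_count = nullptr;
    cudaMalloc((void**)&dev_count, sizeof(int));
    cudaMemcpy(dev_count, &host_count, sizeof(int), cudaMemcpyHostToDevice);
    if (use_atomic) {
        inc_atomic<<<blocks, threads>>>(dev_count);
    } else {
        inc_plain<<<blocks, threads>>>(dev_count);
    }
    // This blocking copy waits for the kernel to finish before reading.
    cudaMemcpy(&host_count, dev_count, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_count);
    return host_count;
}

int main() {
    printf("plain:  %d\n", run_counter(false, 100, 25));
    printf("atomic: %d\n", run_counter(true, 100, 25));
    return 0;
}
```

Every thread in the atomic run adds to the same address, so the updates are serialized. That is fine for a counter this small; for heavier workloads a per-block reduction followed by one atomicAdd per block is a common alternative.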
Thank you for making me realize the error that I was committing.
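Mistakes such as handing a kernel a host address often go unnoticed because the code never checks what the CUDA runtime calls return. The sketch below wraps each call in a macro, CUDA_CHECK, which is a hypothetical helper written for this example and not part of CUDA. The function name run_checked is likewise an assumption. On a typical setup, a kernel that dereferences an invalid pointer would make one of the checks after the launch fail with a message instead of silently leaving count at 0.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of CUDA): print the error string
// and exit if a runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(err_));                      \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

__global__ void inc(int *foo) {
    atomicAdd(foo, 1);
}

// Hypothetical wrapper around the answer's main(): same steps, with
// every runtime call checked.
int run_checked(int blocks, int threads) {
    int count = 0;
    int *cuda_count = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&cuda_count, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(cuda_count, &count, sizeof(int),
                          cudaMemcpyHostToDevice));
    inc<<<blocks, threads>>>(cuda_count);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran
    CUDA_CHECK(cudaMemcpy(&count, cuda_count, sizeof(int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(cuda_count));
    return count;
}

int main() {
    printf("count: %d\n", run_checked(100, 25));
    return 0;
}
```

Kernel launches return no status directly, which is why the sketch queries cudaGetLastError right after the launch and then synchronizes before copying the result back.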