Beginner CUDA - Simple var increment not working


Problem description

I am working on a project with CUDA. To get the hang of it, I wrote the following code.

#include <iostream>

using namespace std;

__global__ void inc(int *foo) {
  ++(*foo);
}

int main() {
  int count = 0, *cuda_count;
  cudaMalloc((void**)&cuda_count, sizeof(int));
  cudaMemcpy(cuda_count, &count, sizeof(int), cudaMemcpyHostToDevice);
  cout << "count: " << count << '\n';
  inc <<< 100, 25 >>> (&count);
  cudaMemcpy(&count, cuda_count, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(cuda_count);
  cout << "count: " << count << '\n';
  return 0;
}

The output is:

count: 0
count: 0

What is the problem?

Thanks in advance!

Recommended answer

I found the solution. I had to use an atomic function, i.e. a function whose read-modify-write on an address completes without interference from other threads. In other words, no other thread can access that address until the operation is complete. The kernel also has to be given the device pointer cuda_count, not the host address &count.

#include <iostream>

using namespace std;

__global__ void inc(int *foo) {
  atomicAdd(foo, 1);
}

int main() {
  int count = 0, *cuda_count;
  cudaMalloc((void**)&cuda_count, sizeof(int));
  cudaMemcpy(cuda_count, &count, sizeof(int), cudaMemcpyHostToDevice);
  cout << "count: " << count << '\n';
  inc <<< 100, 25 >>> (cuda_count);
  cudaMemcpy(&count, cuda_count, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(cuda_count);
  cout << "count: " << count << '\n';
  return 0;
}

Output:

count: 0
count: 2500
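
To see why the atomic matters, here is a minimal sketch that runs both versions of the kernel against the same kind of device pointer. The kernel names inc_plain and inc_atomic and the helper run are my own, made up for illustration, and are not from the original post. With the plain increment, threads can read the same old value and overwrite each other's writes, so the final count is unpredictable. With atomicAdd every increment is counted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Plain read-modify-write: a data race when many threads hit the same address.
__global__ void inc_plain(int *foo) {
  ++(*foo);
}

// Atomic read-modify-write: each thread's increment is applied exactly once.
__global__ void inc_atomic(int *foo) {
  atomicAdd(foo, 1);
}

// Hypothetical helper: zero a device counter, launch the chosen kernel,
// and return the counter value (or -1 if a CUDA call failed).
int run(bool use_atomic, int blocks, int threads) {
  int count = 0, *cuda_count = nullptr;
  if (cudaMalloc((void**)&cuda_count, sizeof(int)) != cudaSuccess) return -1;
  cudaMemcpy(cuda_count, &count, sizeof(int), cudaMemcpyHostToDevice);
  if (use_atomic) {
    inc_atomic<<<blocks, threads>>>(cuda_count);
  } else {
    inc_plain<<<blocks, threads>>>(cuda_count);
  }
  // This copy waits for the kernel to finish before reading the result.
  cudaError_t err =
      cudaMemcpy(&count, cuda_count, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(cuda_count);
  return err == cudaSuccess ? count : -1;
}

int main() {
  printf("plain:  %d\n", run(false, 100, 25));  // racy, value varies
  printf("atomic: %d\n", run(true, 100, 25));   // 100 blocks * 25 threads
  return 0;
}
```

The atomic version serializes the updates to that one address, which is fine for a counter like this but can become a bottleneck if every thread in a large grid hits the same location.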



Thank you for making me realize the mistake I was making.
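
The first version failed silently because none of the CUDA calls were checked. Below is a minimal sketch of how such a mistake could be surfaced, assuming a setup where the GPU cannot dereference a host stack address (the usual case, though some unified-memory configurations may behave differently). The CHECK macro and the checked_count function are hypothetical helpers of my own, not part of CUDA or the original post.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the CUDA error string and exit on failure.
#define CHECK(call)                                          \
  do {                                                       \
    cudaError_t err_ = (call);                               \
    if (err_ != cudaSuccess) {                               \
      fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
              cudaGetErrorString(err_));                     \
      exit(EXIT_FAILURE);                                    \
    }                                                        \
  } while (0)

__global__ void inc(int *foo) {
  atomicAdd(foo, 1);
}

// Same flow as the answer's main(), with every CUDA call checked.
int checked_count(int blocks, int threads) {
  int count = 0, *cuda_count = nullptr;
  CHECK(cudaMalloc((void**)&cuda_count, sizeof(int)));
  CHECK(cudaMemcpy(cuda_count, &count, sizeof(int), cudaMemcpyHostToDevice));
  inc<<<blocks, threads>>>(cuda_count);  // device pointer, not &count
  CHECK(cudaGetLastError());        // reports launch/configuration errors
  CHECK(cudaDeviceSynchronize());   // reports errors raised while the kernel ran
  CHECK(cudaMemcpy(&count, cuda_count, sizeof(int), cudaMemcpyDeviceToHost));
  CHECK(cudaFree(cuda_count));
  return count;
}

int main() {
  printf("count: %d\n", checked_count(100, 25));
  return 0;
}
```

If the kernel were launched with &count instead, I would expect one of the checks after the launch to report an error on such a setup rather than the program quietly printing 0.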
