How to manage the same CUDA kernel call from multiple CPU threads?


Question

I have a CUDA kernel that works fine when called from a single CPU thread. However, when the same kernel is called from multiple CPU threads (~100), most of the launches do not seem to execute at all, because the results come out as all zeros. Can someone please guide me on how to resolve this problem?

In the current version of the code I call cudaDeviceSynchronize() right after the kernel call. Would adding a synchronization call before cudaMalloc() and before the kernel call be of any help in this case?

There is another thing that needs clarification: if two CPU threads execute the same cudaMalloc() call, will the later one overwrite the earlier one in GPU memory, or will each get its own memory?


Recommended answer

Usually a single CPU thread is used to call a CUDA kernel. Since CUDA 4.0, however, multiple CPU threads can share a context. You can use cuCtxSetCurrent to bind the context that the kernel runs in to the current thread. More information about this function is available in the CUDA Driver API documentation.
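As a rough illustration of the context-sharing approach, the sketch below retains the device's primary context once and has each host thread bind it with cuCtxSetCurrent before touching the GPU. The names ok, worker, and run_shared_context are hypothetical helpers invented for this example, device 0 is assumed, and the build command in the first comment is an assumption about the toolchain. It uses only memory operations rather than the asker's kernel, so treat it as a starting point, not the original poster's code.

```cuda
// Build (assumed command): nvcc -std=c++14 share_ctx.cu -lcuda -o share_ctx
#include <cuda.h>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical helper: report a failed driver API call; true means success.
static bool ok(CUresult r, const char* what) {
    if (r == CUDA_SUCCESS) return true;
    const char* msg = nullptr;
    cuGetErrorString(r, &msg);
    std::fprintf(stderr, "%s failed: %s\n", what, msg ? msg : "unknown error");
    return false;
}

// Hypothetical per-thread job: bind the shared context, then use the GPU.
static void worker(CUcontext ctx, unsigned int id, int* passed) {
    *passed = 0;
    // Without this call the thread has no current context.
    if (!ok(cuCtxSetCurrent(ctx), "cuCtxSetCurrent")) return;

    const size_t n = 256;
    CUdeviceptr d = 0;
    if (!ok(cuMemAlloc(&d, n * sizeof(unsigned int)), "cuMemAlloc")) return;

    // Fill the buffer with this thread's id and read it back.
    std::vector<unsigned int> h(n, 0);
    bool good = ok(cuMemsetD32(d, id, n), "cuMemsetD32") &&
                ok(cuMemcpyDtoH(h.data(), d, n * sizeof(unsigned int)),
                   "cuMemcpyDtoH");
    for (size_t i = 0; good && i < n; ++i) good = (h[i] == id);

    cuMemFree(d);
    *passed = good ? 1 : 0;
}

// Hypothetical driver: returns how many threads saw correct data, or -1
// if the context could not be set up.
int run_shared_context(int num_threads) {
    CUdevice dev;
    CUcontext ctx = nullptr;
    if (!ok(cuInit(0), "cuInit") ||
        !ok(cuDeviceGet(&dev, 0), "cuDeviceGet") ||
        !ok(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain"))
        return -1;

    std::vector<int> passed(num_threads, 0);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t)
        threads.emplace_back(worker, ctx,
                             static_cast<unsigned int>(t + 1), &passed[t]);
    for (auto& th : threads) th.join();

    cuDevicePrimaryCtxRelease(dev);
    int total = 0;
    for (int p : passed) total += p;
    return total;
}

int main() {
    const int n = 4;
    int passed = run_shared_context(n);
    std::printf("%d of %d threads read back correct data\n", passed, n);
    return passed == n ? 0 : 1;
}
```

Checking the return value of every call matters here: a thread that fails to bind the context would otherwise fail silently, which is one plausible way to end up with buffers that are never written.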

Another workaround is to create a dedicated GPU worker thread that holds the context, and to pass every CUDA request to that thread.
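One possible shape for such a worker thread is sketched below: a single thread owns all CUDA calls and drains a queue of jobs, while client threads only submit work and wait on a future. GpuWorker, fill, fill_and_check, and run_worker_demo are hypothetical names made up for this example, device 0 is assumed, and the trivial fill kernel stands in for the real one.

```cuda
#include <cuda_runtime.h>
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Stand-in kernel: writes `value` into every element.
__global__ void fill(int* out, int value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Hypothetical helper: the only thread that ever issues CUDA calls.
class GpuWorker {
public:
    GpuWorker() : thread_([this] { loop(); }) {}
    ~GpuWorker() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        thread_.join();
    }
    // Queue a job; the future yields the job's success flag.
    std::future<bool> submit(std::function<bool()> job) {
        std::packaged_task<bool()> task(std::move(job));
        std::future<bool> f = task.get_future();
        { std::lock_guard<std::mutex> lk(m_); jobs_.push(std::move(task)); }
        cv_.notify_one();
        return f;
    }
private:
    void loop() {
        cudaSetDevice(0);  // assumed device index
        for (;;) {
            std::packaged_task<bool()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // shutting down, queue drained
                task = std::move(jobs_.front());
                jobs_.pop();
            }
            task();  // runs on the worker thread
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<bool()>> jobs_;
    bool done_ = false;
    std::thread thread_;  // declared last so it starts after the other members
};

// Example job: allocate, launch, copy back, verify. Every call is checked.
static bool fill_and_check(int value) {
    const int n = 1024;
    int* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return false;
    fill<<<(n + 255) / 256, 256>>>(d, value, n);
    std::vector<int> h(n, 0);
    bool good = cudaGetLastError() == cudaSuccess &&
                cudaMemcpy(h.data(), d, n * sizeof(int),
                           cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; good && i < n; ++i) good = (h[i] == value);
    cudaFree(d);
    return good;
}

// Client threads submit jobs; returns how many jobs succeeded.
int run_worker_demo(int num_clients) {
    GpuWorker gpu;
    std::vector<std::future<bool>> results(num_clients);
    std::vector<std::thread> clients;
    for (int t = 0; t < num_clients; ++t)
        clients.emplace_back([&gpu, &results, t] {
            results[t] = gpu.submit([t] { return fill_and_check(t + 1); });
        });
    for (auto& c : clients) c.join();
    int passed = 0;
    for (auto& r : results) passed += r.get() ? 1 : 0;
    return passed;
}

int main() {
    const int n = 8;
    int passed = run_worker_demo(n);
    std::printf("%d of %d jobs succeeded\n", passed, n);
    return passed == n ? 0 : 1;
}
```

The trade-off of this design is that all GPU work is serialized through one thread. That is simple to reason about with ~100 client threads, but it gives up any overlap that per-thread streams might provide.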

Regarding your other question: as far as I remember, cudaMalloc will not even execute unless the context has been set for the calling thread (I work with JCuda, so the behavior may be a little different). But if the context is current on the calling thread, the allocations will not overwrite each other.
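The claim that allocations do not overwrite each other can be checked with a small experiment. The sketch below assumes the CUDA runtime API (CUDA 4.0 or later, where host threads of one process share the device's context) and device 0. The names alloc_one and allocations_are_distinct are hypothetical, chosen for this example.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <thread>

// Each host thread runs the same cudaMalloc call and records the result.
static void alloc_one(void** out, cudaError_t* err) {
    *err = cudaSetDevice(0);  // assumed device index
    if (*err == cudaSuccess) *err = cudaMalloc(out, 1 << 20);  // 1 MiB
}

// Returns true when both calls succeed and yield different device pointers.
bool allocations_are_distinct() {
    void* a = nullptr;
    void* b = nullptr;
    cudaError_t ea = cudaSuccess, eb = cudaSuccess;

    std::thread t1(alloc_one, &a, &ea);
    std::thread t2(alloc_one, &b, &eb);
    t1.join();
    t2.join();

    if (ea != cudaSuccess)
        std::fprintf(stderr, "thread 1: %s\n", cudaGetErrorString(ea));
    if (eb != cudaSuccess)
        std::fprintf(stderr, "thread 2: %s\n", cudaGetErrorString(eb));

    bool distinct = ea == cudaSuccess && eb == cudaSuccess &&
                    a != nullptr && b != nullptr && a != b;

    // Freeing from the main thread works because the context is shared.
    if (a) cudaFree(a);
    if (b) cudaFree(b);
    return distinct;
}

int main() {
    bool distinct = allocations_are_distinct();
    std::printf("separate allocations: %s\n", distinct ? "yes" : "no");
    return distinct ? 0 : 1;
}
```

Each successful cudaMalloc call returns its own block of device memory, so the identical call made from two threads produces two allocations. What can still go wrong is host code in which several threads store the returned pointer into the same shared variable.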

