Will the cublas kernel functions automatically be synchronized with the host?
Problem description
Just a general question about cublas. For a single thread, if there is no memory transfer from GPU to CPU (e.g. cublasGetVector), will the cublas kernel functions (e.g. cublasDgemm) automatically be synchronized with the host?
cublasDgemm();
//cublasGetVector();
host_functions();
Furthermore, what about two adjacent kernel calls?
cublasDgemm();
cublasDgemm();
And what about a synchronous transfer that does not involve the global memory used in the previous kernel?
cublasDgemm(...gA...gB...gC);
cublasGetVector(...gD...D...);
Recommended answer
No. With the exception of a few Level 1 routines that return a scalar value, the CUBLAS API is asynchronous.
Level 3 routines like cublasDgemm don't block the host. To ensure that a CUBLAS call has completed, you need to call a blocking API routine, such as a synchronous memory transfer or an explicit host-GPU synchronisation call.