CUDA, how to choose <<<Blocks, Threads>>>?
Question
In a library, I call several CUDA kernels. Of course, I want to get the best performance. How users use the library can vary a bit.
The number of blocks and threads influences performance significantly.
Is there a rule for choosing the number of blocks and threads for best performance?
For example (just a question), is it best to choose many blocks and few threads per block? Or the other way around? Or is it best to use some values from GetDeviceProperties()?
Recommended answer
You can use the CUDA Occupancy Calculator, an .xls spreadsheet provided by NVIDIA, to choose the best configuration. Try changing the thread and block values in the spreadsheet until you reach the best occupancy, which in turn generally gives you the best performance.
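The same occupancy calculation is also available at run time through the CUDA runtime function cudaOccupancyMaxPotentialBlockSize (CUDA 6.5 and later), which can be convenient in a library where the target GPU is not known in advance. The following is a minimal sketch, not a definitive implementation: the kernel scaleKernel and the helpers blocksNeeded and launchScale are hypothetical names invented for illustration, and the suggested block size is only a heuristic starting point that should be confirmed by measurement.

```cuda
#include <cuda_runtime.h>

// Hypothetical example kernel (not from the original post):
// scales each element of the array in place.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Number of blocks needed to cover n elements (ceiling division).
inline int blocksNeeded(int n, int blockSize) {
    return (n + blockSize - 1) / blockSize;
}

// Hypothetical helper: launches scaleKernel on a device buffer using
// a block size suggested by the runtime's occupancy heuristic.
cudaError_t launchScale(float* d_data, int n, float factor) {
    if (n <= 0) {
        return cudaSuccess;  // nothing to do; a grid of 0 blocks is invalid
    }

    int minGridSize = 0;  // minimum grid size needed to reach maximum occupancy
    int blockSize = 0;    // suggested number of threads per block

    // No dynamic shared memory (0) and no upper limit on block size (0).
    cudaError_t err = cudaOccupancyMaxPotentialBlockSize(
        &minGridSize, &blockSize, scaleKernel, 0, 0);
    if (err != cudaSuccess) {
        return err;
    }

    // Size the grid from the problem size, not from minGridSize.
    int gridSize = blocksNeeded(n, blockSize);
    scaleKernel<<<gridSize, blockSize>>>(d_data, n, factor);
    return cudaGetLastError();
}
```

A caller would allocate d_data with cudaMalloc, copy the input over, and call launchScale(d_data, n, 2.0f). Maximum occupancy does not guarantee maximum speed for every kernel, so timing a few block sizes (for example 128, 256, and 512) on the target hardware remains worthwhile.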