Practice computing grid size for CUDA

Views: 119 Published: 2020/10/13 0:50:06 Tags: cuda nvidia


Question

dim3 block(4, 2);
dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);

I found this code in Professional CUDA C Programming on page 53. It's meant to be a naive example of matrix multiplication. nx is the number of columns and ny is the number of rows.

Can you explain how the grid size is computed? Why is block.x added to nx, and why is 1 then subtracted?

There is a preview (https://books.google.com/books?id=_Z7rnAEACAAJ&printsec=frontcover#v=onepage&q&f=false) but page 53 is not included in it.

Recommended answer

This is the standard CUDA idiom for determining the minimum number of blocks in each dimension (the "grid") needed to completely cover the input. It is equivalent to ceil(nx / block.x) with real-valued division: work out how many blocks are needed to cover the desired size, then round up.

But a full floating-point division followed by ceil is more expensive than necessary. Instead, since integer division in C truncates the result (a "floor" operation for the non-negative values used here), you can add divisor - 1 to the numerator before dividing to get the effect of a "ceiling" operation.
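The trick can be checked in isolation on the host. The following is a minimal sketch, not code from the book: div_up and div_up_reference are hypothetical helper names, and both assume a non-negative size and a positive divisor.

```c
#include <assert.h>

/* Hypothetical helper: ceiling of n / d using one integer division.
   Assumes n >= 0 and d > 0, as is the case for sizes and block dimensions. */
static int div_up(int n, int d)
{
    assert(n >= 0 && d > 0);
    return (n + d - 1) / d;
}

/* Hypothetical reference version: floor of n / d, plus one extra block
   whenever there is a remainder. It should agree with div_up. */
static int div_up_reference(int n, int d)
{
    assert(n >= 0 && d > 0);
    return n / d + (n % d != 0);
}
```

One caveat of the single-division form is that n + d - 1 can overflow when n is close to the largest representable int; the reference form avoids that at the cost of an extra remainder operation.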

Try a few examples: if nx = 10 and block.x = 4, then nx + block.x - 1 is 13, and integer division gives 13 / 4 = 3, so you need 3 blocks of size 4.
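The same arithmetic applies to both grid dimensions. As a rough sketch that compiles without nvcc, the struct dim2 below is a made-up stand-in for CUDA's dim3, and grid_for is a hypothetical helper that mirrors the expression from the question.

```c
#include <assert.h>

/* Plain-C stand-in for CUDA's dim3 (illustration only, not a CUDA type). */
struct dim2 {
    unsigned int x;
    unsigned int y;
};

/* Hypothetical helper: smallest grid of blocks that covers an
   nx-by-ny domain, rounding up independently in each dimension. */
static struct dim2 grid_for(unsigned int nx, unsigned int ny, struct dim2 block)
{
    struct dim2 grid;
    assert(block.x > 0 && block.y > 0);
    grid.x = (nx + block.x - 1) / block.x;  /* blocks across the columns */
    grid.y = (ny + block.y - 1) / block.y;  /* blocks across the rows */
    return grid;
}
```

With nx = 10, ny = 5 and a 4 x 2 block, this gives a 3 x 3 grid, so 12 x 6 threads are launched to cover 10 x 5 elements. The surplus threads are why kernels using this idiom normally guard their work with a bounds check on the computed indices.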

As you noted in the comment, adding block.x pushes the floor up to the ceiling, and the -1 handles sizes that are exact multiples of the divisor. For example, (12 + 4) / 4 would be 4, when what we actually want is (12 + 4 - 1) / 4, which is 3.
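To see the effect of the -1 side by side, here is a small sketch with two hypothetical functions, one for each form of the expression; the names are illustrative only.

```c
#include <assert.h>

/* Without the -1: over-counts by one block when n is an exact multiple of d. */
static int blocks_without_minus_one(int n, int d)
{
    assert(n >= 0 && d > 0);
    return (n + d) / d;
}

/* With the -1: the ceiling of n / d, correct for exact multiples as well. */
static int blocks_with_minus_one(int n, int d)
{
    assert(n >= 0 && d > 0);
    return (n + d - 1) / d;
}
```

For n = 10 and d = 4 both forms give 3, so the difference only appears when n is an exact multiple of d, such as n = 12, where the first form gives 4 and the second gives 3.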
