CUDA limit seems to be reached, but what limit is that?

Views: 34 Published: 2022/1/10 15:12:36 Tags: gpgpu cuda


Problem description

I have a CUDA program that seems to be hitting a limit on some resource, but I can't figure out which resource it is. Here is the kernel function:

__global__ void DoCheck(float2* points, int* segmentToPolylineIndexMap, 
                        int segmentCount, int* output)
{
    int segmentIndex = threadIdx.x + blockIdx.x * blockDim.x;
    int pointCount = segmentCount + 1;

    if(segmentIndex >= segmentCount)
        return;

    int polylineIndex = segmentToPolylineIndexMap[segmentIndex];
    int result = 0;
    if(polylineIndex >= 0)
    {
        float2 p1 = points[segmentIndex];
        float2 p2 = points[segmentIndex+1];
        float2 A = p2;
        float2 a;
        a.x = p2.x - p1.x;
        a.y = p2.y - p1.y;

        for(int i = segmentIndex+2; i < segmentCount; i++)
        {
            int currentPolylineIndex = segmentToPolylineIndexMap[i];

            // legit if the segment belongs to a different polyline than
            // ours and is not a fake segment
            bool isLegit = (currentPolylineIndex != polylineIndex && 
                currentPolylineIndex >= 0);      

            float2 p3 = points[i];
            float2 p4 = points[i+1];
            float2 B = p4;
            float2 b;
            b.x = p4.x - p3.x;
            b.y = p4.y - p3.y;

            float2 c;
            c.x = B.x - A.x;
            c.y = B.y - A.y;

            float2 b_perp;
            b_perp.x = -b.y;
            b_perp.y = b.x;

            float numerator = dot(b_perp, c);
            float denominator = dot(b_perp, a);
            bool isParallel = (denominator == 0.0f);

            float quotient = numerator / denominator;
            float2 intersectionPoint;
            intersectionPoint.x = quotient * a.x + A.x;
            intersectionPoint.y = quotient * a.y + A.y;

            result = result | (isLegit && !isParallel && 
                intersectionPoint.x > min(p1.x, p2.x) && 
                intersectionPoint.x > min(p3.x, p4.x) && 
                intersectionPoint.x < max(p1.x, p2.x) && 
                intersectionPoint.x < max(p3.x, p4.x) && 
                intersectionPoint.y > min(p1.y, p2.y) && 
                intersectionPoint.y > min(p3.y, p4.y) && 
                intersectionPoint.y < max(p1.y, p2.y) && 
                intersectionPoint.y < max(p3.y, p4.y));
        }
    }

    output[segmentIndex] = result;
}
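For checking the kernel's output on small inputs, the per-pair test inside the loop can be mirrored on the host. The sketch below is a hedged CPU-side transcription, not the asker's code: `Float2`, `dot`, and `segmentsIntersect` are hypothetical stand-ins. CUDA's `float2` type has no built-in `dot`; the kernel presumably picks one up from an SDK helper header that the question does not show. Unlike the kernel, the sketch returns early for parallel segments instead of dividing by zero and masking the result.

```cpp
#include <algorithm>

// Stand-in for CUDA's float2 so the sketch builds with a plain C++ compiler.
struct Float2 { float x, y; };

// Assumed helper: 2-D dot product, matching what the kernel's dot() is presumed to do.
inline float dot(Float2 u, Float2 v) { return u.x * v.x + u.y * v.y; }

// Mirrors the kernel's test for segment p1-p2 against segment p3-p4.
// Like the kernel, it uses strict inequalities, so touching endpoints and
// axis-aligned segments are never reported as intersecting.
inline bool segmentsIntersect(Float2 p1, Float2 p2, Float2 p3, Float2 p4)
{
    Float2 A = p2;
    Float2 a = { p2.x - p1.x, p2.y - p1.y };
    Float2 B = p4;
    Float2 b = { p4.x - p3.x, p4.y - p3.y };
    Float2 c = { B.x - A.x, B.y - A.y };
    Float2 bPerp = { -b.y, b.x };

    float denominator = dot(bPerp, a);
    if (denominator == 0.0f)
        return false;  // parallel (or zero-length) segments

    // Point where the two infinite lines meet, expressed along line 1.
    float quotient = dot(bPerp, c) / denominator;
    Float2 hit = { quotient * a.x + A.x, quotient * a.y + A.y };

    // The point must lie strictly inside both segments' bounding boxes.
    return hit.x > std::min(p1.x, p2.x) && hit.x > std::min(p3.x, p4.x) &&
           hit.x < std::max(p1.x, p2.x) && hit.x < std::max(p3.x, p4.x) &&
           hit.y > std::min(p1.y, p2.y) && hit.y > std::min(p3.y, p4.y) &&
           hit.y < std::max(p1.y, p2.y) && hit.y < std::max(p3.y, p4.y);
}
```

For example, the diagonals (0,0)-(2,2) and (0,2)-(2,0) cross at (1,1), so the function returns true for them, while two segments with the same direction return false.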

Here is the call that launches the kernel:

DoCheck<<<702, 32>>>(
    (float2*)devicePoints, 
    deviceSegmentsToPolylineIndexMap, 
    numSegments, 
    deviceOutput);

The parameter sizes are as follows:

devicePoints = 22,464 float2s = 179,712 bytes
deviceSegmentsToPolylineIndexMap = 22,463 ints = 89,852 bytes
numSegments = 1 int = 4 bytes
deviceOutput = 22,463 ints = 89,852 bytes
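The byte counts above are simply element count times element size. As a small hedged sketch, assuming a 4-byte `int` and an 8-byte `float2` (true on common platforms, but an assumption here), with `Float2` and `bytesFor` as hypothetical names:

```cpp
#include <cstddef>

// Stand-in for CUDA's float2: two packed 32-bit floats.
struct Float2 { float x, y; };

// Bytes occupied by `count` contiguous elements of type T.
template <typename T>
constexpr std::size_t bytesFor(std::size_t count)
{
    return count * sizeof(T);
}
```

With these assumptions, `bytesFor<Float2>(22464)` gives the 179,712 bytes listed for devicePoints, and `bytesFor<int>(22463)` gives the 89,852 bytes listed for each int array.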

When I execute this kernel, it crashes the video card. It looks like I am hitting some sort of limit, because the kernel works if I launch it with DoCheck<<<300, 32>>>(...); instead. To be clear, the parameters are the same; only the number of blocks differs.
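The two launch configurations can be compared with a little arithmetic. The sketch below is illustrative only: `blocksNeeded` and `loopIterations` are hypothetical helpers, and the iteration count is an upper bound that assumes no segment has a negative polyline index (the kernel skips its loop for those).

```cpp
#include <algorithm>

// Number of blocks needed so that blocks * threadsPerBlock >= itemCount.
inline int blocksNeeded(int itemCount, int threadsPerBlock)
{
    return (itemCount + threadsPerBlock - 1) / threadsPerBlock;
}

// Upper bound on inner-loop iterations summed over all launched threads.
// Thread s loops from s + 2 up to segmentCount, as in the kernel; threads
// whose index is >= segmentCount return immediately and contribute nothing.
inline long long loopIterations(int segmentCount, int blocks, int threadsPerBlock)
{
    long long launched = static_cast<long long>(blocks) * threadsPerBlock;
    long long active = std::min<long long>(launched, segmentCount);
    long long total = 0;
    for (long long s = 0; s < active; ++s)
        total += std::max<long long>(0, segmentCount - (s + 2));
    return total;
}
```

With the figures from the question, `blocksNeeded(22463, 32)` is 702, which matches the failing launch. A launch of 300 blocks of 32 threads starts only 9,600 threads, so segments with index 9,600 and above are never checked and their entries in `output` are never written. The launch that "works" therefore does less work and produces an incomplete result.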

Any idea why one launch crashes the video driver and the other doesn't? The one that fails still seems to be within the card's limit on the number of blocks.

Update: more information on my system configuration:

Video card: nVidia 8800GT
CUDA version: 1.1
OS: Windows Server 2008 R2

I also tried it on a laptop with the following configuration and got the same results:

Video card: nVidia Quadro FX 880M
CUDA version: 1.2
OS: Windows 7 64-bit
