What are some possible causes of a segmentation fault when using the nvcc CUDA compiler?

Question

I have a CUDA class, let's call it A, defined in a header file. I have written a test kernel which creates an instance of class A, which compiles fine and produces the expected result.

In addition, I have my main CUDA kernel, which also compiles fine and produces the expected result. However, when I add code to my main kernel to instantiate an instance of class A, the nvcc compiler fails with a segmentation fault.

Update:

To clarify, the segmentation fault happens during compilation, not when running the kernel. The line I am using to compile is:

`nvcc --cubin -arch compute_20 -code sm_20 -I<My include dir> --keep kernel.cu`

where <My include dir> is the path to a local directory containing some utility header files.

My question is, before spending a lot of time isolating a minimal example exhibiting the behaviour (not trivial, due to relatively large code base), has anyone encountered a similar issue? Would it be possible for the nvcc compiler to fail and die if the kernel is either too long or uses too many registers?

If an issue such as register count can affect the compiler this way, then I will need to rethink how to implement my kernel to use fewer resources. This would also mean that trimming things down to a minimal example will likely make the problem disappear. However, if this is not even a possibility, I don't want to waste time on a dead-end, but will rather try to cut things down to a minimal example and will file a bug report to NVIDIA.

Update:

As per the suggestion of @njuffa, I reran the compilation with the -v flag enabled. The output ends with the following:

#$ ptxas  -arch=sm_20 -m64 -v  "/path/to/kernel_ptx/kernel.ptx"  -o "kernel.cubin" 
Segmentation fault
# --error 0x8b --

This suggests the problem is due to the ptxas program, which is failing to generate a CUDA binary from the ptx file.
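The `--error 0x8b --` line is consistent with that reading. As a minimal sketch, assuming nvcc reports the failing tool's status in the usual shell convention (a status above 128 means the process was killed by signal `status - 128`), the value can be decoded as follows. The helper name `decode_status` is made up for illustration.

```shell
#!/bin/sh
# Decode a shell-style exit status such as the 0x8b printed by nvcc.
# Assumption: statuses above 128 mean "killed by signal (status - 128)".
decode_status() {
    status=$(( $1 ))    # arithmetic expansion accepts hex input like 0x8b
    if [ "$status" -gt 128 ]; then
        echo "status $status: killed by signal $(( status - 128 ))"
    else
        echo "status $status: not a signal-style status"
    fi
}

decode_status 0x8b    # prints "status 139: killed by signal 11"
```

On Linux, signal 11 is SIGSEGV, which matches the "Segmentation fault" message printed just before it.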

Answer

This would appear to have been a genuine bug of some sort in the CUDA 5.0 ptxas assembler. It was reported to NVIDIA and we can assume that it was fixed sometime during the more than three years since the question was asked and this answer added.
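Because the crash is in ptxas rather than in the front end, one way to check whether a given toolkit still shows it, or to collect material for a bug report, is to run the two stages separately. The following is only a sketch: `INCLUDE_DIR`, the file names, and the `run` helper are placeholders rather than anything from the question, and the `compute_20`/`sm_20` targets are copied from the original command (recent toolkits no longer accept them, so they would need to be replaced with a supported architecture).

```shell
#!/bin/sh
# Rerun the two stages behind `nvcc --cubin` one at a time.
# INCLUDE_DIR and the file names are placeholders, not paths from the question.
INCLUDE_DIR="${INCLUDE_DIR:-./include}"

# Print each command; execute it only when RUN=1, so the default is a dry run.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Stage 1: CUDA source -> PTX (no ptxas involved yet).
run nvcc -ptx -arch compute_20 -I"$INCLUDE_DIR" kernel.cu -o kernel.ptx

# Stage 2: PTX -> cubin, with the ptxas flags shown in the -v output.
run ptxas -arch=sm_20 -m64 -v kernel.ptx -o kernel.cubin
```

If stage 1 succeeds and stage 2 crashes, the `kernel.ptx` file on its own reproduces the problem, which avoids having to reduce the CUDA source to a minimal example before reporting it.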

[This answer has been assembled from comments and added as a community wiki entry to get this question off the unanswered question list]
