What are some possible causes of a segmentation fault when using the nvcc CUDA compiler?

Question

I have a CUDA class, let's call it A, defined in a header file. I have written a test kernel which creates an instance of class A, which compiles fine and produces the expected result.

In addition, I have my main CUDA kernel, which also compiles fine and produces the expected result. However, when I add code to my main kernel to instantiate an instance of class A, the nvcc compiler fails with a segmentation fault.

Update:

To clarify, the segmentation fault happens during compilation, not when running the kernel. The line I am using to compile is:

`nvcc --cubin -arch compute_20 -code sm_20 -I<My include dir> --keep kernel.cu`

where <My include dir> is the path to a local directory containing some utility header files.

My question is, before spending a lot of time isolating a minimal example exhibiting the behaviour (not trivial, due to relatively large code base), has anyone encountered a similar issue? Would it be possible for the nvcc compiler to fail and die if the kernel is either too long or uses too many registers?

If an issue such as register count can affect the compiler this way, then I will need to rethink how to implement my kernel to use fewer resources. This would also mean that trimming things down to a minimal example will likely make the problem disappear. However, if this is not even a possibility, I don't want to waste time on a dead-end, but will rather try to cut things down to a minimal example and will file a bug report to NVIDIA.

Update:

As per the suggestion of @njuffa, I reran the compilation with the -v flag enabled. The output ends with the following:

#$ ptxas  -arch=sm_20 -m64 -v  "/path/to/kernel_ptx/kernel.ptx"  -o "kernel.cubin" 
Segmentation fault
# --error 0x8b --

This suggests the problem is due to the ptxas program, which is failing to generate a CUDA binary from the ptx file.
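One way to narrow this down is to rerun the failing ptxas step by hand at different optimisation levels. The sketch below assumes the intermediate PTX file kept by `--keep` is available as `kernel.ptx` in the current directory (a hypothetical path; adjust it to wherever the file actually is). The `-arch`, `-m64` and `-v` options are copied from the output above, and `-O` selects the ptxas optimisation level. This is a diagnostic aid for a bug report, not a confirmed workaround for this particular crash.

```shell
#!/bin/sh
# Sketch: rerun the ptxas step from the `nvcc -v` output on its own,
# trying each optimisation level to see whether the crash depends on it.
# PTX_FILE is a hypothetical path; point it at the .ptx kept by --keep.
PTX_FILE="kernel.ptx"

# Assemble the given PTX file at the given optimisation level.
# -arch, -m64 and -v are copied from the failing command above;
# -O selects the ptxas optimisation level.
try_ptxas() {
    level="$1"
    ptx="$2"
    ptxas -arch=sm_20 -m64 -v -O"$level" "$ptx" -o "kernel_O${level}.cubin"
}

if command -v ptxas >/dev/null 2>&1 && [ -f "$PTX_FILE" ]; then
    for level in 3 2 1 0; do
        try_ptxas "$level" "$PTX_FILE"
        rc=$?
        # A process killed by SIGSEGV is normally reported as 128 + 11 = 139,
        # which matches the 0x8b error code in the output above.
        echo "ptxas -O${level}: exit status ${rc}"
    done
else
    echo "ptxas or ${PTX_FILE} not found; nothing to do"
fi
```

If the exit status changes with the optimisation level, that detail, together with the kept PTX file, is likely worth including in a report to NVIDIA.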

Recommended answer

This would appear to have been a genuine bug of some sort in the CUDA 5.0 ptxas assembler. It was reported to NVIDIA and we can assume that it was fixed sometime during the more than three years since the question was asked and this answer added.

[This answer has been assembled from comments and added as a community wiki entry to get this question off the unanswered question list]
