What does the error `Loaded runtime CuDNN library: 5005 but source was compiled with 5103` mean?


Problem description

I was trying to use TensorFlow with GPU and got the following error:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5103 (compatibility version 5100).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:457] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

Of course I am trying to fix this error (it has already been asked about in "Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5103 (compatibility version 5100)"), but I would also like to understand it. I always attempt to solve coding problems myself before posting a request for help, but I am having a hard time even getting started on this one: the error message seems cryptic to me, and I cannot find a good resource that explains what it means.

To understand the error I focused on the line that seems to be where the error starts:

Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5103 (compatibility version 5100).

After reading some GitHub pages that seemed relevant, I realized that reading the error as follows is actually more helpful:

Loaded runtime CuDNN library: 5005 but source was compiled with 5103.

Removing the parentheses makes the error a bit easier to read (though I would like to know what role the parenthesized part plays in the message, to ease debugging). It seems that CuDNN library 5005 was loaded (at the UNIX/OS level), but TensorFlow (for Python) was compiled with what I would guess is version 5103. Obviously, if the TensorFlow library uses the API according to 5103 but the "real" API for talking to the (CUDA) deep learning library CuDNN is version 5005, it is clear that there would be a problem. These are just guesses about what is going on, though.

My first confusion is that, as far as I can tell, there is no such thing as CuDNN 5005 or 5103. It would be great to understand for sure what that part of the error means, so that I can start debugging this for real. As far as I can tell from `module list`, I am using:

cudnn/5.0

My second confusion is about the parentheses that I ignored and what they mean:

  1. Loaded runtime CuDNN library: 5005 (compatibility version 5000)
  2. but source was compiled with 5103 (compatibility version 5100)

I honestly have no idea what "compatibility version XXXX" means. Maybe it is a suggestion to install version 5000 (whatever that means) of CuDNN (which is still confusing, because there is no version "five thousand" of CuDNN) and to compile a version of TensorFlow (somehow) that uses CuDNN version 5100.

Does someone know more precisely what the error means (and maybe provide a solution to the question I linked)?

Solution

This is an approximate description of what is going on.

cuDNN has major releases that are numbered e.g. 4.0, 5.0, 5.1, etc.

These major releases may incorporate API changes. Therefore a program that uses cuDNN v4 (i.e. 4.0) may need some modifications to work with or use new features in cuDNN v5 (i.e. 5.0).

The major release is encoded in the first two digits of the 4-digit version number. So a cuDNN 4-digit version number of 5103 means it belongs to the 5.1 major release and has a sub-version number of 03. For compatibility purposes, such a release should be API-compatible with any other cuDNN library version of 51xx because they all belong to the 5.1 major release (this is not guaranteed to be strictly true AFAIK, but it is the general idea). Therefore any of these libraries with release numbering 51xx would have a compatibility version of 5100, to indicate that they belong to the 5.1 major release and are (or at least should be) compatible with it.

So when we are referring to a compatibility version (what major release is this library compatible with) we only need to specify the first two digits - 5000 indicates 5.0, 5100 indicates 5.1. But it is possible for a release to have a sub-release version number that is non-zero. There could be a variety of reasons for this, for example to allow for bug-fix releases and the like.
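To make the numbering concrete, here is a minimal Python sketch of the decoding described above. The function names `decode_cudnn_version` and `compatibility_version` are made up for illustration; they are not part of cuDNN or TensorFlow.

```python
def decode_cudnn_version(version):
    """Split a 4-digit cuDNN version such as 5103 into (major, minor, sub-version)."""
    major, rest = divmod(version, 1000)   # 5103 -> 5, 103
    minor, sub = divmod(rest, 100)        # 103  -> 1, 3
    return major, minor, sub


def compatibility_version(version):
    """Zero out the last two digits, keeping only the major release: 5103 -> 5100."""
    return (version // 100) * 100


# The two numbers from the error message in the question.
for v in (5005, 5103):
    major, minor, sub = decode_cudnn_version(v)
    print(f"{v}: release {major}.{minor}, sub-version {sub:02d}, "
          f"compatibility version {compatibility_version(v)}")
```

Read this way, 5005 is simply a 5.0 release (which matches the `cudnn/5.0` module in the question) with sub-version 05, and 5103 is a 5.1 release with sub-version 03.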

When a program (like TensorFlow) is designed to use cuDNN, it will generally be coded to work with a particular version of cuDNN. In some cases, this can be handled at compile time, by "compiling against" a particular cuDNN version (and its associated API, i.e. the header files used when building TensorFlow). Therefore, at compile time, a program like TensorFlow can determine what version of the cuDNN API it was compiled against, and that is a 4-digit version (although generally speaking, only the compatibility version, i.e. the first two digits of the 4-digit version, should really matter).
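As an illustration of where the compile-time number comes from: cuDNN headers of this generation define the macros `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`, and combine them into `CUDNN_VERSION`. The sketch below recomputes that number from header text in Python. The embedded `SAMPLE_HEADER` is a hand-written excerpt for demonstration, and `header_cudnn_version` is a hypothetical helper, not an existing tool.

```python
import re

# Hand-written excerpt in the style of cudnn.h (illustrative, not a real file).
SAMPLE_HEADER = """
#define CUDNN_MAJOR      5
#define CUDNN_MINOR      1
#define CUDNN_PATCHLEVEL 3
#define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""


def header_cudnn_version(header_text):
    """Recompute the 4-digit version from the three numeric macros in the header."""
    parts = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^\s*#\s*define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"{name} not found in header text")
        parts[name] = int(match.group(1))
    return (parts["CUDNN_MAJOR"] * 1000
            + parts["CUDNN_MINOR"] * 100
            + parts["CUDNN_PATCHLEVEL"])


print(header_cudnn_version(SAMPLE_HEADER))
```

To inspect a real installation, you would pass in the contents of the `cudnn.h` that the build actually used; its location depends on how cuDNN was installed.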

At runtime, you have a particular version of the cuDNN library (e.g. a .so file on Linux) loaded on your machine somewhere. The version of that library can be determined, queried, and reported. If that actual library version does not match (at least from a compatibility version perspective) the version of the cuDNN library that TensorFlow was compiled against, then that's a good indication that things may not work, and so TensorFlow points this out when it is running:

Loaded runtime CuDNN library: 5005 but source was compiled with 5103.

This is tensorflow telling you "hey, I was designed (compiled) to work with cuDNN v5.1 but you are only giving me cuDNN 5.0 to work with".

Differences at the sub-version level should be less significant. If you know what you are doing, it may be OK to use cuDNN runtime version 5107 even if your TensorFlow was compiled against version 5103. This is a purely hypothetical example, but such a difference would indicate a change in the library that was not intended to alter proper functionality, behavior, or the API. Version 5107 could be just a bug-fixed version of 5103, for example.
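The rule of thumb above (compare compatibility versions, tolerate sub-version differences) can be sketched as follows. This is an illustration of the idea as described in this answer, not TensorFlow's actual implementation, and `check_cudnn` is a hypothetical name.

```python
def compat(version):
    """Compatibility version: the 4-digit version with the last two digits zeroed."""
    return (version // 100) * 100


def check_cudnn(loaded, compiled):
    """Return (ok, message) for a loaded runtime version and a compile-time version."""
    if compat(loaded) == compat(compiled):
        return True, f"OK: {loaded} and {compiled} both belong to release {compat(loaded)}"
    message = (
        f"Loaded runtime CuDNN library: {loaded} "
        f"(compatibility version {compat(loaded)}) "
        f"but source was compiled with {compiled} "
        f"(compatibility version {compat(compiled)})."
    )
    return False, message


print(check_cudnn(5107, 5103))  # hypothetical sub-version difference only
print(check_cudnn(5005, 5103))  # the situation from the question
```

Under this rule the hypothetical 5107/5103 pair passes, while the 5005/5103 pair from the question fails because 5000 and 5100 differ.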

In the ideal case, you would build TensorFlow against the version of cuDNN that you are using. If you have downloaded pre-built TensorFlow packages, however, then you may see this sort of message (since you presumably downloaded cuDNN separately). In that case, you should at least seek to match the cuDNN major version you are using against the compatibility version that TensorFlow is expecting. In this particular example, you are not doing that: the loaded library is a 5.0 release, while TensorFlow expects 5.1. As the error message itself suggests, either upgrade the cuDNN library to a 5.1 release, or build TensorFlow from source against the cuDNN version you have.
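To find out which cuDNN version your machine would actually load, you can ask the library itself: cuDNN exports a `cudnnGetVersion()` function that returns the integer version. The sketch below calls it through Python's `ctypes`. It assumes the shared library can be located under the name `cudnn` by `ctypes.util.find_library`, which may or may not be the same copy that TensorFlow picks up in your environment; `loaded_cudnn_version` is a hypothetical helper name.

```python
import ctypes
import ctypes.util


def loaded_cudnn_version(library_name="cudnn"):
    """Return the integer from cudnnGetVersion(), or None if no library is found."""
    path = ctypes.util.find_library(library_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # cudnnGetVersion takes no arguments and returns a size_t.
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    lib.cudnnGetVersion.argtypes = []
    return int(lib.cudnnGetVersion())


version = loaded_cudnn_version()
if version is None:
    print("No cuDNN shared library found on the default search path")
else:
    print(f"cuDNN runtime version: {version} "
          f"(compatibility version {version // 100 * 100})")
```

If the reported compatibility version differs from the one TensorFlow says it was compiled with, that is the mismatch to fix.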
