TensorFlow OMP: Error #15 when training

Question

I am training a neural network with TensorFlow on a CentOS HPC cluster. However, I get this error at the start of the training process:

OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

The code performs instance segmentation. It works fine for many other people, but it fails in my case.

Why does this occur, and how can I solve it?

Recommended answer

I solved this problem by asking an HPC server expert. This may be useful for users of Compute Canada systems.

Why does it occur?

This error is due to a conflict between a pre-built TensorFlow Python wheel (which is specific to Compute Canada systems) and the conda environment. Quote: "conda is always a bit problematic because it downloads precompiled binaries, mileage may vary..."
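To check whether a conda environment is involved at all, a small diagnostic along these lines may help. It is only a sketch: the function name `conda_environment_info` is made up for illustration, and it relies on two conventions, namely that conda environments contain a `conda-meta` directory under their prefix and that an activated environment sets `CONDA_PREFIX`.

```python
import os
import sys
from pathlib import Path


def conda_environment_info(prefix=None, environ=None):
    """Report whether the interpreter prefix or the shell environment points at conda.

    Hypothetical helper for illustration; `prefix` and `environ` default to the
    running interpreter's sys.prefix and os.environ.
    """
    prefix = Path(sys.prefix if prefix is None else prefix)
    environ = os.environ if environ is None else environ
    return {
        "prefix": str(prefix),
        # conda keeps its package metadata in <prefix>/conda-meta
        "interpreter_from_conda": (prefix / "conda-meta").is_dir(),
        # CONDA_PREFIX is set while a conda environment is activated
        "conda_env_active": bool(environ.get("CONDA_PREFIX")),
    }


if __name__ == "__main__":
    for key, value in conda_environment_info().items():
        print(f"{key}: {value}")
```

If either flag is true while TensorFlow comes from a system-provided wheel, the mix described above is a plausible suspect.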

How to solve it?

As @abccd pointed out, "The best thing to do is to ensure that only a single OpenMP runtime is linked into the process". However, I haven't figured out how to ensure that.
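One way to at least see which OpenMP runtimes a process has loaded is to read `/proc/self/maps` on Linux. The sketch below is an assumption-laden diagnostic, not a fix: the helper names are hypothetical, and the file-name pattern only covers the common Intel (`libiomp5`), GNU (`libgomp`) and LLVM (`libomp`) runtimes.

```python
import re
from pathlib import Path

# Common OpenMP runtime file names: Intel (libiomp5), GNU (libgomp), LLVM (libomp).
# The optional "-<hex>" part covers copies that some wheels bundle under a hashed name.
_OPENMP_NAME = re.compile(r"^lib(iomp5|gomp|omp)(-[0-9a-f]+)?\.so(\.|$)")


def find_openmp_runtimes(maps_text):
    """Return sorted, de-duplicated paths of OpenMP runtimes listed in maps_text."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # The sixth field of a /proc/<pid>/maps line is the mapped file path, if any.
        if len(fields) == 6:
            path = fields[5].strip()
            if _OPENMP_NAME.match(path.rsplit("/", 1)[-1]):
                paths.add(path)
    return sorted(paths)


def loaded_openmp_runtimes():
    """Inspect the current process; returns an empty list where /proc is unavailable."""
    maps_file = Path("/proc/self/maps")
    if not maps_file.exists():
        return []
    return find_openmp_runtimes(maps_file.read_text())


if __name__ == "__main__":
    # Import the suspect libraries here, one at a time, before listing.
    for path in loaded_openmp_runtimes():
        print(path)
```

Because the process aborts as soon as the second copy initializes, this is most useful when the suspect libraries are imported one at a time. Two different paths ending in `libiomp5.so` would match the situation the error message describes.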

So I uninstalled conda and installed everything within the module system using pip install. After that, the network worked fine.
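After a reinstall like this, it can be worth confirming where Python would actually load TensorFlow from. The following is a minimal sketch using the standard library's `importlib.util.find_spec`, which locates a top-level package without importing it; `package_origin` is a hypothetical helper name.

```python
import importlib.util


def package_origin(name):
    """Return the file a top-level package would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print("tensorflow would be imported from:", package_origin("tensorflow"))
```

A path that still points into a conda directory would suggest the old installation is being picked up.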
