Theano: Initialisation of device gpu failed! Reason=CNMEM_STATUS_OUT_OF_MEMORY

Question

I am running the Keras example kaggle_otto_nn.py with the Theano backend. When I set cnmem=1, the following error appears:

cliu@cliu-ubuntu:keras-examples$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32,lib.cnmem=1 python kaggle_otto_nn.py
Using Theano backend.
ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device gpu failed:
initCnmem: cnmemInit call failed! Reason=CNMEM_STATUS_OUT_OF_MEMORY. numdev=1

/usr/local/lib/python2.7/dist-packages/Theano-0.8.0rc1-py2.7.egg/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
  "downsample module has been moved to the theano.tensor.signal.pool module.")
Traceback (most recent call last):
  File "kaggle_otto_nn.py", line 28, in <module>
    from keras.models import Sequential
  File "build/bdist.linux-x86_64/egg/keras/models.py", line 15, in <module>
  File "build/bdist.linux-x86_64/egg/keras/backend/__init__.py", line 46, in <module>
  File "build/bdist.linux-x86_64/egg/keras/backend/theano_backend.py", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.8.0rc1-py2.7.egg/theano/__init__.py", line 111, in <module>
    theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1()
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.8.0rc1-py2.7.egg/theano/sandbox/cuda/tests/test_driver.py", line 38, in test_nvidia_driver1
    if not numpy.allclose(f(), a.sum()):
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.8.0rc1-py2.7.egg/theano/compile/function_module.py", line 871, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.8.0rc1-py2.7.egg/theano/gof/link.py", line 314, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.8.0rc1-py2.7.egg/theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
RuntimeError: Cuda error: kernel_reduce_ccontig_node_97496c4d3cf9a06dc4082cc141f918d2_0: out of memory. (grid: 1 x 1; block: 256 x 1 x 1)

Apply node that caused the error: GpuCAReduce{add}{1}(<CudaNdarrayType(float32, vector)>)
Toposort index: 0
Inputs types: [CudaNdarrayType(float32, vector)]
Inputs shapes: [(10000,)]
Inputs strides: [(1,)]
Inputs values: ['not shown']
Outputs clients: [[HostFromGpu(GpuCAReduce{add}{1}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

It seems I cannot set cnmem to a very large value (above roughly 0.9), since that may overflow the GPU's memory. When I set cnmem=0.9, it works correctly. According to this, it

represents the start size (in MB or % of total GPU memory) of the memory pool.

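And also:

This could cause memory fragmentation. So if you have a memory error while using cnmem, try to allocate more memory at the start or disable it. If you try this, report your result on theano-dev.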

But if I get a memory error, why should I allocate more memory at the start? In my case, allocating more memory at the start seems to be what causes the error.

Recommended answer

As shown here, cnmem is actually only allowed to be assigned as a float:

0: not enabled.

0 < N <= 1: use this fraction of the total GPU memory (clipped to .95 for driver memory).

> 1: use this number in megabytes (MB) of memory.
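
The three cases above can be sketched in a few lines of Python. This is only an illustration of the rules as quoted: interpret_cnmem and the 4096 MB card size are hypothetical names and values chosen for the example, and the function is not Theano's actual implementation.

```python
def interpret_cnmem(value, total_gpu_mb):
    """Return the initial pool size in MB implied by a cnmem setting.

    Hypothetical helper mirroring the three quoted cases; this is NOT
    Theano's own code.
    """
    value = float(value)
    if value < 0:
        raise ValueError("cnmem must be non-negative")
    if value == 0:
        return 0.0  # 0: CNMeM is not enabled
    if value <= 1:
        # 0 < N <= 1: fraction of total GPU memory, clipped to 0.95
        fraction = min(value, 0.95)
        return fraction * total_gpu_mb
    # > 1: the number is taken as megabytes
    return value


total_mb = 4096  # assumed GPU memory size, for illustration only
for setting in (0, 0.9, 1.0, 2048):
    print(setting, "->", interpret_cnmem(setting, total_mb), "MB")
```

Under these rules a setting of 1.0 requests 95% of the card rather than all of it, which leaves some memory for the driver.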

So it works if you set cnmem=1.0 instead of cnmem=1.
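
As a rough sketch, the same flags can also be set from inside a Python script instead of on the command line, with the value always written as a float. The helper name build_theano_flags is hypothetical; the flag names are the ones from the command in the question. This assumes the environment variable is set before theano (or keras) is imported, since Theano reads THEANO_FLAGS at import time.

```python
import os


def build_theano_flags(cnmem):
    """Build a THEANO_FLAGS string with cnmem written as a float.

    Hypothetical helper for illustration; the flag names are taken from
    the command line shown in the question.
    """
    flags = [
        ("mode", "FAST_RUN"),
        ("device", "gpu"),
        ("floatX", "float32"),
        ("lib.cnmem", repr(float(cnmem))),  # 1 becomes "1.0"
    ]
    return ",".join("{}={}".format(key, value) for key, value in flags)


# Set the flags before importing theano or keras.
os.environ["THEANO_FLAGS"] = build_theano_flags(1)
print(os.environ["THEANO_FLAGS"])

# import theano  # would go here, after THEANO_FLAGS is set
```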
