Google Colab Pro 在分配大内存时崩溃 [英] Google Colab Pro crashed while allocating large memory

查看：672 发布时间：2021/6/13 19:30:38 tensorflow out-of-memory google-colaboratory

本文介绍了Google Colab Pro 在分配大内存时崩溃的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在尝试使用 Colab pro GPU(最大 25Gb 内存)来训练顺序模型.根据找到的说明此处，我将内存限制设置为 22Gb.下面是我的代码和日志.

I'm trying to use Colab pro GPU (max 25Gb memory) for training a sequential model. Based on the instructions found here, I'm setting the memory limit to 22Gb. Below is my code and logs.

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
mem_limit=22000

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=mem_limit)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

根据这个日志，它似乎正在设置上限

Dec 22, 2020, 7:57:15 PM    WARNING 2020-12-23 01:57:15.673093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22000 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)

Dec 22, 2020, 7:57:15 PM    WARNING 2020-12-23 01:57:15.673030: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.

但是，在执行语句时，它总是试图分配 37Gb 内存并且运行时崩溃.这是日志

However, when executing a statement, invariably it's attempting to allocate 37Gb memory and the runtime crashes. Here is the log

Dec 22, 2020, 8:01:01 PM    INFO    KernelRestarter: restarting kernel (1/5), keep random ports

Dec 22, 2020, 8:00:47 PM    WARNING tcmalloc: large alloc 37200994304 bytes == 0x7f48b828a000 @ 0x7f5249f5a001 0x7f52414564ff 0x7f52414a6ab8 0x7f52414aabb7 0x7f5241549003 0x50a4a5 0x50cc96 0x507be4 0x509900 0x50a2fd 0x50cc96 0x507be4 0x5161c5 0x50a12f 0x50beb4 0x507be4 0x509900 0x50a2fd 0x50beb4 0x507be4 0x509900 0x50a2fd 0x50cc96 0x507be4 0x508ec2 0x594a01 0x59fd0e 0x50d256 0x507be4 0x509900 0x50a2fd

我的数据集很大，可能需要超过 128Gb 的内存.有没有办法限制 TF 使用的内存量，如果涉及到这个，我可以接受更长的执行时间.

My dataset is large and will possibly require more than 128Gb memory. Is there way to limit the amount of memory use by TF and I'm fine with longer execution time, if it comes to that.

提前致谢.

Google Colab Pro 在分配大内存时崩溃 [英] Google Colab Pro crashed while allocating large memory

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

Google Colab Pro 在分配大内存时崩溃 [英] Google Colab Pro crashed while allocating large memory

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭