Memory leak in TensorFlow upon starting new epoch


Question

I'm working on a training script in TensorFlow to classify two different types of images. Here is the class that creates a dataset object, which is used to generate batches and to increment epochs. It works fine until the first epoch is completed, and then fails at the line self._images = self._images[perm] inside the next_batch method. This doesn't make sense to me, since Python shouldn't be duplicating self._images, only reshuffling the data.

import numpy as np

class DataSet(object):
  def __init__(self, images, labels, norm=True):
    assert images.shape[0] == labels.shape[0], (
      "images.shape: %s labels.shape: %s" % (images.shape, labels.shape))
    self._num_examples = images.shape[0]
    self._images = images
    self._labels = labels
    self._epochs_completed = 0
    self._index_in_epoch = 0
    self._norm = norm
    # Shuffle the data right away
    perm = np.arange(self._num_examples)
    np.random.shuffle(perm)
    self._images = self._images[perm]
    self._labels = self._labels[perm]

  @property
  def images(self):
    return self._images

  @property
  def labels(self):
    return self._labels

  @property
  def num_examples(self):
    return self._num_examples

  @property
  def epochs_completed(self):
    return self._epochs_completed

  def next_batch(self, batch_size):
    """Return the next `batch_size` examples from this data set."""
    start = self._index_in_epoch
    self._index_in_epoch += batch_size
    if self._index_in_epoch > self._num_examples:
      # Finished epoch
      self._epochs_completed += 1
      print("Completed epoch %d.\n" % self._epochs_completed)
      # Reshuffle the data for the next epoch
      perm = np.arange(self._num_examples)
      np.random.shuffle(perm)
      self._images = self._images[perm]  # this is where OOM happens
      self._labels = self._labels[perm]
      # Start next epoch
      start = 0
      self._index_in_epoch = batch_size
      assert batch_size <= self._num_examples
    end = self._index_in_epoch
    return self._images[start:end], self._labels[start:end]
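
For context on the failing line: NumPy fancy indexing such as self._images[perm] always materializes a new array, so for a moment the old and the new copy of the entire image set are both alive, which can be enough to push a tight host over its memory limit. A possible in-place alternative is to shuffle both arrays along the first axis with a shared RNG state, which preserves the image/label pairing without allocating a second full-size array; the helper name shuffle_in_place below is hypothetical:

import numpy as np

def shuffle_in_place(images, labels):
    """Shuffle images and labels in place with the same permutation."""
    state = np.random.get_state()
    np.random.shuffle(images)   # in-place Fisher-Yates shuffle along axis 0
    np.random.set_state(state)  # rewind the RNG to replay the permutation
    np.random.shuffle(labels)   # labels end up paired with their images

Inside next_batch this would stand in for the three perm lines, so the process never holds two full copies of the images at once.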

Memory usage does not increase during ordinary training cycles. Here is the relevant portion of the training code; data_train_norm is a DataSet object.

batch_size = 300
csv_plot = open("csvs/train_plot.csv", "a")
for i in range(3000):
    batch = data_train_norm.next_batch(batch_size)
    if i % 50 == 0:
        tce = cross_entropy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0},
                                 session=sess)
        print("\nstep %d, train ce %g" % (i, tce))
        print(datetime.datetime.now())
        csv_plot.write("%d, %g\n" % (i, tce))

    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.8}, session=sess)

version = 1
saver.save(sess, 'nets/cnn0nu_batch_gpu_roi_v%02d' % version)
csv_plot.close()

Answer

Are you using dataset = dataset.shuffle(buffer_size)?

Try reducing buffer_size. That worked for me.
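
The dataset.shuffle(buffer_size) in the answer is the tf.data API, where shuffle keeps up to buffer_size elements resident in memory, so a smaller buffer directly lowers peak usage at the cost of a less thorough shuffle. A minimal sketch of such a pipeline, with images and labels standing in for the asker's NumPy arrays:

import tensorflow as tf

# Build an input pipeline instead of the hand-rolled DataSet class.
dataset = tf.data.Dataset.from_tensor_slices((images, labels))
dataset = dataset.shuffle(buffer_size=1000)  # holds only 1000 examples in memory
dataset = dataset.batch(300)
dataset = dataset.repeat()  # start a new epoch automatically

A buffer smaller than the dataset only approximates a uniform shuffle; a buffer_size equal to the number of examples reproduces a full shuffle but also reproduces the memory pressure.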
