iterator.get_next() causes "terminate called after throwing an instance of 'std::system_error'"


Problem description

I am training a ResNet50 with TensorFlow on a shared server with these properties:

Ubuntu 16.04
3 GTX 1080 GPUs
TensorFlow 1.3
Python 2.7

Always after two epochs, during the third epoch, I encounter this error:

terminate called after throwing an instance of 'std::system_error'
  what():  Resource temporarily unavailable
Aborted

This is the code that converts the TFRecord file to a dataset:

filenames = ["balanced_t.tfrecords"]
dataset = tf.contrib.data.TFRecordDataset(filenames)

def parser(record):
    keys_to_features = {
        "mhot_label_raw": tf.FixedLenFeature((), tf.string, default_value=""),
        "mel_spec_raw": tf.FixedLenFeature((), tf.string, default_value=""),
    }
    parsed = tf.parse_single_example(record, keys_to_features)

    mel_spec1d = tf.decode_raw(parsed['mel_spec_raw'], tf.float64)
    # label = tf.cast(parsed["label"], tf.string)
    mhot_label = tf.decode_raw(parsed['mhot_label_raw'], tf.float64)
    mel_spec = tf.reshape(mel_spec1d, [96, 64])
    return {"mel_data": mel_spec}, mhot_label

dataset = dataset.map(parser)
dataset = dataset.batch(batch_size)
dataset = dataset.repeat(3)
iterator = dataset.make_one_shot_iterator()

This is the input pipeline:

while True:
    try:
        (features, labels) = sess.run(iterator.get_next())
    except tf.errors.OutOfRangeError:
        print("end of training dataset")

After inserting some print statements in my code, I discovered that the following line causes the error:

(features, labels) = sess.run(iterator.get_next())

However, I have not been able to solve it.

Answer

Your code has a (subtle) memory leak, so the process is probably running out of memory and being killed. The issue is that calling iterator.get_next() in each loop iteration adds a new node to the TensorFlow graph, which ends up consuming a large amount of memory.
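One quick way to see the leak in action (a minimal sketch, assuming the `iterator` and `sess` already built in the question, not part of the original answer) is to print the number of operations in the default graph on each iteration; with `get_next()` inside the loop, the count keeps growing:

# Illustrative only: the op count grows on every iteration because each
# call to iterator.get_next() adds new ops to the default graph.
for step in range(5):
    print(step, len(tf.get_default_graph().get_operations()))
    sess.run(iterator.get_next())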

To stop the memory leak, rewrite your while loop as follows:

# Call `get_next()` once outside the loop to create the TensorFlow operations once.
next_element = iterator.get_next()

while True:
    try:
        (features, labels) = sess.run(next_element)
    except tf.errors.OutOfRangeError:
        print("end of training dataset")

