TensorFlow: Dst tensor is not initialized


Question

The MNIST For ML Beginners tutorial is giving me an error when I run print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})). Everything else runs fine.

错误和跟踪:

InternalErrorTraceback (most recent call last)
<ipython-input-16-219711f7d235> in <module>()
----> 1 print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
    338     try:
    339       result = self._run(None, fetches, feed_dict, options_ptr,
--> 340                          run_metadata_ptr)
    341       if run_metadata:
    342         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
    562     try:
    563       results = self._do_run(handle, target_list, unique_fetches,
--> 564                              feed_dict_string, options, run_metadata)
    565     finally:
    566       # The movers are no longer used. Delete them.

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
    635     if handle is None:
    636       return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
--> 637                            target_list, options, run_metadata)
    638     else:
    639       return self._do_call(_prun_fn, self._session, handle, feed_dict,

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
    657       # pylint: disable=protected-access
    658       raise errors._make_specific_exception(node_def, op, error_message,
--> 659                                             e.code)
    660       # pylint: enable=protected-access
    661 

InternalError: Dst tensor is not initialized.
     [[Node: _recv_Placeholder_3_0/_1007 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_312__recv_Placeholder_3_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
     [[Node: Mean_1/_1011 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_319_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

I just switched to a more recent version of CUDA, so maybe this has something to do with that? Seems like this error is about copying a tensor to the GPU.

Stack: EC2 g2.8xlarge machine, Ubuntu 14.04

Update:

print(sess.run(accuracy, feed_dict={x: batch_xs, y_: batch_ys})) runs fine. This leads me to suspect that the issue is that I'm trying to transfer a huge tensor to the GPU and it can't take it. Small tensors like a minibatch work just fine.

Update 2:

I've figured out exactly how big the tensors have to be to cause this issue:

batch_size = 7509  # Works.
print(sess.run(accuracy, feed_dict={x: mnist.test.images[0:batch_size], y_: mnist.test.labels[0:batch_size]}))

batch_size = 7510  # Doesn't work. Gets the Dst error.
print(sess.run(accuracy, feed_dict={x: mnist.test.images[0:batch_size], y_: mnist.test.labels[0:batch_size]}))
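As a quick sanity check on those numbers (a sketch, assuming the tutorial's float32 images of 784 values and one-hot labels of 10 values), the host-to-device copy at the failing batch size is only about 24 MB, and the working and failing sizes are just a few kilobytes apart — which suggests the limit is how much free device memory remains, not the tensor size in absolute terms:

```python
def feed_bytes(batch_size, image_size=784, num_classes=10, dtype_bytes=4):
    """Approximate size of the feed_dict copy: images plus one-hot labels,
    both float32 (4 bytes per value)."""
    return batch_size * (image_size + num_classes) * dtype_bytes

print(feed_bytes(7509))  # 23848584 bytes (~23.85 MB) -- works
print(feed_bytes(7510))  # 23851760 bytes (~23.85 MB) -- fails
```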

Answer

In short, this error message is generated when there is not enough GPU memory to handle the batch size.
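Given that minibatches work while the full test set does not, one common workaround (a sketch, not from the original answer; `run_accuracy` is a hypothetical stand-in for the `sess.run(accuracy, feed_dict=...)` call) is to evaluate in chunks small enough to fit on the GPU and combine the per-chunk accuracies with a size-weighted average:

```python
def chunked_accuracy(run_accuracy, images, labels, chunk_size=1000):
    """Evaluate in chunks that fit on the GPU, then combine the
    per-chunk accuracies, weighting each chunk by its size."""
    total_correct = 0.0
    for start in range(0, len(images), chunk_size):
        xs = images[start:start + chunk_size]
        ys = labels[start:start + chunk_size]
        total_correct += run_accuracy(xs, ys) * len(xs)
    return total_correct / len(images)

# Toy stand-in for sess.run(accuracy, feed_dict={x: xs, y_: ys}):
# the fraction of positions where prediction equals label.
preds  = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
labels = [0, 1, 0, 0, 1, 1, 0, 1, 1, 0]
def run_accuracy(xs, ys):
    return sum(a == b for a, b in zip(xs, ys)) / len(xs)

print(chunked_accuracy(run_accuracy, preds, labels, chunk_size=3))  # 0.8
```

The weighted average matters because the last chunk is usually smaller than the rest; a plain mean of chunk accuracies would over-weight it.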

Expanding on Steven's link (I cannot post comments yet), here are a few tricks to monitor/control memory usage in Tensorflow:

  • To monitor memory usage during runs, consider logging run metadata. You can then see the memory usage per node of your graph in TensorBoard. See the TensorBoard information page for more information and an example.
  • By default, TensorFlow tries to allocate as much GPU memory as possible. You can change this with the GPUConfig options so that TensorFlow only allocates as much memory as it needs. See the documentation on this. There you will also find an option that lets you allocate only a fraction of your GPU memory (though I have found this to be broken at times).
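The allocator settings from the second bullet looked like this in the TF 0.x/1.x API the question uses (a sketch; `allow_growth` and `per_process_gpu_memory_fraction` are the two documented knobs on the GPU options proto — in practice you would set one or the other, not both):

```python
import tensorflow as tf  # TF 0.x/1.x-era API, matching the question

config = tf.ConfigProto()

# Grow GPU memory usage on demand instead of grabbing it all up front.
config.gpu_options.allow_growth = True

# ...or cap TensorFlow at a fixed fraction of total GPU memory.
config.gpu_options.per_process_gpu_memory_fraction = 0.4

sess = tf.Session(config=config)
```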
