摘要直方图中的 Nan [英] Nan in summary histogram

查看：16 发布时间：2021/12/27 17:05:47 python tensorflow deep-learning gradient

本文介绍了摘要直方图中的 Nan的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我的程序有时会遇到这个问题(不是每次运行都会遇到这个..)，然后如果遇到这个问题，我总是可以从我在程序崩溃之前保存的最后一个模型中重现这个错误加载由于 nan.当从这个模型重新运行时，使用该模型生成损失的第一次训练过程似乎很好(我已经打印了损失并显示没有问题)，但是在应用梯度后，嵌入变量的值将变为 Nan.

My program will face this some times(not every run will face this..), then if face this I can always reproduce this error loading from the last model I have saved before program crash due to nan. When rerun from this model, first train process seems fine using the model to generate loss(I have printed loss and shows no problem), but after applying gradients, the values of embedding variables will turn to Nan.

那么nan问题的根本原因是什么?困惑不知道如何进一步调试，这个具有相同数据和参数的程序大部分都可以正常运行，只有在某些运行时才会遇到这个问题..

So what is the root cause of the nan problem? Confused as not know how to debug further and this program with same data and params will mostly run ok and only face this problem during some run..

Loading existing model from: /home/gezi/temp/image-caption//model.flickr.rnn2.nan/model.ckpt-18000
Train from restored model: /home/gezi/temp/image-caption//model.flickr.rnn2.nan/model.ckpt-18000
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 5235 get requests, put_count=4729 evicted_count=1000 eviction_rate=0.211461 and unsatisfied allocation rate=0.306781
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:257] Raising pool_size_limit_ from 100 to 110
2016-10-04 21:45:39 epoch:1.87 train_step:18001 duration:0.947 elapsed:0.947 train_avg_metrics:['loss:0.527']  ['loss:0.527']
2016-10-04 21:45:39 epoch:1.87 eval_step: 18001 duration:0.001 elapsed:0.948 ratio:0.001
W tensorflow/core/framework/op_kernel.cc:968] Invalid argument: Nan in summary histogram for: rnn/HistogramSummary_1
     [[Node: rnn/HistogramSummary_1 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](rnn/HistogramSummary_1/tag, rnn/image_text_sim/image_mlp/w_h/read/_309)]]
W tensorflow/core/framework/op_kernel.cc:968] Invalid argument: Nan in summary histogram for: rnn/HistogramSummary_1
     [[Node: rnn/HistogramSummary_1 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](rnn/HistogramSummary_1/tag, rnn/image_text_sim/image_mlp/w_h/read/_309)]]
W tensorflow/core/framework/op_kernel.cc:968] Invalid argument: Nan in summary histogram for: rnn/HistogramSummary_1
     [[Node: rnn/HistogramSummary_1 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](rnn/HistogramSummary_1/tag, rnn/image_text_sim/image_mlp/w_h/read/_309)]]
W tensorflow/core/framework/op_kernel.cc:968] Invalid argument: Nan in summary histogram for: rnn/HistogramSummary_1
     [[Node: rnn/HistogramSummary_1 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](rnn/HistogramSummary_1/tag, rnn/image_text_sim/image_mlp/w_h/read/_309)]]
W tensorflow/core/framework/op_kernel.cc:968] Invalid argument: Nan in summary histogram for: rnn/HistogramSummary_1
     [[Node: rnn/HistogramSummary_1 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](rnn/HistogramSummary_1/tag, rnn/image_text_sim/image_mlp/w_h/read/_309)]]
Traceback (most recent call last):
  File "./train.py", line 308, in <module>
    tf.app.run()

摘要直方图中的 Nan [英] Nan in summary histogram

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

摘要直方图中的 Nan [英] Nan in summary histogram

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭