TensorFlow Serving returns NaN on predict


Question

I trained a GAN model and saved the generator with the following call:

    tf.keras.models.save_model(
        generator,
        filepath=os.path.join(MODEL_PATH, 'model_saver'),
        overwrite=True,
        include_optimizer=False,
        save_format=None,
        options=None
    )

Prediction works when I load the model in Python with `tf.keras.models.load_model`. But when I serve the same model with the TensorFlow model server, it returns NaN values. I start the server like this:

zhaocc:~/products/tensorflow_server$ sudo docker run -t --rm -p 8502:8501 \
    -v "/tmp/pix2pix/sketch_photo/model_saver:/models/photo2sketch" \
    -e MODEL_NAME=photo2sketch \
    tensorflow/serving &
[3] 30089
zhaocc:~/products/tensorflow_server$ 2020-06-17 12:57:31.745339: I tensorflow_serving/model_servers/server.cc:86] Building single TensorFlow model file config:  model_name: photo2sketch model_base_path: /models/photo2sketch
2020-06-17 12:57:31.745448: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-06-17 12:57:31.745459: I tensorflow_serving/model_servers/server_core.cc:575]  (Re-)adding model: photo2sketch
2020-06-17 12:57:31.846162: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: photo2sketch version: 1}
2020-06-17 12:57:31.846213: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: photo2sketch version: 1}
2020-06-17 12:57:31.846233: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: photo2sketch version: 1}
2020-06-17 12:57:31.846282: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/photo2sketch/1
2020-06-17 12:57:31.874158: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-06-17 12:57:31.874182: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:295] Reading SavedModel debug info (if present) from: /models/photo2sketch/1
2020-06-17 12:57:31.874315: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-17 12:57:31.952982: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2020-06-17 12:57:32.172641: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:183] Running initialization op on SavedModel bundle at path: /models/photo2sketch/1
2020-06-17 12:57:32.248514: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags { serve }; Status: success: OK. Took 402236 microseconds.
2020-06-17 12:57:32.256576: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /models/photo2sketch/1/assets.extra/tf_serving_warmup_requests
2020-06-17 12:57:32.265064: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: photo2sketch version: 1}
2020-06-17 12:57:32.267113: I tensorflow_serving/model_servers/server.cc:355] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2020-06-17 12:57:32.269289: I tensorflow_serving/model_servers/server.cc:375] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...

When I send a prediction request over REST, the response has the correct shape but contains only NaN:

[[[[nan nan nan]
   [nan nan nan]
   [nan nan nan]
   ...
   [nan nan nan]
   [nan nan nan]
   [nan nan nan]]

Does anybody know why? How can I debug it? Thanks very much!

Recommended answer

I had the very same problem with my Pix2Pix generator. The problem was the `training` argument. As explained in "What does `training=True` mean when calling a TensorFlow Keras model?", this argument changes the network's output, because layers such as dropout behave differently in training mode and in inference mode. One possible solution is to remove all dropout layers (and any other parts affected by the flag) before saving the network. That did not work for me (I probably missed something). So, as a temporary workaround, I added two signatures to the model:

import tensorflow as tf

# Signature that runs the generator in training mode
@tf.function(input_signature=[tf.TensorSpec([None, 256, 256, 3], dtype=tf.float32)])
def model_predict1(input_batch):
  return {'outputs': generator(input_batch, training=True)}

# Signature that runs the generator in inference mode
@tf.function(input_signature=[tf.TensorSpec([None, 256, 256, 3], dtype=tf.float32)])
def model_predict2(input_batch):
  return {'outputs': generator(input_batch, training=False)}
...
# Export both functions as named serving signatures
generator.save(base_path + "kerassave", signatures={'predict1': model_predict1, 'predict2': model_predict2})
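
A client picks one of the exported signatures through the `signature_name` field of the REST request body. The following is a minimal sketch, not the original poster's client. It assumes the server from the question is reachable on host port 8502 under the model name `photo2sketch` and that the model saved with these signatures is the one being served. The helper names `build_predict_request`, `contains_nan` and `predict` are hypothetical, and the NaN check assumes the server writes NaN as a bare `NaN` token, which Python's `json` module accepts by default.

```python
import json
import math
import urllib.request

# Assumption: host port and model name are taken from the docker command in the question.
SERVER_URL = "http://localhost:8502/v1/models/photo2sketch:predict"


def build_predict_request(instances, signature_name):
    """Serialize a TensorFlow Serving REST predict body in row ("instances") format."""
    body = {"signature_name": signature_name, "instances": instances}
    return json.dumps(body).encode("utf-8")


def contains_nan(value):
    """Recursively check a nested list of numbers for NaN."""
    if isinstance(value, list):
        return any(contains_nan(item) for item in value)
    return isinstance(value, float) and math.isnan(value)


def predict(instances, signature_name, url=SERVER_URL):
    """POST the request and return the decoded "predictions" field."""
    request = urllib.request.Request(
        url,
        data=build_predict_request(instances, signature_name),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["predictions"]
```

With a running server, calling `predict([image], "predict1")` and `predict([image], "predict2")` on the same 256x256x3 nested list and passing each result to `contains_nan` shows which signature is affected.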

`predict2` still always returned NaNs. `predict1`, however, worked.
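
The answer does not say why the inference path produced NaN. As a purely illustrative sketch, and not a diagnosis of the original generator, the pure-Python code below mimics a batch normalization layer, which uses the statistics of the current batch when `training=True` and its stored moving statistics when `training=False`. The function `batch_norm` and the NaN-valued moving statistics are hypothetical; they only show how one mode can return NaN while the other stays finite.

```python
import math


def batch_norm(batch, moving_mean, moving_var, training, eps=1e-3):
    """Normalize a 1-D batch like a batch-norm layer with gamma=1 and beta=0."""
    if training:
        # Training mode: statistics come from the current batch.
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
    else:
        # Inference mode: statistics come from the stored moving averages.
        mean, var = moving_mean, moving_var
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


batch = [1.0, 2.0, 3.0]

# Healthy stored statistics: both modes are finite but give different outputs.
train_out = batch_norm(batch, moving_mean=0.0, moving_var=1.0, training=True)
infer_out = batch_norm(batch, moving_mean=0.0, moving_var=1.0, training=False)

# Hypothetical corrupted stored statistics: only inference mode is affected.
nan = float("nan")
broken_train_out = batch_norm(batch, nan, nan, training=True)
broken_infer_out = batch_norm(batch, nan, nan, training=False)

print("training=True :", broken_train_out)
print("training=False:", broken_infer_out)
```

If something like this is suspected, inspecting the saved layer weights for NaN is a reasonable first check before removing layers.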
