How to do inference in parallel with tensorflow saved model predictors?

Question

TensorFlow version: 1.14

Our current setup uses TensorFlow estimators to do live NER, i.e. it performs inference one document at a time. We have 30 different fields to extract and run one model per field, so 30 models in total.

Our current setup uses Python multiprocessing to do the inferences in parallel. (The inference is done on CPUs.) This approach reloads the model weights each time a prediction is made.
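
To make the problem concrete, here is a minimal sketch of that setup. It is not the actual project code: load_estimator_for_field and make_input_fn are hypothetical helpers standing in for the real model-loading and input-building logic; the point is only that the weights are reloaded inside every worker call.

from multiprocessing import Pool

FIELDS = ["field_%d" % i for i in range(30)]  # one model per field

def extract_field(args):
    field, document = args
    # Loading the estimator here means the weights are reloaded on every prediction.
    estimator = load_estimator_for_field(field)   # hypothetical helper
    return list(estimator.predict(input_fn=make_input_fn(document)))  # hypothetical input_fn

def run_document(document):
    with Pool(processes=8) as pool:
        results = pool.map(extract_field, [(f, document) for f in FIELDS])
    return dict(zip(FIELDS, results))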

Using the approach mentioned here, we exported the estimator models as tf.saved_model. This works as expected in that it does not reload the weights for each request. It also works fine for single-field inference in one process, but it doesn't work with multiprocessing: all the processes hang when we make the predict-function call (predict_fn in the linked post).
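
For reference, a minimal sketch of that export-and-load flow, assuming estimator is one of the already-trained tf.estimator models and that it takes a single string feature named "tokens" (the feature spec would need to match the real model_fn):

import tensorflow as tf  # TensorFlow 1.14

def serving_input_fn():
    # Raw-tensor serving input; adjust the placeholder to the model's real features.
    tokens = tf.placeholder(tf.string, shape=[None, None], name="tokens")
    return tf.estimator.export.ServingInputReceiver({"tokens": tokens}, {"tokens": tokens})

export_dir = estimator.export_saved_model("exported/field_1", serving_input_fn)

# Load once and reuse: the predictor keeps the session and weights in memory,
# so repeated calls do not reload anything.
predict_fn = tf.contrib.predictor.from_saved_model(export_dir)
result = predict_fn({"tokens": [["John", "lives", "in", "Paris"]]})

Loaded this way in a single process, predict_fn behaves as expected; the hang only appears once the loading or the calls are moved into multiprocessing workers.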

This post is related, but I'm not sure how to adapt it for a saved model.

Importing tensorflow individually for each of the predictors did not work either:

class SavedModelPredictor:

    def __init__(self, model_path):
        # Import TF inside the process so each worker gets its own module state.
        import tensorflow as tf
        self.predictor_fn = tf.contrib.predictor.from_saved_model(model_path)

    def predict(self, input_dict):
        # Renamed from predictor_fn so the method no longer shadows the attribute set above.
        return self.predictor_fn(input_dict)

How can we make tf.saved_model work with multiprocessing?

Answer

Ray Serve, Ray's model serving solution, also supports offline batching. You can wrap your model in a Ray Serve backend and scale it to the number of replicas you want:

import ray
from ray import serve

client = serve.start()

class MyTFModel:
    def __init__(self, model_path):
        self.model = ... # load model

    @serve.accept_batch
    def __call__(self, input_batch):
        assert isinstance(input_batch, list)

        # forward pass
        self.model([item.data for item in input_batch])

        # return a list of responses, one per input in the batch
        return [...]

client.create_backend("tf", MyTFModel,
    "/path/to/saved_model",  # init arg forwarded to MyTFModel.__init__ (placeholder path)
    # configure resources
    ray_actor_options={"num_cpus": 2, "num_gpus": 1},
    # configure replicas
    config={
        "num_replicas": 2,
        "max_batch_size": 24,
        "batch_wait_timeout": 0.5
    }
)
client.create_endpoint("tf", backend="tf")
handle = client.get_handle("tf")  # Ray 1.0 API: get the handle from the client

# perform inference on a list of inputs
futures = [handle.remote(data) for data in fields]
result = ray.get(futures)
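
Here, max_batch_size caps how many queued requests get grouped into a single __call__, and batch_wait_timeout is how long (in seconds) a replica waits for a batch to fill before running a partial one. Each replica runs __init__ once, so the SavedModel is loaded a single time per replica and every subsequent request is served from memory, which is exactly the behaviour the multiprocessing setup was missing.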

Try it out with the nightly wheel; the tutorial is here: https://docs.ray.io/en/master/serve/tutorials/batch.html

Updated the code sample for Ray 1.0.
