Singleton in Google Dataflow

Question

I have a Dataflow pipeline that reads messages from PubSub. I need to enrich each message using a couple of APIs. I want a single instance of each API to be used for processing all records, to avoid initializing the API for every request.

I tried creating a static variable, but I still see the API being initialized many times.

How can I avoid initializing a variable multiple times in Google Dataflow?

Recommended answer

Dataflow uses multiple machines in parallel to do data analysis, so your API will have to be initialized at least once per machine.

In fact, Dataflow makes no strong guarantees about the lifetime of these machines, so they may come and go relatively frequently.

A simple way to have your job access an external service without initializing the API too often is to initialize it in the @Setup method of your DoFn:

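import org.apache.beam.sdk.transforms.DoFn;

// The String element types are placeholders; use your pipeline's actual types.
class APICallingDoFn extends DoFn<String, String> {
    // transient: the handle is created on the worker in @Setup, not serialized with the DoFn
    private transient ExternalServiceHandle handle = null;

    @Setup
    public void initializeExternalAPI() {
        // ... create the handle here; runs once per DoFn instance
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
        // ... process each element -- setup will have been called
    }
}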

You need to do this because neither Beam nor Dataflow guarantees the lifetime of a DoFn instance or of a worker.
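If a worker creates several DoFn instances in the same JVM, each one runs its own setup method. One way to still share a single handle per JVM is a lazily initialized static holder that the setup method reads from. The sketch below is plain Java with no Beam dependency, so it can be run on its own: `ExternalServiceHandle`, `SharedHandle`, `EnrichFn`, and `enrich` are hypothetical names for illustration, and `EnrichFn.setup()` only stands in for a method annotated with @Setup. It does not give one instance for the whole job, only one per JVM.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive API client; not a real library class.
final class ExternalServiceHandle {
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    ExternalServiceHandle() {
        INIT_COUNT.incrementAndGet(); // counts how many times the client is built
    }

    String enrich(String message) {
        return message + ":enriched";
    }
}

// One handle per JVM, created lazily and thread-safely on first use
// (initialization-on-demand holder idiom).
final class SharedHandle {
    private SharedHandle() {}

    private static final class Holder {
        static final ExternalServiceHandle INSTANCE = new ExternalServiceHandle();
    }

    static ExternalServiceHandle get() {
        return Holder.INSTANCE;
    }
}

// Plain-Java stand-in for a DoFn: setup() plays the role of the @Setup method.
final class EnrichFn {
    private transient ExternalServiceHandle handle;

    void setup() {
        handle = SharedHandle.get(); // reuses the JVM-wide handle
    }

    String process(String element) {
        return handle.enrich(element);
    }
}

final class SharedHandleDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 8; i++) {
            final int id = i;
            pool.submit(() -> {
                EnrichFn fn = new EnrichFn(); // each task gets its own instance
                fn.setup();
                fn.process("msg-" + id);
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("initializations: " + ExternalServiceHandle.INIT_COUNT.get());
    }
}
```

Running `SharedHandleDemo` creates eight `EnrichFn` instances across four threads but builds the handle only once. On Dataflow the same pattern would still initialize the API once on every worker JVM, and again whenever a worker is replaced.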

Hope this helps.
