Singleton in Google Dataflow

Problem description

I have a Dataflow pipeline that reads messages from PubSub. I need to enrich each message using a couple of APIs. I want a single instance of the API to be used for processing all records, to avoid initializing the API for every request.

I tried creating a static variable, but I still see the API being initialized many times.

How can I avoid initializing a variable multiple times in Google Dataflow?

Solution

Dataflow uses multiple machines in parallel to do data analysis, so your API will have to be initialized at least once per machine.

In fact, Dataflow does not have strong guarantees on the life of these machines, so they may come and go relatively frequently.

A simple way to have your job access an external service without initializing the API too often is to initialize it in your DoFn:

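import org.apache.beam.sdk.transforms.DoFn;

// The String element types are placeholders; use your pipeline's own types.
class APICallingDoFn extends DoFn<String, String> {
    // ExternalServiceHandle stands for your API client class.
    // transient: the handle is created on the worker, not serialized with the DoFn.
    private transient ExternalServiceHandle handle = null;

    @Setup
    public void initializeExternalAPI() {
        // Called once per DoFn instance, before it processes any element.
        // ... create the handle here
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
        // ... process each element -- setup will have been called
    }
}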

You need to do this because neither Beam nor Dataflow guarantees the lifetime of a DoFn instance or of a worker.
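
A worker may also run several DoFn instances in the same JVM, so a setup method can still run more than once per machine. If that matters, one option is to keep the handle in a lazily initialized static holder and fetch it from the setup method. The following is only a sketch in plain Java, without Beam, to show the holder pattern: `ExternalServiceHandle` here is a hypothetical stand-in for the real API client, `SharedHandle` and `enrich` are names invented for illustration, and the sketch assumes the real client is safe to share between threads.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the external API client. */
final class ExternalServiceHandle {
    // Counts how many times the (pretend) expensive constructor ran in this JVM.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    ExternalServiceHandle() {
        INIT_COUNT.incrementAndGet();
    }

    String enrich(String message) {
        return message + ":enriched";
    }
}

/** Holds at most one handle per JVM, created on first use. */
final class SharedHandle {
    private static ExternalServiceHandle instance;

    private SharedHandle() {}

    // synchronized so that concurrent callers cannot both create an instance
    static synchronized ExternalServiceHandle get() {
        if (instance == null) {
            instance = new ExternalServiceHandle();
        }
        return instance;
    }
}

final class SharedHandleDemo {
    public static void main(String[] args) {
        // Each call simulates the setup method of a separate DoFn instance.
        ExternalServiceHandle first = SharedHandle.get();
        ExternalServiceHandle second = SharedHandle.get();
        System.out.println(first == second);
        System.out.println(ExternalServiceHandle.INIT_COUNT.get());
    }
}
```

In a DoFn, the setup method would then assign `handle = SharedHandle.get();`. This still initializes the API once per worker JVM, which is the minimum described above; it only avoids repeating the work for every DoFn instance inside that JVM.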

Hope this helps.
