Better approach to call an external API in Apache Beam

Question

I have two approaches for initializing the HttpClient in order to make an API call from a ParDo in Apache Beam.

Approach 1:

Initialize the HttpClient object in @StartBundle and close it in @FinishBundle. The code is as follows:

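import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

public class ProcessNewIncomingRequest extends DoFn<String, KV<String, String>> {

        // Fields rather than locals, so processElement and finishBundle can reach them.
        // transient: a DoFn is serialized, and HttpClient is not Serializable.
        private transient HttpClient client;
        private transient HttpRequest request;

        @StartBundle
        public void startBundle() {
            client = HttpClient.newHttpClient();
            request = HttpRequest.newBuilder()
                                       .uri(URI.create(<Custom_URL>)) // placeholder from the question
                                       .build();
        }

        @ProcessElement
        public void processElement(@Element String element, OutputReceiver<KV<String, String>> out) {
            // Use the client and do an external API call
        }

        @FinishBundle
        public void finishBundle() {
            client.close(); // HttpClient.close() exists only on Java 21 and later
        }
}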

Approach 2:

Have a separate class where all the connections are managed using the connection pool.

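import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

public class ExternalConnection {

       private final HttpClient client = HttpClient.newHttpClient();
       private final HttpRequest request = HttpRequest.newBuilder()
                                       .uri(URI.create(<Custom_URL>)) // placeholder from the question
                                       .build();

       // Use the client, send the request and return the response (body read as a String)
       public HttpResponse<String> getResponse() throws IOException, InterruptedException {
             return client.send(request, HttpResponse.BodyHandlers.ofString());
       }

}

public class ProcessNewIncomingRequest extends DoFn<String, KV<String, String>> {

        @ProcessElement
        public void processElement(@Element String element, OutputReceiver<KV<String, String>> out)
                throws IOException, InterruptedException {
             // As written, this creates a new ExternalConnection (and a new HttpClient) for every element
             HttpResponse<String> response = new ExternalConnection().getResponse();
        }
}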

Which of the two approaches above is better in terms of performance and coding design standards?

Recommended answer

Either approach would work fine. The StartBundle/FinishBundle one is more self-contained IMHO, but it does not work well if your bundles are very small, because the client is then created and closed once per bundle. An even better approach might be to use the DoFn's @Setup/@Teardown methods: a client created there can span an arbitrary number of bundles and is tied to the lifetime of the DoFn instance (leveraging the pooling of DoFn instances that the Beam SDKs already do).
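To make the suggested lifecycle concrete, here is a minimal plain-Java sketch with no Beam dependency. It is not the asker's code or a Beam API: the class name `ApiCaller`, the `url` constructor argument and the `isReady()` helper are hypothetical, and the comments only indicate which DoFn annotation each method would carry if this were moved into a real DoFn.

```java
import java.io.IOException;
import java.io.Serializable;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Plain-Java stand-in for a DoFn that keeps one HttpClient for its whole lifetime.
 * In a Beam DoFn, setup/process/teardown would be annotated
 * @Setup, @ProcessElement and @Teardown respectively.
 */
public class ApiCaller implements Serializable {

    private final String url; // hypothetical: the endpoint is passed in, not hard-coded

    // transient: HttpClient is not Serializable, so it is created on the worker in setup()
    private transient HttpClient client;
    private transient HttpRequest request;

    public ApiCaller(String url) {
        this.url = url;
    }

    // Would be @Setup: runs once per instance, before any bundle is processed
    public void setup() {
        client = HttpClient.newHttpClient();
        request = HttpRequest.newBuilder().uri(URI.create(url)).build();
    }

    // Would be @ProcessElement: every element reuses the same client.
    // The element is unused here because, as in the question, one fixed request is sent.
    public String process(String element) throws IOException, InterruptedException {
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Would be @Teardown: runs when the instance is discarded
    public void teardown() {
        // On Java 21 or later, client.close() could be called here before dropping the reference
        client = null;
        request = null;
    }

    // Helper for the sketch only: reports whether setup() has run and teardown() has not
    public boolean isReady() {
        return client != null;
    }
}
```

With this shape the client is built once per instance rather than once per bundle or per element, which is the difference the answer is pointing at. Beam documents @Teardown as best effort, so cleanup there should probably not be relied on for correctness.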
