Better approach to call an external API in Apache Beam

Question

I have two approaches for initializing the HttpClient in order to make an API call from a ParDo in Apache Beam.

Approach 1:

Initialize the HttpClient object in StartBundle and close it in FinishBundle. The code is as follows:

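import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

public class ProcessNewIncomingRequest extends DoFn<String, KV<String, String>> {

        // HttpClient is not Serializable, so keep it transient and build it on the worker
        private transient HttpClient client;
        private transient HttpRequest request;

        @StartBundle
        public void startBundle() {
            // Assign to the fields so processElement and finishBundle can see them
            client = HttpClient.newHttpClient();
            request = HttpRequest.newBuilder()
                                       .uri(URI.create(<Custom_URL>))
                                       .build();
        }

        @ProcessElement
        public void processElement(@Element String element, OutputReceiver<KV<String, String>> out) {
            // Use the client and do an external API call
        }

        @FinishBundle
        public void finishBundle() {
             // HttpClient.close() requires Java 21 or later; on older versions, drop the reference instead
             client.close();
        }
}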

Approach 2:

Have a separate class where all the connections are managed using a connection pool.

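import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

public class ExternalConnection {

       private final HttpClient client = HttpClient.newHttpClient();
       private final HttpRequest request = HttpRequest.newBuilder()
                                       .uri(URI.create(<Custom_URL>))
                                       .build();

       public HttpResponse<String> getResponse() throws IOException, InterruptedException {
             // use the client, send request and get response
             return client.send(request, HttpResponse.BodyHandlers.ofString());
       }

}

public class ProcessNewIncomingRequest extends DoFn<String, KV<String, String>> {

        @ProcessElement
        public void processElement(@Element String element) throws Exception {
             // As written, this builds a new ExternalConnection (and HttpClient) for every element
             HttpResponse<String> response = new ExternalConnection().getResponse();
        }
}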

Which of the above two approaches is better in terms of performance and coding design standards?

Recommended answer

Either approach would work fine. The @StartBundle/@FinishBundle one is more self-contained IMHO, but it works poorly when your bundles are very small, because the client is rebuilt for every bundle. An even better approach might be to use the DoFn's @Setup/@Teardown methods, which can span an arbitrary number of bundles but are tied to the lifetime of the DoFn instance (leveraging the pooling of DoFn instances that the Beam SDKs already do).
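To make the trade-off concrete, here is a minimal plain-Java sketch that counts how many HttpClient objects each strategy builds. It is a simulation, not Beam code. The classes SetupScopedFn and BundleScopedFn and the run driver are hypothetical stand-ins, and their method names only mirror the Beam annotations they would carry in a real DoFn (@Setup, @StartBundle, @FinishBundle, @Teardown). The bundle count and bundle size are made-up inputs.

```java
import java.net.http.HttpClient;

/** Plain-Java simulation of a DoFn lifecycle; no Beam classes are used. */
public class LifecycleSketch {

    /** Stand-in for a DoFn that builds its client once per instance. */
    static class SetupScopedFn {
        private HttpClient client;   // would be a transient field in a real DoFn
        int clientsBuilt = 0;

        void setup() {               // would carry @Setup in Beam
            client = HttpClient.newHttpClient();
            clientsBuilt++;
        }

        void processElement(String element) {
            // A real DoFn would send a request with the client here
            if (client == null) throw new IllegalStateException("setup() was not called");
        }

        void teardown() {            // would carry @Teardown in Beam
            client = null;           // drop the reference; close() needs Java 21+
        }
    }

    /** Stand-in for a DoFn that builds its client once per bundle. */
    static class BundleScopedFn {
        private HttpClient client;
        int clientsBuilt = 0;

        void startBundle() {         // would carry @StartBundle in Beam
            client = HttpClient.newHttpClient();
            clientsBuilt++;
        }

        void processElement(String element) {
            if (client == null) throw new IllegalStateException("startBundle() was not called");
        }

        void finishBundle() {        // would carry @FinishBundle in Beam
            client = null;
        }
    }

    /** Hypothetical driver: feeds the same bundles to both stand-ins. */
    static int[] run(int bundles, int elementsPerBundle) {
        SetupScopedFn perInstance = new SetupScopedFn();
        BundleScopedFn perBundle = new BundleScopedFn();
        perInstance.setup();
        for (int i = 0; i < bundles; i++) {
            perBundle.startBundle();
            for (int j = 0; j < elementsPerBundle; j++) {
                perInstance.processElement("e" + j);
                perBundle.processElement("e" + j);
            }
            perBundle.finishBundle();
        }
        perInstance.teardown();
        return new int[] {perInstance.clientsBuilt, perBundle.clientsBuilt};
    }

    public static void main(String[] args) {
        int[] built = run(20, 2);
        // prints "per instance: 1, per bundle: 20"
        System.out.println("per instance: " + built[0] + ", per bundle: " + built[1]);
    }
}
```

With 20 bundles of 2 elements each, the bundle-scoped stand-in builds 20 clients for 40 elements, while the setup-scoped one builds a single client. In a real pipeline the runner decides bundle sizes and how many DoFn instances exist, so the actual counts will differ.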
