Batched API call inside Apache Spark?


Question

I am a beginner to Apache Spark and I have the following task:

I am reading records from a data source that, within the Spark transformations, need to be enriched with data from a call to an external web service before they can be processed any further.

The web service will accept parallel calls to a certain extent, but only allows a few hundred records to be sent at once. It is also quite slow, so batching up as many records as possible and issuing parallel requests would definitely help here.

Is there a way to do this with Spark in a reasonable manner?

I thought of reading the records, pre-processing them into another data source, then reading that "API queue" data source 500 records at a time (with multiple processes if possible), writing the enriched records to the next data source, and using this resulting data source for the final transformations.
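Instead of an intermediate "API queue" data source, the batching can usually be done directly inside a Spark transformation. A minimal sketch of the idea, assuming a hypothetical `call_api` function and the few-hundred-record limit from the question (the function below is exactly what you would pass to `rdd.mapPartitions`, which hands each partition to it as a plain iterator, so it can be demonstrated here without a running cluster):

```python
from itertools import islice

BATCH_SIZE = 500  # the "few hundred records at once" limit from the question

def batched(iterator, size):
    """Yield successive lists of at most `size` items from an iterator."""
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def call_api(batch):
    """Placeholder for the external web service; here it just tags each
    record so the example runs without a real service."""
    return [{"record": r, "enriched": True} for r in batch]

def enrich_partition(iterator):
    """Pass this to rdd.mapPartitions: each executor batches its own
    partition and makes one API call per batch of BATCH_SIZE records."""
    for batch in batched(iterator, BATCH_SIZE):
        for enriched in call_api(batch):
            yield enriched

# In a real job this would be:  enriched = rdd.mapPartitions(enrich_partition)
result = list(enrich_partition(iter(range(1200))))
```

With this pattern, 1200 records in a partition produce three API calls (500 + 500 + 200) instead of 1200 per-record calls, and no intermediate data source is needed.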

The only place where those odd limits need to be respected is within the API calls (which is why I thought some intermediate data format / data source would be appropriate).

Any ideas or directions you want to point me to?

Answer

If you call your external API inside your RDD processing, the calls will be made in parallel by each Spark executor, which, if you think about it, is exactly what you want for fast processing of your data.
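The degree of parallelism across executors is governed by the number of partitions (e.g. via `repartition`), which is one way to stay within what the service tolerates. Within a single executor you can additionally fan out over batches while capping concurrency; a sketch, where `call_api`, `MAX_PARALLEL`, and the sample batches are all assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 4  # hypothetical cap on concurrent calls the service tolerates

def call_api(batch):
    """Placeholder for the slow external web service call."""
    return [r * 2 for r in batch]

batches = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

# At most MAX_PARALLEL requests are in flight at any moment;
# pool.map preserves the order of the input batches in the results.
with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
    results = list(pool.map(call_api, batches))
```

Keeping the cap explicit on your side means a misconfigured partition count cannot accidentally flood the service.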

If you want to compensate on your side for the sluggishness of the API, you can install a caching server to deal with repeated requests, such as memcached: http://memcached.org/
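The lookup pattern is the same regardless of the cache backend: check the cache, call the service only on a miss, and store the result. A self-contained sketch using a plain dict as a stand-in for a memcached client (in production you would swap the dict for a real client such as pymemcache's `Client`; `slow_api_lookup` is a placeholder for the actual service):

```python
# Stand-in for a memcached client: a plain dict. The cache-aside
# pattern below is unchanged when a real client replaces it.
_cache = {}
api_calls = 0  # counts how often the slow service is actually hit

def slow_api_lookup(key):
    """Placeholder for the slow external web service."""
    global api_calls
    api_calls += 1
    return f"enriched:{key}"

def cached_lookup(key):
    """Cache-aside lookup: only call the service on a cache miss."""
    if key not in _cache:
        _cache[key] = slow_api_lookup(key)
    return _cache[key]

# Five lookups over two distinct keys hit the service only twice.
results = [cached_lookup(k) for k in ["a", "b", "a", "a", "b"]]
```

How much this helps depends entirely on how often the same records recur in your data; for mostly-unique keys a cache buys little.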

