Throttle concurrent HTTP requests from Spark executors
Problem description
I want to do some HTTP requests from inside a Spark job to a rate-limited API. In order to keep track of the number of concurrent requests in a non-distributed system (in Scala), the following works:
- a throttling actor which maintains a semaphore (counter) that increments when a request starts and decrements when the request completes. Although Akka is distributed, there are issues with (de)serializing the actorSystem in a distributed Spark context.
- using parallel streams with fs2: https://fs2.io/concurrency-primitives.html => cannot be distributed.
- I suppose I could also just collect the dataframes to the Spark driver and handle throttling there with one of the above options, but I would like to keep this distributed.
How are such things typically handled?
Answer
You shouldn't try to synchronise requests across Spark executors/partitions. That goes completely against the Spark concurrency model.
Instead, for example, divide the global rate limit R by executors * cores and use mapPartitions to send requests from each partition within its R/(e*c) rate limit.
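The per-partition share of the rate limit can be enforced with a simple local throttle. Here is a minimal sketch in plain Scala; the `Throttle` class, `callApi`, `numExecutors`, `numCoresPerExecutor`, and the global limit `R` are all assumptions for illustration, not part of any Spark API:

```scala
import java.util.concurrent.TimeUnit

// A minimal per-partition throttle: spaces out calls so that at most
// `ratePerSecond` requests per second are issued by the caller.
class Throttle(ratePerSecond: Double) {
  private val minIntervalNanos = (1e9 / ratePerSecond).toLong
  private var lastCall = 0L

  // Blocks until enough time has elapsed since the previous call.
  def acquire(): Unit = synchronized {
    val now = System.nanoTime()
    val waitNanos = lastCall + minIntervalNanos - now
    if (waitNanos > 0) TimeUnit.NANOSECONDS.sleep(waitNanos)
    lastCall = System.nanoTime()
  }
}

// Sketch of how it could be wired into a Spark job: one Throttle is
// created per partition inside mapPartitions, so no cross-executor
// coordination or serialization of shared state is needed.
//
// val perPartitionRate = R / (numExecutors * numCoresPerExecutor).toDouble
// df.rdd.mapPartitions { rows =>
//   val throttle = new Throttle(perPartitionRate)
//   rows.map { row =>
//     throttle.acquire() // stay within this partition's share of R
//     callApi(row)
//   }
// }
```

Because each partition throttles itself independently, the aggregate request rate stays at or below R without any shared counter, at the cost of unused headroom when some partitions finish early.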