Using Futures within Spark


Question

A Spark job makes a remote web service call for every element in an RDD. A simple implementation might look something like this:

def webServiceCall(url: String): String = scala.io.Source.fromURL(url).mkString
val rdd2 = rdd1.map(x => webServiceCall(x.field1))

(The above example has been kept simple and does not handle timeouts).
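For completeness: scala.io.Source.fromURL offers no timeout control, so a hedged sketch of one way to bound the call is shown below, using java.net.HttpURLConnection instead; the timeout values are illustrative.

import java.net.{HttpURLConnection, URL}
import scala.io.Source

def webServiceCall(url: String): String = {
  val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
  conn.setConnectTimeout(5000)  // connect timeout in ms, illustrative
  conn.setReadTimeout(10000)    // read timeout in ms, illustrative
  Source.fromInputStream(conn.getInputStream).mkString
}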

There is no interdependency between any of the results for different elements of the RDD.

Would the above be improved by using Futures to optimize performance by making parallel calls to the web service for each element of the RDD? Or does Spark itself have that level of optimization built in, so that it will run the operations on each element in the RDD in parallel?

If the above can be optimized by using Futures, does anyone have some code examples showing the correct way to use Futures within a function passed to a Spark RDD?

Thanks

Answer

Or does Spark itself have that level of optimization built in, so that it will run the operations on each element in the RDD in parallel?

It doesn't. Spark parallelizes tasks at the partition level, but by default every partition is processed sequentially in a single thread.
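The built-in lever for more parallelism is therefore the number of partitions, not per-element threading. A minimal sketch, assuming the rdd1 and webServiceCall from the question (the partition count of 100 is illustrative):

// More partitions means more concurrent tasks, bounded by the total
// executor cores; each element is still processed serially within a task.
val rdd2 = rdd1.repartition(100).map(x => webServiceCall(x.field1))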

Would the above be improved by using Futures

It could be an improvement, but it is quite hard to do right. In particular:

  • Every Future has to be completed in the same stage, before any reshuffle takes place.
  • Given the lazy nature of the Iterators used to expose partition data, you cannot do it with high-level primitives like map (see, for example, Spark job with Async HTTP call).
  • You can build custom logic using mapPartitions, but then you have to deal with all the consequences of non-lazy partition evaluation; a sketch follows this list.
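A minimal sketch of the mapPartitions approach, assuming the rdd1 and webServiceCall from the question; the pool size and timeout are illustrative, and Await.result deliberately forces the whole partition (the non-lazy evaluation mentioned above):

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

val rdd2 = rdd1.mapPartitions { iter =>
  val pool = Executors.newFixedThreadPool(8)  // per-task pool, illustrative size
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  // .toList forces the lazy iterator, so all Futures start immediately.
  val futures = iter.map(x => Future(webServiceCall(x.field1))).toList
  // Block until every call in this partition completes (or the timeout fires).
  val results = Await.result(Future.sequence(futures), 10.minutes)
  pool.shutdown()
  results.iterator
}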
