Apache Spark and Remote Method Invocation


Problem description


I am trying to understand how Apache Spark works behind the scenes. After coding a little in Spark, I am pretty sure that it implements RDDs as RMI remote objects, doesn't it?


In this way, it could modify them inside transformations such as map, flatMap, and so on. Objects that are not part of an RDD are simply serialized and sent to a worker during execution.


In the example below, lines and tokens would be treated as remote objects, while the string toFind would simply be serialized and copied to the workers.

val lines: RDD[String] = sc.textFile("large_file.txt")
val toFind = "Some cool string"
val tokens = 
  lines.flatMap(_ split " ")
       .filter(_.contains(toFind))
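One way to see why RDDs need not be remote objects: a pipeline like the one above can be represented purely as a lazy chain of transformation descriptions. The sketch below is a simplified, hypothetical model (the names MiniRDD, Source, etc. are invented for illustration and are not Spark's API): each transformation just records its parent and the function to apply, and nothing runs until collect() is called.

```scala
// Hypothetical mini-model of an RDD pipeline: each node records its parent
// and a function; evaluation is deferred until collect() is invoked.
sealed trait MiniRDD[T] {
  def flatMap[U](f: T => Iterable[U]): MiniRDD[U] = FlatMapped(this, f)
  def filter(p: T => Boolean): MiniRDD[T] = Filtered(this, p)
  def collect(): Seq[T]
}

final case class Source[T](data: Seq[T]) extends MiniRDD[T] {
  def collect(): Seq[T] = data
}

final case class FlatMapped[T, U](parent: MiniRDD[T], f: T => Iterable[U]) extends MiniRDD[U] {
  def collect(): Seq[U] = parent.collect().flatMap(f)
}

final case class Filtered[T](parent: MiniRDD[T], p: T => Boolean) extends MiniRDD[T] {
  def collect(): Seq[T] = parent.collect().filter(p)
}

object MiniRDDDemo {
  def main(args: Array[String]): Unit = {
    // Mirrors the question's pipeline, with an in-memory source instead of a file.
    val lines = Source(Seq("Some cool string", "nothing to see"))
    val toFind = "cool"
    val tokens = lines.flatMap(_.split(" ").toSeq).filter(_.contains(toFind))
    println(tokens.collect()) // List(cool)
  }
}
```

Building tokens constructs only a Filtered(FlatMapped(Source(...), ...), ...) description; real work happens when collect() walks the chain, which is the essence of Spark's lazy evaluation.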


Am I wrong? I googled a little, but I haven't found any reference to how Spark RDDs are implemented internally.

Answer


You are correct. Spark serializes closures to perform remote method invocation.
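The sketch below illustrates what "serializing a closure" means, using only plain Scala and Java serialization (this is a simplified illustration of the mechanism, not Spark's internal code): the function passed to filter captures toFind, so serializing the function object pulls a copy of the string along with it, and the resulting bytes can be shipped to a worker, deserialized, and applied to that worker's partition of the data.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object ClosureShippingSketch {
  def main(args: Array[String]): Unit = {
    // A local variable captured by the closure below. Because it is a plain
    // serializable String, the closure itself remains serializable
    // (Scala compiles anonymous functions to be serializable).
    val toFind = "Some cool string"
    val predicate: String => Boolean = _.contains(toFind)

    // "Driver side": turn the closure into bytes.
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(predicate)
    out.close()

    // "Worker side": rebuild the closure from bytes and apply it locally.
    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    val shipped = in.readObject().asInstanceOf[String => Boolean]

    println(shipped("prefix Some cool string suffix")) // true
    println(shipped("no match here"))                  // false
  }
}
```

This is also why Spark raises "Task not serializable" errors: if the closure had captured a non-serializable object (for example, an enclosing class instance), the writeObject step would fail before any work reached a worker.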

