Is foreachRDD executed on the Driver?

Question

I am trying to process some XML data received on a JMS queue (QPID) using Spark Streaming. After getting the XML as a DStream, I convert it to DataFrames so I can join it with some of my static data, which is already loaded as DataFrames. But according to the API documentation for the foreachRDD method on DStream, it gets executed on the Driver. Does that mean all the processing logic will run only on the Driver and not get distributed to the workers/executors?

API documentation

foreachRDD(func)

The most generic output operator that applies a function, func, to each RDD generated from the stream. This function should push the data in each RDD to an external system, such as saving the RDD to files, or writing it over the network to a database. Note that the function func is executed in the driver process running the streaming application, and will usually have RDD actions in it that will force the computation of the streaming RDDs.

Answer

So does that mean all processing logic will only run on the Driver and not get distributed to workers/executors?

No. The function itself runs on the driver, but don't forget that it operates on an RDD. The functions you pass to operations on that RDD, such as foreachPartition, map and filter, still run on the worker nodes. This does not send all the data back over the network to the driver; only methods that return the data to the driver, such as collect, do that.
