How to override setup and cleanup methods in a Spark map function
Question
Suppose there is the following MapReduce job:
Mapper:
setup() initializes some state
map() adds data to the state, no output
cleanup() outputs the state to the context
Reducer:
aggregates all states into one output
How could such a job be implemented in Spark?
Additional question: how could such a job be implemented in Scalding? I'm looking for an example which somehow makes the method overloadings...
Recommended answer
Spark map doesn't provide an equivalent of Hadoop setup and cleanup. It assumes that each call is independent and free of side effects.
The closest equivalent you can get is to put the required logic inside mapPartitions or mapPartitionsWithIndex, using this simplified template:
rdd.mapPartitions { iter => {
  ... // initialize state
  val result = ??? // compute result for iter
  ... // perform cleanup
  ... // return results as an Iterator[U]
}}
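To make the template concrete, here is a minimal sketch of the setup / map / cleanup / reduce pattern from the question. It assumes the "state" is a per-key count of string records; the names `SetupCleanupSketch`, `Counts`, `perPartition`, and `merge` are hypothetical and chosen only for illustration. The per-partition function has the `Iterator[T] => Iterator[U]` shape that `mapPartitions` accepts, and the `main` method stands in plain iterators for partitions so the sketch runs without a cluster.

```scala
object SetupCleanupSketch {
  // Assumed shape of the per-partition state: a count per record value.
  type Counts = Map[String, Int]

  // Per-partition function, the shape expected by RDD.mapPartitions.
  def perPartition(records: Iterator[String]): Iterator[Counts] = {
    // "setup": initialize the state once per partition
    val state = scala.collection.mutable.Map.empty[String, Int]
    // "map": fold every record into the state, emitting nothing
    records.foreach { r => state(r) = state.getOrElse(r, 0) + 1 }
    // "cleanup": emit the state as the partition's single output
    Iterator.single(state.toMap)
  }

  // "reducer": merge two partition states into one
  def merge(a: Counts, b: Counts): Counts =
    b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0) + v) }

  def main(args: Array[String]): Unit = {
    // Two plain iterators stand in for two RDD partitions.
    val partitions = Seq(Iterator("a", "b", "a"), Iterator("b", "c"))
    val total = partitions.flatMap(perPartition).reduce(merge)
    println(total)
  }
}
```

With an actual `RDD[String]` (called `rdd` here as an assumption), the same two functions would be wired together as `rdd.mapPartitions(SetupCleanupSketch.perPartition).reduce(SetupCleanupSketch.merge)`. Note that the sketch consumes the input iterator fully before emitting anything; if the result were instead a lazy transformation of `iter` (such as `iter.map(...)`), any cleanup placed before the return would run before the elements are actually processed.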