How to override setup and cleanup methods in a Spark map function

Question

Suppose there is the following MapReduce job:

Mapper:

setup() initializes some state

map() adds data to the state and produces no output

cleanup() outputs the state to the context

Reducer:

aggregates all states into one output

How could such a job be implemented in Spark?

Additional question: how could such a job be implemented in Scalding? I'm looking for an example that somehow overrides these methods...

Recommended answer

Spark's map doesn't provide an equivalent of Hadoop's setup and cleanup. It assumes that each call is independent and free of side effects.

The closest equivalent you can get is to put the required logic inside mapPartitions or mapPartitionsWithIndex, so that initialization and cleanup run once per partition rather than once per record. A simplified template:

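rdd.mapPartitions { iter =>
  // initState, update, finish and cleanup are hypothetical placeholders for your own logic
  val state = initState()               // setup(): initialize state once per partition
  iter.foreach(x => update(state, x))   // map(): add each record to the state
  val result = finish(state)            // compute the result for this partition
  cleanup(state)                        // cleanup(): release resources after the input is consumed
  Iterator(result)                      // return results as an Iterator[U]
}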
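
To make the template concrete for the job described in the question, here is a minimal sketch in plain Scala with no Spark dependency. It assumes a hypothetical word-count workload: each inner Seq stands in for one partition, countWords plays the mapper (setup, map, cleanup), and merge plays the reducer. The names PartitionAggregation, countWords, merge and aggregate are illustrative only; they are not part of Spark.

```scala
import scala.collection.mutable

// Plain-Scala sketch of the question's job; no Spark dependency.
// Hypothetical workload: counting words. Each inner Seq stands in for one partition.
object PartitionAggregation {

  // Mapper equivalent. Its type, Iterator[String] => Iterator[Map[String, Int]],
  // is the shape of function that rdd.mapPartitions accepts for an RDD[String].
  def countWords(records: Iterator[String]): Iterator[Map[String, Int]] = {
    // setup(): initialize the state once per partition
    val counts = mutable.Map.empty[String, Int].withDefaultValue(0)
    // map(): add each record's data to the state, emitting nothing yet
    records.foreach { line =>
      line.split("\\s+").filter(_.nonEmpty).foreach(word => counts(word) += 1)
    }
    // cleanup(): emit the accumulated state as this partition's only output
    Iterator.single(counts.toMap)
  }

  // Reducer equivalent: merge two partial states into one.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (word, n)) => acc.updated(word, acc.getOrElse(word, 0) + n) }

  // Local driver simulating "one mapPartitions call per partition, then reduce".
  def aggregate(partitions: Seq[Seq[String]]): Map[String, Int] =
    partitions
      .flatMap(p => countWords(p.iterator).toList)
      .reduce((a, b) => merge(a, b))

  def main(args: Array[String]): Unit =
    println(aggregate(Seq(Seq("a b a"), Seq("b c", "a"))))
}
```

With Spark on the classpath, the same functions should plug in roughly as rdd.mapPartitions(PartitionAggregation.countWords).reduce(PartitionAggregation.merge) for an RDD[String]. Emitting one map per partition, even an empty one, keeps the reduce step simple, because merge treats an empty map as a neutral element.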
