Data sharing in Hadoop MapReduce chaining


Problem description


Is it possible to share a value between a reducer and the mapper that follows it?

Or is it possible to store the output of the first reducer in memory so that the second mapper can read it from there?

The problem is that I have written a chained MapReduce job of the form Map1 -> Reducer1 -> Map2 -> Reducer2.

Map1 and Map2 read the same input file.

Reduce1 derives a value, say 'X', as its output.

I need both 'X' and the input file in Map2.

How can we do this without reading the output file of Reduce1?

Is it possible to store 'X' in memory so that Mapper 2 can access it?

Solution

Each job is independent of the others, so it is not possible to share data across jobs without storing the output in an intermediate location.
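To make the "intermediate location" idea concrete, here is a minimal sketch in plain Java with no Hadoop dependency. Everything in it is a hypothetical stand-in: `runJob1` plays the role of Map1 + Reduce1 and writes 'X' to a temporary file, the driver reads that small file once, and `runJob2` plays the role of Map2 and receives 'X' through a key/value map that imitates a job configuration. The key name `sketch.x`, the choice of "maximum of the input" as 'X', and all method names are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ChainedJobsSketch {

    // Stand-in for Map1 + Reduce1: derive one value X from the input and
    // persist it to an intermediate file. X is (hypothetically) the maximum.
    static void runJob1(List<String> inputLines, Path outputFile) throws IOException {
        int x = Integer.MIN_VALUE;
        for (String line : inputLines) {
            x = Math.max(x, Integer.parseInt(line.trim()));
        }
        Files.write(outputFile, String.valueOf(x).getBytes(StandardCharsets.UTF_8));
    }

    // Stand-in for Map2: processes the same input, with X supplied through
    // a configuration-like map instead of being read from a file per task.
    static List<String> runJob2(List<String> inputLines, Map<String, String> conf) {
        int x = Integer.parseInt(conf.get("sketch.x")); // hypothetical key
        List<String> out = new ArrayList<>();
        for (String line : inputLines) {
            int value = Integer.parseInt(line.trim());
            out.add(value + "\t" + (x - value)); // emit distance from X
        }
        return out;
    }

    // Stand-in for the driver: run job 1, read X once from the intermediate
    // location, hand it to job 2 via the configuration-like map.
    static List<String> runChain(List<String> inputLines) {
        try {
            Path intermediate = Files.createTempFile("reduce1-output", ".txt");
            try {
                runJob1(inputLines, intermediate);
                String x = new String(Files.readAllBytes(intermediate),
                        StandardCharsets.UTF_8).trim();
                Map<String, String> conf = new HashMap<>();
                conf.put("sketch.x", x);
                return runJob2(inputLines, conf);
            } finally {
                Files.deleteIfExists(intermediate);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String record : runChain(Arrays.asList("3", "7", "5"))) {
            System.out.println(record);
        }
    }
}
```

In a real Hadoop driver, one way to get the same effect (an assumption about the setup, not something stated in the answer) would be to read Reduce1's small output in the driver after the first job completes, store it with `Configuration.set`, and read it in the second mapper's `setup` via `context.getConfiguration().get`. The value still passes through an intermediate location, but the mappers of the second job do not have to open Reduce1's output file themselves.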

FYI, in the MapReduce model, map tasks don't talk to each other, and neither do reduce tasks. Apache Giraph, which runs on Hadoop, uses communication between the mappers within a single job to implement iterative algorithms; in plain MapReduce, where mappers cannot communicate, such algorithms require the same job to be run again and again.

I'm not sure which algorithm is being implemented or why MR was chosen, but every MR algorithm can also be implemented in BSP. There is a paper comparing BSP with MR; some algorithms perform better in BSP than in MR. Apache Hama is an implementation of the BSP model, in the same way that Apache Hadoop is an implementation of MR.
