Multiple Inputs: Adding the same input to multiple mappers for comparison


Question

I have two Mapper classes that take files from the same folder as input. The name of each file contains a timestamp, and that timestamp determines which mapper the file is given to. Sometimes the same input file has to be given to both mappers. The job works when the two mappers receive different inputs, but when I give them the same input, one of the Mapper classes does not generate the result that the reducer needs for the comparison.

The code is enormous, so instead of posting it here I'll describe what I did. I scan the files in the directory and, based on the timestamps in their names, put them into two lists, then add each list to a different mapper. The two inputs are computed differently, so I use separate mappers, and their results are compared in the reducer. When the time criteria for both mappers are almost the same, the same input file ends up in both lists, and one of the mappers does not generate any result. Is this because one mapper cannot access the file while the other is using it? If so, is there a way around it?

Here MapPath1 is one list and MapPath2 is the other:

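// Route every path in MapPath1 to Map1.
for (int i = 0; i < MapPath1.size(); i++) {
      MultipleInputs.addInputPath(job, new Path(MapPath1.get(i)), TextInputFormat.class, Map1.class);
}
// In comparative mode, also route every path in MapPath2 to Map2.
if (type.equals("comparative")) {
      for (int i = 0; i < MapPath2.size(); i++) {
            MultipleInputs.addInputPath(job, new Path(MapPath2.get(i)), TextInputFormat.class, Map2.class);
      }
}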
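
The question does not show the step that builds the two lists. A minimal sketch of it in plain Java follows. It assumes a hypothetical file naming scheme `data_<timestamp>.txt` and hypothetical inclusive time windows, neither of which comes from the question. A file whose timestamp falls inside both windows lands in both lists, which is the case that triggers the problem.

```java
import java.util.ArrayList;
import java.util.List;

public class InputPartitioner {
    // Hypothetical naming scheme "data_<timestamp>.txt"; the real format is not given.
    // Assumes the name contains an extension after the timestamp.
    public static long timestampOf(String fileName) {
        int start = fileName.lastIndexOf('_') + 1;
        int end = fileName.lastIndexOf('.');
        return Long.parseLong(fileName.substring(start, end));
    }

    // Returns the files whose timestamp lies in the inclusive window [from, to].
    public static List<String> filesInWindow(List<String> files, long from, long to) {
        List<String> selected = new ArrayList<>();
        for (String file : files) {
            long ts = timestampOf(file);
            if (ts >= from && ts <= to) {
                selected.add(file);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> files = List.of("data_100.txt", "data_200.txt", "data_300.txt");
        // Overlapping windows: data_200.txt qualifies for both mappers.
        List<String> mapPath1 = filesInWindow(files, 100, 200);
        List<String> mapPath2 = filesInWindow(files, 200, 300);
        System.out.println("Map1 inputs: " + mapPath1);
        System.out.println("Map2 inputs: " + mapPath2);
    }
}
```

Printing both lists before calling MultipleInputs.addInputPath is a cheap way to confirm which files are shared between the two mappers in a given run.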

Update

I just found this question (Multiple mappers in hadoop), which is similar to mine, but I don't want to duplicate the input file because it can be large. Can anyone show me how to create two separate jobs that use different Mappers and feed their output to a single reducer?

Recommended answer


"one of the Mapper classes does not generate the result to be used for comparison in the reducer."

My guess is that both mappers are launched on the same task tracker node and that the intermediate mapper output location is shared by both map tasks. To confirm this, check the task tracker nodes where these map tasks were launched.

You should also run a map-only job by setting the number of reduce tasks to zero, then check the output. This shows whether the mappers are sharing output directories.

As for a solution: it sounds like you are passing the same file to both mappers and sending the data from both mappers to a single reducer. That introduces some duplication. Is it acceptable for your job output to contain this duplication?
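
When the same file feeds both mappers, the reducer sees two values per key and has to know which mapper produced each one. One common way to handle this is to have each mapper prefix its output with a source tag. The sketch below simulates that idea in plain Java without the Hadoop API. The tags "M1" and "M2", the two stand-in computations, and the use of the input line as the key are all assumptions made for illustration; the question does not show the real mappers or reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TaggedCompare {
    // Stand-ins for Map1 and Map2; the real computations are not shown in the question.
    public static String map1(String line) { return "M1:" + line.length(); }
    public static String map2(String line) { return "M2:" + line.trim().length(); }

    // Simulates the shuffle: both mappers read every line and their tagged
    // values are grouped under the same key (here, the line itself).
    public static Map<String, List<String>> shuffle(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            List<String> values = grouped.computeIfAbsent(line, k -> new ArrayList<>());
            values.add(map1(line));
            values.add(map2(line));
        }
        return grouped;
    }

    // Stand-in reducer: uses the tags to separate the two sources, then
    // reports whether both mappers produced the same value for the key.
    public static boolean reduce(List<String> taggedValues) {
        String fromMap1 = null;
        String fromMap2 = null;
        for (String value : taggedValues) {
            if (value.startsWith("M1:")) {
                fromMap1 = value.substring(3);
            } else if (value.startsWith("M2:")) {
                fromMap2 = value.substring(3);
            }
        }
        return fromMap1 != null && fromMap1.equals(fromMap2);
    }

    public static void main(String[] args) {
        Map<String, List<String>> grouped = shuffle(List.of("abc", " x "));
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            System.out.println("[" + entry.getKey() + "] match=" + reduce(entry.getValue()));
        }
    }
}
```

With tags in place, a key that arrives with only one tag also tells the reducer directly that one of the mappers emitted nothing for it, which is the symptom described in the question.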
