Hadoop: Output file has double output

Question

I am running a Hadoop program with the following input file, input.txt:

1
2

mapper.py:

import sys
for line in sys.stdin:
    print line,
print "Test"

reducer.py:

import sys
for line in sys.stdin:
    print line,

When I run it without Hadoop ($ cat ./input.txt | ./mapper.py | ./reducer.py), the output is as expected:

1
2
Test

However, when I run it through Hadoop via the streaming API (as described here), the latter part of the output appears "doubled":

1
2
Test    
Test

Additionally, when I run the program through Hadoop, it seems to have about a 1/4 chance of failing with this error:


Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1.

I've looked at this for some time and can't figure out what I'm missing. If anyone could help with these issues, I would greatly appreciate it! Thanks.

Edit: When input.txt is:

1
2
3
4
5
6
7
8
9
10

The output is:

1   
10  
2   
3   
4   
5   
6   
7   
8   
9   
Test    
Test
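
The ordering above, with 10 landing right after 1, is consistent with the keys being compared as text rather than as numbers in the sort between the map and reduce steps. A minimal Python 3 sketch of that effect, assuming plain string comparison is a fair stand-in for that sort (the name keys is only illustrative):

```python
# Keys as they would leave the mapper: the ten input lines plus the extra "Test".
keys = [str(n) for n in range(1, 11)] + ["Test"]

# Assumption: ordinary string comparison approximates the sort applied
# between map and reduce. Strings compare character by character, so
# "10" sorts before "2", and digits sort before the uppercase "T".
as_text = sorted(keys)
print(as_text)
```

This only accounts for the order of the lines, not for the second "Test".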


Recommended answer

It gives the same output. My guess is that you are specifying mapper.py as the reducer location as well. Make sure you are providing the correct path to reducer.py.
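
To illustrate the guess above, here is a rough Python 3 sketch that models the two scripts as functions and compares a job wired correctly with one where mapper.py is also used as the reducer. The helper run_job is hypothetical, and the model assumes a single map task, a sort between the two steps, and no tab separators in the output; it is not how Hadoop streaming is actually implemented.

```python
def mapper(lines):
    """Mirror mapper.py: echo every input line, then emit one extra "Test"."""
    return list(lines) + ["Test"]


def reducer(lines):
    """Mirror reducer.py: echo every input line unchanged."""
    return list(lines)


def run_job(lines, map_step, reduce_step):
    """Hypothetical stand-in for one job: map, sort the map output, reduce."""
    return reduce_step(sorted(map_step(lines)))


input_lines = ["1", "2"]

# Reducer path points at reducer.py: "Test" appears once.
correct = run_job(input_lines, mapper, reducer)

# Reducer path points at mapper.py by mistake: the second pass adds another "Test".
misconfigured = run_job(input_lines, mapper, mapper)

print(correct)
print(misconfigured)
```

Under these assumptions, only the misconfigured run produces a second "Test", which matches the shape of the doubled output in the question.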
