Running a job using hadoop streaming and mrjob: PipeMapRed.waitOutputThreads(): subprocess failed with code 1


Problem description

Hey, I'm fairly new to the world of Big Data. I came across this tutorial: http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20-minutes/

It describes in detail how to run a MapReduce job using mrjob, both locally and on Elastic Map Reduce.

Well, I'm trying to run this on my own Hadoop cluster. I ran the job using the following command.

python density.py tiny.dat -r hadoop --hadoop-bin /usr/bin/hadoop > outputmusic

And this is what I get:

HADOOP: Running job: job_1369345811890_0245
HADOOP: Job job_1369345811890_0245 running in uber mode : false
HADOOP:  map 0% reduce 0%
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_0, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_0, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_1, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Container killed by the ApplicationMaster.
HADOOP:
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_1, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_2, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_2, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP:  map 100% reduce 0%
HADOOP: Job job_1369345811890_0245 failed with state FAILED due to: Task failed task_1369345811890_0245_m_000001
HADOOP: Job failed as tasks failed. failedMaps:1 failedReduces:0
HADOOP:
HADOOP: Counters: 6
HADOOP:         Job Counters
HADOOP:                 Failed map tasks=7
HADOOP:                 Launched map tasks=8
HADOOP:                 Other local map tasks=6
HADOOP:                 Data-local map tasks=2
HADOOP:                 Total time spent by all maps in occupied slots (ms)=32379
HADOOP:                 Total time spent by all reduces in occupied slots (ms)=0
HADOOP: Job not Successful!
HADOOP: Streaming Command Failed!
STDOUT: packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.0-cdh4.2.1.jar] /tmp/streamjob3272348678857116023.jar tmpDir=null
Traceback (most recent call last):
  File "density.py", line 34, in <module>
    MRDensity.run()
  File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/job.py", line 344, in run
    mr_job.run_job()
  File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/job.py", line 381, in run_job
    runner.run()
  File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/runner.py", line 316, in run
    self._run()
  File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/hadoop.py", line 175, in _run
    self._run_job_in_hadoop()
  File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/hadoop.py", line 325, in _run_job_in_hadoop
    raise CalledProcessError(step_proc.returncode, streaming_args)
subprocess.CalledProcessError: Command '['/usr/bin/hadoop', 'jar', '/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.2.1.jar', '-cmdenv', 'PYTHONPATH=mrjob.tar.gz', '-input', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/input', '-output', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/output', '-cacheFile', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/files/density.py#density.py', '-cacheArchive', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/files/mrjob.tar.gz#mrjob.tar.gz', '-mapper', 'python density.py --step-num=0 --mapper --protocol json --output-protocol json --input-protocol raw_value', '-jobconf', 'mapred.reduce.tasks=0']' returned non-zero exit status 1

Note: As suggested in some other forums, I've included

#! /usr/bin/python

at the beginning of both my Python files, density.py and track.py. It seems to have worked for most people, but I still keep getting the above exceptions.

Edit: One of the functions used by the original density.py was defined in another file, track.py. I moved its definition into density.py itself, and the job ran successfully. But it would really be helpful if someone knows why this is happening.
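
A possible explanation for the edit above, judging only from the streaming command in the traceback: the -cacheFile and -cacheArchive arguments list density.py and mrjob.tar.gz but not track.py, so an import of track would fail on the task nodes and the Python process would exit with status 1. The sketch below is one way to spot such files before submitting. It assumes Python 3 and plain top-level imports, and the function name is made up for illustration; it is not part of mrjob.

```python
import ast
import os


def sibling_modules_imported(script_path):
    """Return sorted names of .py files next to script_path that it imports."""
    with open(script_path) as handle:
        tree = ast.parse(handle.read(), filename=script_path)

    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import track" or "import track.utils" -> top-level name "track"
            imported.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from track import something" -> "track"
            imported.add(node.module.split(".")[0])

    # Keep only names that exist as files beside the script: these are the
    # ones that live on the submitting machine but not on the task nodes.
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return sorted(
        name + ".py"
        for name in imported
        if os.path.isfile(os.path.join(script_dir, name + ".py"))
    )
```

Every file this reports would have to be shipped with the job in addition to the main script. mrjob has an option for uploading extra files (--file in older releases, to the best of my knowledge), so check the documentation for the version you run.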

Solution

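Error code 1 is a generic error for Hadoop Streaming: it only means that the subprocess Hadoop launched (your mapper or reducer) exited with status 1. You can get this error code for two main reasons:

  • Your Mapper and Reducer scripts are not executable (include a shebang line such as #!/usr/bin/python at the beginning of each script, and make the file executable with chmod +x).

  • Your Python program itself fails: a syntax error, a logic bug, or any other uncaught exception makes the interpreter exit with status 1.

Unfortunately, error code 1 does not give you any details about what exactly is wrong with your Python program. The Python traceback goes to the task's stderr log, not to the job output shown above.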
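
Both causes in the list above can be checked on the submitting machine before the job is sent to the cluster. The following is a minimal sketch, assuming Python 3 and a POSIX file system; preflight is a hypothetical helper name, not a Hadoop or mrjob function.

```python
import os
import stat


def preflight(script_path):
    """Return a list of human-readable problems found in a streaming script."""
    problems = []
    with open(script_path) as handle:
        source = handle.read()

    # Cause 1a: no shebang, so the kernel does not know which interpreter to use
    if not source.startswith("#!"):
        problems.append("no shebang line (e.g. #!/usr/bin/python)")

    # Cause 1b: no execute bit set for anyone
    mode = os.stat(script_path).st_mode
    if not mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        problems.append("file is not marked executable (chmod +x)")

    # Cause 2 (partly): compile() only parses the source, it does not run it
    try:
        compile(source, script_path, "exec")
    except SyntaxError as error:
        problems.append("syntax error on line %s: %s" % (error.lineno, error.msg))

    return problems
```

An empty list does not prove the script is correct. compile() catches syntax errors only, not logic bugs or failing imports, and it parses with the interpreter that runs the check, which may be a different Python version from the one on the cluster.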

I was stuck with error code 1 for a while myself, and the way I figured it out was to simply run my Mapper script as a standalone Python program: python mapper.py

After doing this, I got a regular Python error that told me I was simply giving a function the wrong type of argument. I fixed that bug, and everything worked after that. So if possible, I'd run your Mapper or Reducer script as a standalone Python program to see if that gives you any insight into the cause of your error.
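
The standalone run described above can also be scripted, so that the exit status and the traceback are captured together. This is only a sketch, assuming Python 3.5 or later on the machine where you debug; run_step_locally is a made-up name, and the broken mapper in the demo is a stand-in for a script that imports a module missing on the task node.

```python
import subprocess
import sys


def run_step_locally(command, input_text):
    """Run one streaming step as a plain subprocess, feeding input on stdin.

    Returns (exit_code, stdout, stderr), which makes the Python traceback
    hidden behind "subprocess failed with code 1" visible.
    """
    proc = subprocess.run(
        command,
        input=input_text,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # exchange str instead of bytes
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Hypothetical broken mapper: the import fails, as it would on a task
    # node that never received the module.
    broken_mapper = [sys.executable, "-c", "import module_that_is_not_shipped"]
    code, out, err = run_step_locally(broken_mapper, "one line of input\n")
    print("exit code:", code)  # an uncaught exception makes Python exit with 1
    print(err.strip().splitlines()[-1])  # last traceback line names the error
```

For a real job, the command would be the mapper string from the failing streaming call (here: python density.py --step-num=0 --mapper ...) split into a list, and input_text a few lines taken from the input file.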
