Running an R script using Hadoop Streaming, job failing: PipeMapRed.waitOutputThreads(): subprocess failed with code 1


Problem Description



I have an R script that works perfectly fine in the R console, but when I run it with Hadoop Streaming it fails in the map phase with the error below. The task attempt logs follow.

The Hadoop Streaming command I am using:

/home/Bibhu/hadoop-0.20.2/bin/hadoop jar \
   /home/Bibhu/hadoop-0.20.2/contrib/streaming/*.jar \
   -input hdfs://localhost:54310/user/Bibhu/BookTE1.csv \
   -output outsid -mapper `pwd`/code1.sh
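
For context, -mapper here points at code1.sh, which presumably just launches the R script with Rscript. A streaming mapper receives its input split on stdin and writes key/value output to stdout, so the R code has to read from the "stdin" connection rather than open the CSV directly. The following is only a minimal, hypothetical sketch of such a mapper (the actual code1.sh and R script are not shown in the question):

#!/usr/bin/env Rscript
# Hypothetical streaming mapper sketch -- not the asker's actual script.
# Hadoop Streaming pipes each input split to this process on stdin, so the
# records are read from the "stdin" connection, not from a file path.
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  fields <- strsplit(line, ",", fixed = TRUE)[[1]]
  # Emit a tab-separated key/value pair for every input record.
  cat(fields[1], "\t", line, "\n", sep = "")
}
close(con)

If the script instead reads everything at once with read.csv(file("stdin")), an empty stdin produces exactly the "no lines available in input" error shown in the stderr log below.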

stderr logs

Loading required package: class
Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  no lines available in input
Calls: read.csv -> read.table
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

syslog logs

2013-07-03 19:32:36,080 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2013-07-03 19:32:36,654 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2013-07-03 19:32:36,675 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2013-07-03 19:32:36,835 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2013-07-03 19:32:36,835 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
2013-07-03 19:32:36,899 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/home/Bibhu/Downloads/SentimentAnalysis/Sid/smallFile/code1.sh]
2013-07-03 19:32:37,256 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=0/1
2013-07-03 19:32:38,509 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2013-07-03 19:32:38,509 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
2013-07-03 19:32:38,557 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
2013-07-03 19:32:38,631 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task

Solution

You need to find the logs from your mappers and reducers, since that is where the job is failing (as indicated by java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1). That message means your R script crashed.
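
One way to make such a crash easier to diagnose (a sketch only, not part of the original answer) is to have the R script itself report the failure on stderr, because anything the mapper writes to stderr ends up in the task attempt's stderr log that the steps below lead to; that is also how the read.table error in the question's stderr log got there.

# Hypothetical wrapper: catch failures in the per-record mapper body and
# report them on stderr in one place; Hadoop copies the task's stderr into
# the attempt's stderr log shown in the job history UI.
con <- file("stdin", open = "r")
tryCatch({
  while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
    # ... actual per-record work would go here ...
    cat(line, "\n", sep = "")
  }
}, error = function(e) {
  write(paste("mapper failed:", conditionMessage(e)), stderr())
  quit(save = "no", status = 1)  # non-zero exit marks the attempt as failed
}, finally = close(con))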

If you are using the Hortonworks Hadoop distribution, the easiest way is to open the job history UI. It should be at http://127.0.0.1:19888/jobhistory . It should also be possible to find the logs in the filesystem from the command line, but I haven't yet found where.

  1. Open http://127.0.0.1:19888/jobhistory in your web browser
  2. Click on the Job ID of the failed job
  3. Click the number indicating the failed job count
  4. Click an attempt link
  5. Click the logs link

You should see a page that looks something like this:

Log Type: stderr
Log Length: 418
Traceback (most recent call last):
  File "/hadoop/yarn/local/usercache/root/appcache/application_1404203309115_0003/container_1404203309115_0003_01_000002/./mapper.py", line 45, in <module>
    mapper()
  File "/hadoop/yarn/local/usercache/root/appcache/application_1404203309115_0003/container_1404203309115_0003_01_000002/./mapper.py", line 37, in mapper
    for record in reader:
_csv.Error: newline inside string

This is an error from my Python script; errors from an R script will look a bit different.

source: http://hortonworks.com/community/forums/topic/map-reduce-job-log-files/
