Running an R script using Hadoop streaming fails: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
Problem Description
I have an R script which works perfectly fine in the R console, but when I run it with Hadoop streaming it fails in the map phase with the error below. The task attempt logs follow.
The Hadoop streaming command I use:
/home/Bibhu/hadoop-0.20.2/bin/hadoop jar \
/home/Bibhu/hadoop-0.20.2/contrib/streaming/*.jar \
-input hdfs://localhost:54310/user/Bibhu/BookTE1.csv \
-output outsid -mapper `pwd`/code1.sh
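Note that this command does not pass `-file`, so code1.sh (and the R script it wraps) must already exist at the same path on every task node; shipping them with `-file` is the usual approach. Hadoop streaming then pipes each input split to the mapper's stdin and reads key/value lines from its stdout. The question's mapper is an R script and is not shown; purely as an illustration of that stdin/stdout contract, here is a minimal mapper in Python:

```python
import sys

def mapper(stream):
    """Turn each CSV record piped in on stdin into a tab-separated
    key/value line, the shape Hadoop streaming expects on stdout."""
    out = []
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines in the split
        key = line.split(",", 1)[0]  # first CSV column as the key
        out.append(f"{key}\t1")
    return out

if __name__ == "__main__":
    for kv in mapper(sys.stdin):
        print(kv)
```

An R mapper has to follow the same contract, e.g. reading records from `file("stdin")` rather than from a fixed file path; calling read.csv on an empty or missing input is one way to hit "no lines available in input".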
stderr logs
Loading required package: class
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
Calls: read.csv -> read.table
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
syslog logs
2013-07-03 19:32:36,080 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2013-07-03 19:32:36,654 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2013-07-03 19:32:36,675 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2013-07-03 19:32:36,835 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2013-07-03 19:32:36,835 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
2013-07-03 19:32:36,899 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/home/Bibhu/Downloads/SentimentAnalysis/Sid/smallFile/code1.sh]
2013-07-03 19:32:37,256 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=0/1
2013-07-03 19:32:38,509 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2013-07-03 19:32:38,509 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
2013-07-03 19:32:38,557 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2013-07-03 19:32:38,631 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
You need to find the logs from your mappers and reducers, since this is the place where the job is failing (as indicated by java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
). This says that your R script crashed.
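Before digging through cluster logs, it can help to reproduce what PipeMapRed actually does: it launches the script as a child process, pipes the input split to its stdin, and reports nothing but the exit status, hence the bare "subprocess failed with code 1". Running the same pipe locally (e.g. `cat sample.csv | ./code1.sh`, with a local sample of the input) shows the full R error on stderr. A sketch of that check in Python, using a deliberately crashing command as a stand-in for the script:

```python
import subprocess

def run_mapper(cmd, sample_input):
    """Pipe sample input into a mapper command, roughly the way
    PipeMapRed does, returning its exit code, stdout, and stderr."""
    proc = subprocess.run(cmd, input=sample_input,
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr

if __name__ == "__main__":
    # Stand-in for a crashing mapper, like the R script in the question.
    code, out, err = run_mapper(
        ["python3", "-c", "import sys; sys.exit(1)"], "a,1\n")
    if code != 0:
        # This exit code is all Hadoop would report.
        print(f"mapper exited with code {code}")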
If you are using the Hortonworks Hadoop distribution, the easiest way is to open the job history UI. It should be at http://127.0.0.1:19888/jobhistory. It should be possible to find the logs in the filesystem from the command line as well, but I haven't yet found where.
- Open http://127.0.0.1:19888/jobhistory in your web browser
- Click on the Job ID of the failed job
- Click the number indicating the failed job count
- Click an attempt link
- Click the logs link
You should see a page which looks something like this:
Log Type: stderr
Log Length: 418
Traceback (most recent call last):
File "/hadoop/yarn/local/usercache/root/appcache/application_1404203309115_0003/container_1404203309115_0003_01_000002/./mapper.py", line 45, in <module>
mapper()
File "/hadoop/yarn/local/usercache/root/appcache/application_1404203309115_0003/container_1404203309115_0003_01_000002/./mapper.py", line 37, in mapper
for record in reader:
_csv.Error: newline inside string
This is an error from my Python script; errors from R will look a bit different.
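Whatever the language, the streaming task dies as soon as the script lets one bad record raise an uncaught error. One defensive pattern (a sketch, not the forum post's code) is to parse record by record and count failures instead of crashing:

```python
import csv

def robust_mapper(lines):
    """Parse CSV input one line at a time, collecting good records and
    counting lines the csv module rejects instead of crashing the task."""
    good, bad = [], 0
    for line in lines:
        try:
            for record in csv.reader([line]):
                if record:
                    good.append(record)
        except csv.Error:
            bad += 1  # e.g. a newline character inside an unquoted field
    return good, bad
```

In a real streaming job the bad-record count could be surfaced through Hadoop's counter mechanism (streaming scripts can emit `reporter:counter:group,name,amount` lines on stderr), though that detail is beyond the original post.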
Source: http://hortonworks.com/community/forums/topic/map-reduce-job-log-files/