Hadoop Streaming Command Failure with Python Error


Problem Description


I'm a newcomer to Ubuntu, Hadoop and DFS but I've managed to install a single-node hadoop instance on my local ubuntu machine following the directions posted on Michael-Noll.com here:

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/#copy-local-example-data-to-hdfs

http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

I'm currently stuck on running the basic word count example on Hadoop. I'm not sure if the fact I've been running Hadoop out of my Downloads directory makes too much of a difference, but I've attempted to tweak my file locations for the mapper.py and reducer.py functions by placing them in the Hadoop working directory, with no success. I've exhausted all of my research and still cannot solve this problem (i.e., using -file parameters, etc.). I really appreciate any help in advance, and I hope I framed this question in a way that can help others who are just beginning with Python + Hadoop.

I tested mapper.py and reducer.py independently and both work fine when fed toy text data from the bash shell.
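For reference, the word-count mapper and reducer from the Michael Noll tutorial boil down to the logic below. This is a minimal sketch, not the asker's actual scripts; the function names map_words and reduce_counts are my own, and the sorted() call stands in for the shuffle/sort phase Hadoop performs between map and reduce.

```python
from itertools import groupby
from operator import itemgetter

def map_words(lines):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.strip().split():
            yield (word, 1)

def reduce_counts(pairs):
    """Sum the counts for each word; pairs must arrive sorted by word,
    which mimics the sort Hadoop runs between the map and reduce phases."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    # Toy run, equivalent to: cat file | mapper.py | sort | reducer.py
    lines = ["foo foo quux", "labs foo bar quux"]
    print(dict(reduce_counts(map_words(lines))))
    # → {'bar': 1, 'foo': 3, 'labs': 1, 'quux': 2}
```

Testing the pipeline this way from the shell (piping through sort, as above) only exercises the Python logic; it does not catch deployment problems such as a missing shebang line or execute bit on the scripts Hadoop ships to the task nodes.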

Output from my Bash Shell:

hduser@chris-linux:/home/chris/Downloads/hadoop$ bin/hadoop jar /home/chris/Downloads/hadoop/contrib/streaming/hadoop-streaming-1.0.4.jar -file mapper.py -file reducer.py -mapper mapper.py -reducer reducer.py -input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output3
Warning: $HADOOP_HOME is deprecated.

packageJobJar: [mapper.py, reducer.py, /app/hadoop/tmp/hadoop-unjar4681300115516015516/] [] /tmp/streamjob2215860242221125845.jar tmpDir=null
13/03/08 14:43:46 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/03/08 14:43:46 WARN snappy.LoadSnappy: Snappy native library not loaded
13/03/08 14:43:46 INFO mapred.FileInputFormat: Total input paths to process : 3
13/03/08 14:43:47 INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]
13/03/08 14:43:47 INFO streaming.StreamJob: Running job: job_201303081155_0032
13/03/08 14:43:47 INFO streaming.StreamJob: To kill this job, run:
13/03/08 14:43:47 INFO streaming.StreamJob: /home/chris/Downloads/hadoop/libexec/../bin/hadoop job  -Dmapred.job.tracker=localhost:54311 -kill job_201303081155_0032
13/03/08 14:43:47 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201303081155_0032
13/03/08 14:43:48 INFO streaming.StreamJob:  map 0%  reduce 0%
13/03/08 14:44:12 INFO streaming.StreamJob:  map 100%  reduce 100%
13/03/08 14:44:12 INFO streaming.StreamJob: To kill this job, run:
13/03/08 14:44:12 INFO streaming.StreamJob: /home/chris/Downloads/hadoop/libexec/../bin/hadoop job  -Dmapred.job.tracker=localhost:54311 -kill job_201303081155_0032
13/03/08 14:44:12 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201303081155_0032
13/03/08 14:44:12 ERROR streaming.StreamJob: Job not successful. Error: JobCleanup Task Failure, Task: task_201303081155_0032_m_000003
13/03/08 14:44:12 INFO streaming.StreamJob: killJob...
Streaming Command Failed!

My HDFS is located at /app/hadoop/tmp which, I believe, is also the same as my /user/hduser directory on my hadoop instance.

Input data is located at /user/hduser/gutenberg/* (3 UTF plain text files). Output is set to be created at /user/hduser/gutenberg-output.

Solution

Have a look at the logs in the following path (based on the information supplied above):

$HADOOP_HOME/logs/userlogs/job_201303081155_0032/task_201303081155_0032_m_000003

This should provide you with some information on that specific task.
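Walking that directory could be sketched as below. The helper find_task_logs is hypothetical, and the per-attempt stdout/stderr/syslog file layout is the usual Hadoop 1.x userlogs convention, which may differ on other installs; Python tracebacks from a failed streaming task usually land in stderr.

```python
import os

def find_task_logs(task_dir, names=("stderr", "syslog")):
    """Collect the per-attempt log files under a task's userlogs directory."""
    found = []
    for root, _dirs, files in os.walk(task_dir):
        for name in files:
            if name in names:
                found.append(os.path.join(root, name))
    return sorted(found)

if __name__ == "__main__":
    # Job and task IDs taken from the failing run quoted above.
    hadoop_home = os.environ.get("HADOOP_HOME", "/home/chris/Downloads/hadoop")
    task_dir = os.path.join(hadoop_home, "logs", "userlogs",
                            "job_201303081155_0032",
                            "task_201303081155_0032_m_000003")
    for path in find_task_logs(task_dir):
        print("====", path, "====")
        with open(path) as fh:
            print(fh.read())
```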

The logs supplied by Hadoop are pretty good; it just takes some digging around to find the information :)
