Hadoop Error: Error launching job, bad input path: File does not exist. Streaming Command Failed


Problem description

I am running an MRJob on a Hadoop cluster and I am getting the following error:

No configs found; falling back on auto-configuration
Looking for hadoop binary in $PATH...
Found hadoop binary: /usr/local/hadoop/bin/hadoop
Using Hadoop version 2.7.3
Looking for Hadoop streaming jar in /usr/local/hadoop...
Found Hadoop streaming jar: /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar
Creating temp directory /tmp/Mr_Jobs.hduser.20170227.030012.446820
Copying local files to hdfs:///user/hduser/tmp/mrjob/Mr_Jobs.hduser.20170227.030012.446820/files/...
Running step 1 of 1...
  session.id is deprecated. Instead, use dfs.metrics.session-id
  Initializing JVM Metrics with processName=JobTracker, sessionId=
  Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
  Cleaning up the staging area file:/app/hadoop/tmp/mapred/staging/hduser1748755362/.staging/job_local1748755362_0001
  Error launching job , bad input path : File does not exist: /app/hadoop/tmp/mapred/staging/hduser1748755362/.staging/job_local1748755362_0001/files/Mr_Jobs.py#Mr_Jobs.py
  Streaming Command Failed!
Attempting to fetch counters from logs...
Can't fetch history log; missing job ID
No counters found
Scanning logs for probable cause of failure...
Can't fetch history log; missing job ID
Can't fetch task logs; missing application ID
Step 1 of 1 failed: Command '['/usr/local/hadoop/bin/hadoop', 'jar', '/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar', '-files', 'hdfs:///user/hduser/tmp/mrjob/Mr_Jobs.hduser.20170227.030012.446820/files/Mr_Jobs.py#Mr_Jobs.py,hdfs:///user/hduser/tmp/mrjob/Mr_Jobs.hduser.20170227.030012.446820/files/mrjob.zip#mrjob.zip,hdfs:///user/hduser/tmp/mrjob/Mr_Jobs.hduser.20170227.030012.446820/files/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/hduser/tmp/mrjob/Mr_Jobs.hduser.20170227.030012.446820/files/File.txt', '-output', 'hdfs:///user/hduser/tmp/mrjob/Mr_Jobs.hduser.20170227.030012.446820/output', '-mapper', 'sh -ex setup-wrapper.sh python3 Mr_Jobs.py --step-num=0 --mapper', '-combiner', 'sh -ex setup-wrapper.sh python3 Mr_Jobs.py --step-num=0 --combiner', '-reducer', 'sh -ex setup-wrapper.sh python3 Mr_Jobs.py --step-num=0 --reducer']' returned non-zero exit status 512

I am running the job via this command:

python3 /home/bhoots21304/Desktop/MrJobs-MR.py -r hadoop hdfs://input3/File.txt
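
The script itself is not shown in the question, so for context here is a hypothetical sketch of what a job like Mr_Jobs.py might look like: a single step with a mapper, combiner, and reducer, which matches the -mapper/-combiner/-reducer flags in the failing streaming command above. The class name and the word-count logic are placeholders, not the asker's actual code.

from mrjob.job import MRJob


class MrJobs(MRJob):
    """Hypothetical single-step word-count job (placeholder logic)."""

    def mapper(self, _, line):
        # Emit each word in the input line with a count of 1.
        for word in line.split():
            yield word.lower(), 1

    def combiner(self, word, counts):
        # Partial aggregation on the map side.
        yield word, sum(counts)

    def reducer(self, word, counts):
        # Final aggregation across all map outputs.
        yield word, sum(counts)


if __name__ == '__main__':
    MrJobs.run()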

Also, the first line says: No configs found; falling back on auto-configuration

I looked this up online. It says there should be a file named mrjob.conf in the /etc/ folder, but it's not present anywhere on my filesystem. Do I need to create this file? If so, what should its contents be?

I installed hadoop using the instructions mentioned in this file:

https://github.com/ev2900/Dev_Notes/blob/master/Hadoop/notes.txt

Also, hadoop-env.sh, core-site.xml, mapred-site.xml, and hdfs-site.xml are configured correctly, because everything works if I just run a simple wordcount job (without mrjob).

(I installed mrjob using 'sudo -H pip3 install mrjob'.)

Solution

You need to specify python_bin and hadoop_streaming_jar in mrjob.conf. It should look something like this, depending on the location of the jar:

runners:
    hadoop:
        python_bin: python3
        hadoop_streaming_jar: /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar
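
As a usage note (assuming a standard mrjob install): mrjob looks for this config at the path in the MRJOB_CONF environment variable, then at ~/.mrjob.conf, then at /etc/mrjob.conf, so saving the snippet above as ~/.mrjob.conf or /etc/mrjob.conf should be enough. You can also point the job at a config file explicitly with --conf-path, for example:

python3 /home/bhoots21304/Desktop/MrJobs-MR.py -r hadoop --conf-path /etc/mrjob.conf hdfs://input3/File.txt

Once a config file is found, the "No configs found; falling back on auto-configuration" warning should go away.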
