Hadoop Streaming - external mapper script - file not found


Question

I'm trying to run a MapReduce job on Hadoop using Streaming. I have two Ruby scripts, wcmapper.rb and wcreducer.rb, and I'm attempting to run the job as follows:

hadoop jar hadoop/contrib/streaming/hadoop-streaming-1.2.1.jar -file wcmapper.rb -mapper wcmapper.rb -file wcreducer.rb -reducer wcreducer.rb -input test.txt -output output
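The scripts themselves aren't shown in the question; for reference, a Streaming word-count mapper in Ruby typically has this shape (a sketch of the usual pattern, not the book's actual wcmapper.rb):

```ruby
#!/usr/bin/ruby
# Sketch of a typical Hadoop Streaming word-count mapper (hypothetical,
# not the actual script from the book). Streaming feeds input records on
# STDIN, one per line, and reads "key<TAB>value" pairs back on STDOUT.

# Turn one input line into a list of "word<TAB>1" pairs.
def map_line(line)
  line.split.map { |word| "#{word}\t1" }
end

# Emit the pairs for every input record; the tty guard lets the file be
# loaded without blocking when STDIN is interactive.
STDIN.each_line { |line| puts map_line(line) } unless STDIN.tty?
```

The reducer follows the same STDIN/STDOUT contract, summing the counts for each word as sorted pairs arrive.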

This results in the following error message at the console:

13/11/26 12:54:07 INFO streaming.StreamJob:  map 0%  reduce 0%
13/11/26 12:54:36 INFO streaming.StreamJob:  map 100%  reduce 100%
13/11/26 12:54:36 INFO streaming.StreamJob: To kill this job, run:
13/11/26 12:54:36 INFO streaming.StreamJob: /home/paul/bin/hadoop-1.2.1/libexec/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201311261104_0009
13/11/26 12:54:36 INFO streaming.StreamJob: Tracking URL: http://localhost.localdomain:50030/jobdetails.jsp?jobid=job_201311261104_0009
13/11/26 12:54:36 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201311261104_0009_m_000000
13/11/26 12:54:36 INFO streaming.StreamJob: killJob...
Streaming Command Failed!

Looking at the failed attempts for any of the tasks shows:

java.io.IOException: Cannot run program "/var/lib/hadoop/mapred/local/taskTracker/paul/jobcache/job_201311261104_0010/attempt_201311261104_0010_m_000001_3/work/./wcmapper.rb": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1042)

I understand that Hadoop needs to copy the mapper and reducer scripts to all the nodes, and I believe that is the purpose of the -file arguments. However, it seems the scripts are not being copied to the location where Hadoop expects to find them. The console indicates they are being packaged, I think:

packageJobJar: [wcmapper.rb, wcreducer.rb, /var/lib/hadoop/hadoop-unjar3547645655567272034/] [] /tmp/streamjob3978604690657430710.jar tmpDir=null

I also tried the following:

hadoop jar hadoop/contrib/streaming/hadoop-streaming-1.2.1.jar -files wcmapper.rb,wcreducer.rb -mapper wcmapper.rb -reducer wcreducer.rb -input test.txt -output output

but this gives the same error.

Can anyone tell me what the problem is?

Or where to look to better diagnose the issue?

Many thanks

Paul

Answer

Sorry, found the answer myself.

The scripts had been downloaded as part of the Packt "Hadoop Beginner's Guide".

They originally had the shebang set as:

#!/usr/bin/env ruby

but this had generated a file-not-found error for ruby itself. Checking the details of env showed that it uses the PATH variable to determine the location of ruby. The ruby executable was in /usr/bin, and that directory was on the PATH. Nevertheless, I amended the shebang to:

#!/usr/bin/ruby

and this fixed the original file-not-found error, but produced the error in the question above.
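env's PATH lookup can be checked directly. A small demonstration (not from the book): if the PATH visible to the task contains no directory holding ruby, env fails with the same "not found" symptom, and its exit status 127 means "command not found":

```shell
# /usr/bin/env resolves "ruby" by searching the PATH variable, so an
# "#!/usr/bin/env ruby" shebang only works if ruby's directory is on the
# PATH seen by the process. With PATH pointing nowhere, the lookup fails:
status=0
PATH=/nonexistent /usr/bin/env ruby -e 'puts 1' 2>/dev/null || status=$?
echo "env exit status: $status"   # 127 means "command not found"
```

A TaskTracker may launch tasks with a much sparser environment than an interactive shell, which is one reason an env shebang can work at the console yet fail inside Hadoop.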

I finally tried to run the Ruby scripts themselves at the console, and this gave the result:

[paul@lt001 bin]$ ./wcmapper.rb 
bash: ./wcmapper.rb: /usr/bin/ruby^M: bad interpreter: No such file or directory

This seemed odd, as the executable existed in the directory shown.

I then recreated the script files by typing them in at the console. This fixed the problem, with the scripts running both at the console and in Hadoop. My assumption is that the format of the files themselves (possibly the ^M carriage returns) was at fault.
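Rather than retyping the files, stripping the carriage returns would also have worked. A sketch using a throwaway file (dos2unix, where installed, does the same job; the sed -i form below is GNU sed, BSD sed needs `-i ''`):

```shell
# Recreate the symptom: a script saved with DOS (CRLF) line endings.
printf '#!/usr/bin/ruby\r\nputs "hello"\r\n' > demo.rb

# cat -v makes the stray carriage returns visible as ^M at end of line.
cat -v demo.rb

# Strip the trailing \r from every line in place.
sed -i 's/\r$//' demo.rb
```

The bash error `bad interpreter: /usr/bin/ruby^M` arises because the kernel treats everything after `#!` up to the newline as the interpreter path, so the invisible `\r` becomes part of the filename it tries to execute.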

In summary, the "file not found" error related to the interpreter, even though the file listed in the task log was the script file itself.

Hope that helps someone.

P
