Unable to run MapReduce using Python in Hadoop?

Question

I have written a mapper and a reducer in Python for a word count program, and they work fine when run locally. Here is a sample:

echo "hello hello world here hello here world here hello" | wordmapper.py | sort -k1,1 | wordreducer.py 
hello   4
here    3
world   2
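
The question does not show the contents of wordmapper.py and wordreducer.py. As a rough illustration only, the sketch below mimics the same mapper, sort, reducer pipeline inside a single Python file. The function names (map_words, reduce_counts, word_count) are hypothetical and are not taken from the original scripts.

```python
from itertools import groupby


def map_words(lines):
    """Mapper step: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Reducer step: sum the counts of consecutive pairs that share a key.

    Like a streaming reducer, this relies on its input being sorted by key.
    """
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Chain mapper -> sort -> reducer, mirroring the shell pipeline."""
    return list(reduce_counts(sorted(map_words(lines))))


if __name__ == "__main__":
    sample = ["hello hello world here hello here world here hello"]
    for word, total in word_count(sample):
        print(f"{word}\t{total}")
```

The sorted() call plays the role of `sort -k1,1` in the shell pipeline: the reducer only sums correctly when identical words arrive next to each other.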

Now, when I try to submit a Hadoop job for a large file, I get errors:

hadoop jar share/hadoop/tools/sources/hadoop-*streaming*.jar -file wordmapper.py -mapper wordmapper.py  -file wordreducer.py -reducer wordreducer.py -input /data/1jrl.pdb -output /output/py_jrl
Exception in thread "main" java.lang.ClassNotFoundException: share.hadoop.tools.sources.hadoop-streaming-2.2.0-test-sources.jar
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:249)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:205)

I then changed the command line to the following (removing the wildcard from above):

hadoop jar share/hadoop/tools/sources/hadoop-streaming-2.2.0-sources.jar -file wordmapper.py -mapper wordmapper.py  -file wordreducer.py -reducer wordreducer.py -input /data/1jrl.pdb -output /output/py_jrl
Exception in thread "main" java.lang.ClassNotFoundException: -file
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:249)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:205)

Why do I get these errors, and how do I fix this? I use Hadoop 2. Thanks!

Recommended answer

Try using this instead:

share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar

If that doesn't exist, look for a hadoop-streaming*.jar that doesn't have -sources in the file name.
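
A likely reading of the two stack traces: the wildcard in the first command expanded to more than one jar under share/hadoop/tools/sources, so the second file name was treated as the main class. In the second command, the -sources jar appears to declare no main class, so -file was taken as the class name instead. The sketch below picks a streaming jar whose name lacks -sources and rebuilds the command from the question around it. The helper names (choose_streaming_jar, find_streaming_jar, build_streaming_command) and the hadoop_home parameter are hypothetical, and the directory layout is assumed from the paths quoted above.

```python
import glob
import os


def choose_streaming_jar(candidates):
    """Return the first hadoop-streaming jar whose name lacks "-sources", or None."""
    for path in sorted(candidates):
        name = os.path.basename(path)
        is_streaming_jar = name.startswith("hadoop-streaming") and name.endswith(".jar")
        if is_streaming_jar and "-sources" not in name:
            return path
    return None


def find_streaming_jar(hadoop_home="."):
    """Search hadoop_home/share/hadoop/tools (assumed layout) for a usable jar."""
    pattern = os.path.join(
        hadoop_home, "share", "hadoop", "tools", "**", "hadoop-streaming*.jar"
    )
    return choose_streaming_jar(glob.glob(pattern, recursive=True))


def build_streaming_command(jar):
    """Rebuild the command from the question, swapping in the given jar."""
    return [
        "hadoop", "jar", jar,
        "-file", "wordmapper.py", "-mapper", "wordmapper.py",
        "-file", "wordreducer.py", "-reducer", "wordreducer.py",
        "-input", "/data/1jrl.pdb",
        "-output", "/output/py_jrl",
    ]


if __name__ == "__main__":
    jar = find_streaming_jar()
    if jar is None:
        print("No hadoop-streaming jar without -sources was found.")
    else:
        # Print the command for review instead of running it.
        print(" ".join(build_streaming_command(jar)))
```

The script only prints the command, so it can be checked before being pasted into a shell from the Hadoop installation directory.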
