Adding jars to the classpath of the code that launches a MapReduce job

Question

I am trying to launch a MapReduce job from an application that implements the Tool interface. The application also does a few other things that act as preconditions for the MapReduce job.

This class uses some third-party libraries. How do I add those jars to the classpath when running the jar with the command: hadoop jar <myjar> [args]

Following this Cloudera post, I tried setting the HADOOP_CLASSPATH environment variable to the third-party jars, but it did not work. The third-party jars mentioned above are required only by the class that launches the job, not by the Mapper/Reducer classes, so I do not need to put them in the Distributed Cache.

When I copy the third-party jars I need into $HADOOP_HOME/lib, it works, but I need a cleaner solution.

Thanks, everyone.

Note: I know that putting all the third-party jars in a lib directory inside my-map-reduce-job.jar would work, but I do not have that liberty. The jar is built with Maven, and I want these third-party jars kept outside of my-map-reduce-job.jar.

Recommended answer

For future reference: setting the HADOOP_CLASSPATH environment variable on the client machine from which you launch the MapReduce job is the way to go.

I figured out my mistake: I was exporting HADOOP_CLASSPATH the wrong way. The separator between the jars is platform dependent; on Unix it is a colon (:).

Set the variable:

    export HADOOP_CLASSPATH=/path/to/my/jar1:/path/to/my/jar2

and then run:

    hadoop jar <myjar> [mainClass] [args]
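
When there are many third-party jars, typing the colon-separated list by hand is where separator mistakes tend to creep in. The sketch below shows one possible way to build the list from a directory of jars. The helper name join_jars and the directory /path/to/my/libs are assumptions made for illustration; they are not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): join every .jar file
# in a directory into one string separated by ':' (the Unix path separator).
join_jars() {
    dir=$1
    result=""
    for jar in "$dir"/*.jar; do
        # Skip the unexpanded pattern when the directory holds no jars.
        [ -e "$jar" ] || continue
        # Add ':' only between entries, never at the start.
        result="${result:+$result:}$jar"
    done
    printf '%s\n' "$result"
}

# Example usage (the directory is a placeholder):
# export HADOOP_CLASSPATH="$(join_jars /path/to/my/libs)"
```

The function only prints the joined list, so the export stays an explicit step on the client machine that launches the job.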

If HADOOP_CLASSPATH has already been defined elsewhere, you might want to append your jars to it:

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/my/jar1:/path/to/my/jar2
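
One caveat with appending: if HADOOP_CLASSPATH is empty or unset, the result starts with a bare colon, and an empty classpath entry is typically treated as the current directory. The following is a minimal sketch of a guard against that. The helper name append_classpath is an assumption made for illustration, not something from the original answer.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): append entries to an
# existing classpath value without leaving a leading ':' when that value
# is empty.
append_classpath() {
    existing=$1
    extra=$2
    # ${existing:+...} expands to "existing:" only when existing is non-empty.
    printf '%s\n' "${existing:+$existing:}$extra"
}

# Example usage with the placeholder paths from the answer:
# export HADOOP_CLASSPATH="$(append_classpath "${HADOOP_CLASSPATH:-}" /path/to/my/jar1:/path/to/my/jar2)"
```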
