Send executable jar to hadoop cluster and run as "hadoop jar"
I commonly build an executable jar package with a main method and run it from the command line with "hadoop jar Some.jar ClassWithMain input output".
In this main method, Job and Configuration can be configured, and the Configuration class has a setter to specify the mapper or reducer class, like conf.setMapperClass(Mapper.class).
However, when submitting a job remotely, I have to set the jar and the Mapper (or more classes) through the hadoop client API:
job.setJarByClass(HasMainMethod.class);
job.setMapperClass(Mapper_Class.class);
job.setReducerClass(Reducer_Class.class);
I want to programmatically transfer the jar from the client to the remote hadoop cluster and execute it the way the "hadoop jar" command does, so that the main method specifies the mapper and reducer.
So how can I deal with this problem?
hadoop is only a shell script. Eventually, hadoop jar invokes org.apache.hadoop.util.RunJar. What hadoop jar does for you is set up the CLASSPATH, so you can call RunJar directly.
For example,
String input = "...";
String output = "...";
org.apache.hadoop.util.RunJar.main(
new String[]{"Some.jar", "ClassWithMain", input, output});
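To make it clearer what such a call boils down to: as far as I know, RunJar ends up loading the named class and calling its static main method through reflection. The following is only a sketch of that last step using the plain JDK, not RunJar's actual code; MainInvoker and Demo are hypothetical names, and the real RunJar does more work (such as unpacking the jar) before it gets this far.

```java
import java.lang.reflect.Method;

public class MainInvoker {
    /** Hypothetical stand-in for a user class such as ClassWithMain. */
    public static class Demo {
        public static String lastArgs = "";

        public static void main(String[] args) {
            // Record the arguments so the caller can see that main really ran.
            lastArgs = String.join(",", args);
        }
    }

    /** Loads className through the given loader and calls its static main(String[]). */
    public static void invokeMain(ClassLoader loader, String className, String[] args)
            throws Exception {
        Class<?> cls = Class.forName(className, true, loader);
        Method main = cls.getMethod("main", String[].class);
        // Cast to Object so the array is passed as one argument, not spread as varargs.
        main.invoke(null, (Object) args);
    }

    public static void main(String[] args) throws Exception {
        invokeMain(MainInvoker.class.getClassLoader(),
                   Demo.class.getName(),
                   new String[] {"input", "output"});
        System.out.println(Demo.lastArgs); // prints "input,output"
    }
}
```

This is also why the class path matters: the reflective call only works if the loader can see both your classes and the hadoop classes they depend on.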
However, you need to set the CLASSPATH correctly before you use it. A convenient way to get the correct CLASSPATH is the hadoop classpath command: type it and it prints the full CLASSPATH.
Then set up the CLASSPATH before you run your java application. For example,
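export CLASSPATH=$(hadoop classpath):$CLASSPATH
java -cp "$CLASSPATH:YourJar.jar" YourMainClass
Note that java -jar ignores the CLASSPATH environment variable, so the application is started by main class with -cp instead (YourMainClass is a placeholder for the class in YourJar.jar that calls RunJar).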
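If the application is launched from another Java program rather than from a shell, the same command line can be assembled in code. This is a minimal sketch under a few assumptions: LaunchCommand is a hypothetical helper, the hadoopClasspath string is the output of hadoop classpath obtained separately, and the class path is passed explicitly with -cp.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LaunchCommand {
    /**
     * Builds a "java -cp <hadoop classpath + jar> <mainClass> <args...>" command.
     * hadoopClasspath is assumed to be the text printed by "hadoop classpath".
     */
    public static List<String> build(String hadoopClasspath, String jar,
                                     String mainClass, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        cmd.add(hadoopClasspath + File.pathSeparator + jar);
        cmd.add(mainClass);
        for (String arg : args) {
            cmd.add(arg);
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with the real class path, jar and class.
        List<String> cmd = build("/path/from/hadoop/classpath", "YourJar.jar",
                                 "YourMainClass", "input", "output");
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be handed to new ProcessBuilder(cmd) to start the child JVM.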