将可执行jar发送到hadoop集群并作为“hadoop jar”运行 [英] Send executable jar to hadoop cluster and run as "hadoop jar"

查看:187
本文介绍了将可执行jar发送到hadoop集群并作为“hadoop jar”运行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我通常使用main方法创建一个可执行的jar包,并通过命令行hadoop jar Some.jar ClassWithMain输入输出运行。

在这个主要方法中,Job并且可以配置Configuration,Configuration类有一个setter来指定映射器或reducer类,比如conf.setMapperClass(Mapper.class)。

然而,在提交作业的情况下远程的,我应该设置jar和Mapper或更多的类来使用hadoop客户端API。

  job.setJarByClass(HasMainMethod.class); 
job.setMapperClass(Mapper_Class.class);
job.setReducerClass(Reducer_Class.class);

我想以编程方式在客户端将jar转移到远程hadoop集群,然后像hadoop jar命令使主要方法指定映射器和简化器。

那么我该如何处理这个问题呢?

解决方案

hadoop 只是一个shell脚本。最终, hadoop jar 会调用 org.apache.hadoop.util.RunJar 。什么 hadoop jar do帮助你设置 CLASSPATH 。所以你可以直接使用它。



例如,

 字符串input =...; 
字符串输出=...;
org.apache.hadoop.util.RunJar.main(
new String [] {Some.jar,ClassWithMain,input,output});

但是,您需要设置 CLASSPATH 正确使用之前。获取正确的 CLASSPATH 的一个简便方法是 hadoop classpath 。输入这个命令,你将得到完整的 CLASSPATH

然后设置 CLASSPATH 在运行你的java应用程序之前。例如,

  export CLASSPATH = $(hadoop classpath):$ CLASSPATH 
java -jar YourJar.jar


I commonly make a executable jar package with a main method and run by the commandline "hadoop jar Some.jar ClassWithMain input output"

In this main method, Job and Configuration may be configured and Configuration class has a setter to specify mapper or reducer class like conf.setMapperClass(Mapper.class).

However, In the case of submitting job remotely, I should set jar and Mapper or more classes to use hadoop client api.

job.setJarByClass(HasMainMethod.class);
job.setMapperClass(Mapper_Class.class);
job.setReducerClass(Reducer_Class.class);

I want to programmatically transfer jar in client to remote hadoop cluster and execute this jar like "hadoop jar" command to make main method specify mapper and reducer.

So how can I deal with this problem?

解决方案

hadoop is only a shell script. Eventually, hadoop jar will invoke org.apache.hadoop.util.RunJar. What hadoop jar do is helping you set up the CLASSPATH. So you can use it directly.

For example,

String input = "...";
String output = "...";
org.apache.hadoop.util.RunJar.main(
    new String[]{"Some.jar", "ClassWithMain", input, output});

However, you need to set the CLASSPATH correctly before you use it. A convenient way to get the correct CLASSPATH is hadoop classpath. Type this command and you will get the full CLASSPATH.

Then set up the CLASSPATH before you run your java application. For example,

export CLASSPATH=$(hadoop classpath):$CLASSPATH
java -jar YourJar.jar

这篇关于将可执行jar发送到hadoop集群并作为“hadoop jar”运行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆