Calling a mapreduce job from a simple java program


Question


I have been trying to call a mapreduce job from a simple Java program in the same package. I tried to reference the mapreduce jar file in my Java program and call it using the RunJar(String args[]) method, also passing the input and output paths for the mapreduce job, but the program didn't work.


How do I run such a program where I just pass the input, output, and jar paths to its main method? Is it possible to run a mapreduce job (jar) through it? I want to do this because I want to run several mapreduce jobs one after another, where my Java program would call each such job by referring to its jar file. If this is possible, I could also just use a simple servlet to make such calls and use their output files for charting purposes.


import java.util.ArrayList;
import org.apache.hadoop.util.RunJar;

public class callOther {

    public static void main(String args[]) throws Throwable
    {

        ArrayList<String> arg = new ArrayList<String>();

        String output = "/root/Desktp/output";

        // first element is the jar to run, followed by the job's own arguments
        arg.add("/root/NetBeansProjects/wordTool/dist/wordTool.jar");

        arg.add("/root/Desktop/input");
        arg.add(output);

        RunJar.main(arg.toArray(new String[0]));

    }
}

Solution

Oh please don't do it with runJar, the Java API is very good.

See how you can start a job from normal code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// create a configuration
Configuration conf = new Configuration();
// create a new job based on the configuration
Job job = new Job(conf);
// here you have to put your mapper class
job.setMapperClass(Mapper.class);
// here you have to put your reducer class
job.setReducerClass(Reducer.class);
// here you have to set the jar which is containing your
// map/reduce class, so you can use the mapper class
job.setJarByClass(Mapper.class);
// key/value of your reducer output
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// this sets the format of your input, can also be TextInputFormat
job.setInputFormatClass(SequenceFileInputFormat.class);
// same for the output
job.setOutputFormatClass(TextOutputFormat.class);
// here you can set the path of your input
SequenceFileInputFormat.addInputPath(job, new Path("files/toMap/"));
// this deletes a possibly existing output path to prevent job failures
FileSystem fs = FileSystem.get(conf);
Path out = new Path("files/out/processed/");
fs.delete(out, true);
// finally set the empty out path
TextOutputFormat.setOutputPath(job, out);

// this waits until the job completes and prints debug output to STDOUT or whatever
// has been configured in your log4j properties.
job.waitForCompletion(true);
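Since the goal is to run several jobs one after another, note that `waitForCompletion(true)` returns a boolean success flag, so chaining jobs is just submit, wait, check, repeat. Below is a minimal sketch of that sequencing logic in plain Java; each `Callable<Boolean>` stands in for a fully configured Hadoop `Job`, and `JobChain`/`runAll` are hypothetical names used only for illustration, not Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class JobChain {
    // Runs each "job" in order and stops at the first failure,
    // mirroring how job.waitForCompletion(true) returns false on failure.
    // Returns the number of jobs that completed successfully.
    static int runAll(List<Callable<Boolean>> jobs) throws Exception {
        int completed = 0;
        for (Callable<Boolean> job : jobs) {
            if (!job.call()) {
                break; // stop the chain so later jobs don't read missing output
            }
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Boolean>> jobs = new ArrayList<>();
        jobs.add(() -> true);  // stand-in for job1.waitForCompletion(true)
        jobs.add(() -> true);  // stand-in for job2.waitForCompletion(true)
        jobs.add(() -> false); // a failing job halts the chain here
        jobs.add(() -> true);  // never reached
        System.out.println(runAll(jobs)); // prints 2
    }
}
```

In a real driver each callable body would be `() -> jobN.waitForCompletion(true)`, typically with job N+1's input path pointing at job N's output path.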

If you are using an external cluster, you have to add the following information to your configuration:

// this should be like defined in your mapred-site.xml
conf.set("mapred.job.tracker", "jobtracker.com:50001"); 
// like defined in hdfs-site.xml
conf.set("fs.default.name", "hdfs://namenode.com:9000");

This should be no problem when hadoop-core.jar is in your application container's classpath. But I think you should put some kind of progress indicator on your web page, because it may take minutes to hours to complete a hadoop job ;)

For YARN (> Hadoop 2)

For YARN, the following configurations need to be set.

// this should be like defined in your yarn-site.xml
conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001"); 

// framework is now "yarn", should be defined like this in mapred-site.xml
conf.set("mapreduce.framework.name", "yarn");

// like defined in hdfs-site.xml
conf.set("fs.default.name", "hdfs://namenode.com:9000");
