Calling a mapreduce job from a simple java program

Problem Description

I have been trying to call a mapreduce job from a simple Java program in the same package. I tried to reference the mapreduce jar file in my Java program and call it using the runJar(String args[]) method, also passing the input and output paths for the mapreduce job. But the program didn't work.

How do I run such a program where I just pass the input, output, and jar paths to its main method? Is it possible to run a mapreduce job (jar) through it? I want to do this because I want to run several mapreduce jobs one after another, where my Java program will call each such job by referring to its jar file. If this is possible, I might as well just use a simple servlet to do such calling and refer to its output files for graphing purposes.

import org.apache.hadoop.util.RunJar;
import java.util.ArrayList;

public class callOther {

    public static void main(String args[]) throws Throwable
    {
        // arguments for RunJar: the job jar, then its input and output paths
        ArrayList<String> arg = new ArrayList<String>();

        String output = "/root/Desktop/output";

        arg.add("/root/NetBeansProjects/wordTool/dist/wordTool.jar");

        arg.add("/root/Desktop/input");
        arg.add(output);

        RunJar.main(arg.toArray(new String[0]));
    }
}

Recommended Answer

Oh please don't do it with runJar; the Java API is very good.

See how you can start a job from normal code:

// imports needed by this snippet
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// create a configuration
Configuration conf = new Configuration();
// create a new job based on the configuration
Job job = new Job(conf);
// here you have to put your mapper class
job.setMapperClass(Mapper.class);
// here you have to put your reducer class
job.setReducerClass(Reducer.class);
// here you have to set the jar which contains your
// map/reduce classes, so the cluster can load the mapper class
job.setJarByClass(Mapper.class);
// key/value types of your reducer output
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// this sets the format of your input; it could also be TextInputFormat
job.setInputFormatClass(SequenceFileInputFormat.class);
// same for the output
job.setOutputFormatClass(TextOutputFormat.class);
// here you can set the path of your input
SequenceFileInputFormat.addInputPath(job, new Path("files/toMap/"));
// this deletes a possibly existing output path to prevent job failures
FileSystem fs = FileSystem.get(conf);
Path out = new Path("files/out/processed/");
fs.delete(out, true);
// finally set the empty out path
TextOutputFormat.setOutputPath(job, out);

// this waits until the job completes and prints debug output to STDOUT or whatever
// has been configured in your log4j properties.
job.waitForCompletion(true);
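
Since the goal is to chain several jobs, note that waitForCompletion(true) blocks until the job finishes and returns true on success, so jobs can simply be submitted one after another in sequential code. Below is a minimal sketch of that pattern under a few assumptions: the final job.waitForCompletion(true) line above is replaced by a success check, the second stage reads the first stage's output with TextInputFormat (from org.apache.hadoop.mapreduce.lib.input), the path "files/out/final/" is a hypothetical placeholder, and no mapper/reducer is set, so Hadoop's identity classes just pass the data through; substitute your own classes:

// replace the bare job.waitForCompletion(true) above with a success check
if (!job.waitForCompletion(true)) {
    System.exit(1); // stop the chain if the first job fails
}

// second job, reading what the first job wrote to "files/out/processed/";
// with no mapper/reducer set, the identity classes pass data through, and
// the default output types (LongWritable/Text) match TextInputFormat
Job second = new Job(conf);
second.setJarByClass(Mapper.class); // replace with one of your own classes
second.setInputFormatClass(TextInputFormat.class);
second.setOutputFormatClass(TextOutputFormat.class);
TextInputFormat.addInputPath(second, out); // the first job's output path
TextOutputFormat.setOutputPath(second, new Path("files/out/final/"));
System.exit(second.waitForCompletion(true) ? 0 : 1);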

If you are using an external cluster, you have to add the following information to your configuration:

// this should be like defined in your mapred-site.xml
conf.set("mapred.job.tracker", "jobtracker.com:50001"); 
// like defined in hdfs-site.xml
conf.set("fs.default.name", "hdfs://namenode.com:9000");

This should be no problem as long as hadoop-core.jar is on your application container's classpath. But I think you should put some kind of progress indicator on your web page, because it may take minutes to hours to complete a hadoop job ;)

For YARN (> Hadoop 2)

For YARN, the following configurations need to be set.

// this should be like defined in your yarn-site.xml
conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001"); 

// framework is now "yarn", should be defined like this in mapred-site.xml
conf.set("mapreduce.framework.name", "yarn");

// like defined in hdfs-site.xml
conf.set("fs.default.name", "hdfs://namenode.com:9000");
