Multiple ways to write driver of Hadoop program - Which one to choose?



I have observed that there are multiple ways to write the driver method of a Hadoop program.

The following method is given in the Hadoop Tutorial by Yahoo:

 public void run(String inputPath, String outputPath) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(MapClass.class);
    conf.setReducerClass(Reduce.class);

    FileInputFormat.addInputPath(conf, new Path(inputPath));
    FileOutputFormat.setOutputPath(conf, new Path(outputPath));

    JobClient.runJob(conf);
  }

and this method is given in Hadoop: The Definitive Guide (O'Reilly, 2012):

public static void main(String[] args) throws Exception {
  if (args.length != 2) {
    System.err.println("Usage: MaxTemperature <input path> <output path>");
    System.exit(-1);
  }
  Job job = new Job();
  job.setJarByClass(MaxTemperature.class);
  job.setJobName("Max temperature");
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  job.setMapperClass(MaxTemperatureMapper.class);
  job.setReducerClass(MaxTemperatureReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}

While trying the program given in the O'Reilly book, I found that the constructors of the Job class are deprecated. As the O'Reilly book is based on Hadoop 2 (YARN), I was surprised to see that they had used a deprecated class.
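For reference, the deprecation is only of the `Job` constructors, not of the class itself: in the Hadoop 2 API the factory method `Job.getInstance(...)` replaces `new Job()`. A minimal sketch of the same driver setup without the deprecated constructor (class names taken from the book's example) might look like:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MaxTemperatureJobFactory {
  public static Job newJob() throws Exception {
    Configuration conf = new Configuration();
    // Job.getInstance replaces the deprecated "new Job()" constructor
    // and takes the job name directly, replacing setJobName().
    Job job = Job.getInstance(conf, "Max temperature");
    job.setJarByClass(MaxTemperatureJobFactory.class);
    return job;
  }
}
```

The rest of the configuration (input/output paths, mapper, reducer, key/value classes) is unchanged from the book's version.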

I would like to know which method everyone uses?

Solution

I use the former approach. If we go with overriding the run() method, we can use hadoop jar options like -D, -libjars, -files, etc. All of these are very much necessary in almost any Hadoop project. Not sure if we can use them through the main() method.
