Hadoop JobConf class is deprecated, need updated example

Problem Description

I am writing Hadoop programs, and I really don't want to play with deprecated classes. Nowhere online can I find programs that use the updated

org.apache.hadoop.conf.Configuration

class instead of the

org.apache.hadoop.mapred.JobConf

class.

   public static void main(String[] args) throws Exception {
     JobConf conf = new JobConf(Test.class);              // org.apache.hadoop.mapred.JobConf: the deprecated API
     conf.setJobName("TESST");

     conf.setOutputKeyClass(Text.class);
     conf.setOutputValueClass(IntWritable.class);

     conf.setMapperClass(Map.class);
     conf.setCombinerClass(Reduce.class);
     conf.setReducerClass(Reduce.class);

     conf.setInputFormat(TextInputFormat.class);          // old-API input/output formats
     conf.setOutputFormat(TextOutputFormat.class);

     FileInputFormat.setInputPaths(conf, new Path(args[0]));
     FileOutputFormat.setOutputPath(conf, new Path(args[1]));

     JobClient.runJob(conf);                              // old-API job submission
   }

This is what my main() looks like. Can anyone please provide me with an updated version?

Solution

Here is the classic WordCount example. You'll notice a ton of other imports that may not be necessary; reading the code, you'll figure out which is which.

What's different? I'm using the Tool interface and the GenericOptionsParser to parse the job command, a.k.a.: hadoop jar .... (see the short sketch below for what the parser actually does).
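For context, here is a minimal, self-contained sketch of what GenericOptionsParser does (the class name ParserDemo and the sample arguments are hypothetical, purely for illustration): it applies the generic Hadoop options to a Configuration and hands the leftover application arguments back to you.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class ParserDemo {
        public static void main(String[] args) throws Exception {
            // e.g. args = { "-D", "mapreduce.job.reduces=2", "/input", "/output" }
            Configuration conf = new Configuration();
            // Generic options (-D, -files, -libjars, -archives) are applied to conf;
            // only the application's own arguments remain.
            String[] remaining = new GenericOptionsParser(conf, args).getRemainingArgs();
            System.out.println("reduces = " + conf.get("mapreduce.job.reduces"));
            for (String arg : remaining) {
                System.out.println("app arg: " + arg);
            }
        }
    }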

In the mapper you'll notice a run() method. You can get rid of it; the framework supplies an equivalent default implementation when you only override map(). I put it there to show that you can take further control of the map phase. This is all using the new API. I hope you find it useful. Any other questions, let me know!

import java.io.IOException;
import java.util.*;

import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.util.GenericOptionsParser;

public class Inception extends Configured implements Tool {

 public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Tokenize each input line and emit (word, 1) for every token.
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }

  // An explicit run(): this mirrors what the framework does by default,
  // and is the hook for taking extra control over the map phase.
  public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        while (context.nextKeyValue()) {
              map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        cleanup(context);
  }
 }

 public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Sum the counts emitted for each word.
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
 }

// Called by ToolRunner once the Configuration has been injected via setConf().
public int run(String[] args) throws Exception {

    Job job = Job.getInstance(getConf()); // reuse the Configuration supplied by ToolRunner

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setJarByClass(Inception.class);

    // Block until the job finishes (true = print progress); report success via the exit code.
    return job.waitForCompletion(true) ? 0 : 1;
    }

 public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // GenericOptionsParser moves generic options (-D, -files, -libjars, ...) into conf
    // and returns only the application's own arguments.
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    System.exit(ToolRunner.run(conf, new Inception(), otherArgs));
 }
}
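A couple of follow-up notes. The question's old-API main() also registered a combiner; the code above omits it, but the new API has an equivalent setter, so if you want the same local pre-aggregation you can add one line inside run() (a sketch, reusing the Reduce class above):

    job.setCombinerClass(Reduce.class); // pre-aggregate map output locally, like setCombinerClass in the old JobConf version

With the Tool pattern in place you launch the job as usual, e.g. hadoop jar yourjar.jar Inception <input> <output> (the jar name here is hypothetical), and generic options such as -D key=value are picked up automatically.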
