Running a Hadoop Job From another Java Program


Problem Description



I am writing a program that receives the source code of the mapper/reducers, dynamically compiles the mappers/reducers and makes a JAR file out of them. It then has to run this JAR file on a hadoop cluster.

For the last part, I set up all the required parameters dynamically through my code. However, the problem I am facing now is that the code requires the compiled mapper and reducer classes at compile time. But at compile time I do not have these classes; they will be received later, at run time (e.g. through a message from a remote node). I would appreciate any ideas/suggestions on how to get around this problem.
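For the first part of the workflow described above (compiling mapper/reducer source received at run time), the JDK's built-in compiler API can be used. A minimal sketch, assuming the source text arrives as a plain `String` (the class name `HelloMapper` and the temp-directory layout are illustrative, not from the original program):

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.nio.file.Files;
import java.nio.file.Path;

public class DynamicCompile {
    public static void main(String[] args) throws Exception {
        // Source text, e.g. received from a remote node at run time
        String source =
            "public class HelloMapper { public String tag() { return \"mapper\"; } }";

        // Write it to a temp directory so the compiler can pick it up
        Path dir = Files.createTempDirectory("dyn");
        Path file = dir.resolve("HelloMapper.java");
        Files.write(file, source.getBytes("UTF-8"));

        // Compile in-process with the system compiler (requires a JDK, not a bare JRE);
        // the resulting .class lands next to the source file
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int result = compiler.run(null, null, null, file.toString());
        System.out.println(result == 0 ? "compiled" : "failed");
    }
}
```

The compiled class files can then be packaged into the JAR that `mapred.jar` points at.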

Below you can find the code for the last part, with the problem being that job.setMapperClass(Mapper_Class.class) and job.setReducerClass(Reducer_Class.class) require the class files (Mapper_Class.class and Reducer_Class.class) to be present at compile time:

    private boolean run_Hadoop_Job(String className) {
        try {
            System.out.println("Starting to run the code on Hadoop...");
            String[] argsTemp = { "project_test/input", "project_test/output" };
            // create a configuration
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://localhost:54310");
            conf.set("mapred.job.tracker", "localhost:54311");
            conf.set("mapred.jar", jar_Output_Folder + java.io.File.separator
                                   + className + ".jar");
            conf.set("mapreduce.map.class", "Mapper_Reducer_Classes$Mapper_Class.class");
            conf.set("mapreduce.reduce.class", "Mapper_Reducer_Classes$Reducer_Class.class");
            // create a new job based on the configuration
            Job job = new Job(conf, "Hadoop Example for dynamically and programmatically compiling-running a job");
            job.setJarByClass(Platform.class);
            //job.setMapperClass(Mapper_Class.class);
            //job.setReducerClass(Reducer_Class.class);

            // key/value of your reducer output
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, new Path(argsTemp[0]));
            // delete a possibly existing output path to prevent job failures
            FileSystem fs = FileSystem.get(conf);
            Path out = new Path(argsTemp[1]);
            fs.delete(out, true);
            // finally set the (now empty) output path
            FileOutputFormat.setOutputPath(job, new Path(argsTemp[1]));

            //job.submit();
            // note: calling System.exit(...) here would terminate the JVM before
            // the lines below could run, so return the job status instead
            boolean success = job.waitForCompletion(true);
            System.out.println("Job Finished!");
            return success;
        } catch (Exception e) {
            return false;
        }
    }

Revised: So I revised the code to specify the mapper and reducers using conf.set("mapreduce.map.class", "my mapper.class"). Now the code compiles correctly, but when it is executed it throws the following error:

    Dec 24, 2012 6:49:43 AM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
    INFO: Task Id : attempt_201212240511_0006_m_000001_2, Status : FAILED
    java.lang.RuntimeException: java.lang.ClassNotFoundException: Mapper_Reducer_Classes$Mapper_Class.class
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
        at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

Solution

If you don't have the classes at compile time, then set their names directly in the configuration like this (the fully qualified class name, without a trailing ".class" suffix):

conf.set("mapreduce.map.class", "org.what.ever.ClassName");
conf.set("mapreduce.reduce.class", "org.what.ever.ClassName");
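This matches the error above: Configuration resolves the configured string via a Class.forName-style lookup, so a value like "Mapper_Reducer_Classes$Mapper_Class.class" (with the ".class" suffix) is not a valid binary class name and fails. A minimal standalone illustration, using java.lang.String as a stand-in for the mapper class:

```java
public class ClassNameLookup {
    public static void main(String[] args) {
        // A fully qualified binary name resolves fine...
        try {
            Class<?> ok = Class.forName("java.lang.String");
            System.out.println("resolved: " + ok.getName());
        } catch (ClassNotFoundException e) {
            System.out.println("unexpected failure");
        }
        // ...but appending ".class" turns it into a nonexistent class name,
        // which is exactly the ClassNotFoundException seen in the job log
        try {
            Class.forName("java.lang.String.class");
            System.out.println("unexpectedly resolved");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException for suffixed name");
        }
    }
}
```

Note that for an inner class the configured value still uses the `$` separator, e.g. "Mapper_Reducer_Classes$Mapper_Class" (with its package prefix, if any), and the JAR named by `mapred.jar` must contain that class so the task JVMs can load it.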
