Running a Hadoop Job From another Java Program
Problem Description
I am writing a program that receives the source code of the mapper/reducers, dynamically compiles them, and makes a JAR file out of them. It then has to run this JAR file on a Hadoop cluster.
For the last part, I set up all the required parameters dynamically through my code. However, the problem I am facing is that the code requires the compiled mapper and reducer classes at compile time. But at compile time I do not have these classes; they will only be received later, at run time (e.g. through a message received from a remote node). I would appreciate any idea/suggestion on how to get around this problem.
Below is the code for the last part. The problem is that job.setMapperClass(Mapper_Class.class) and job.setReducerClass(Reducer_Class.class) require the class files (Mapper_Class.class and Reducer_Class.class) to be present at compile time:
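For the run-time compilation step, one option (a minimal sketch, not the asker's actual code; the `Hello` class name and source string are invented for illustration, and it assumes the program runs on a JDK rather than a bare JRE) is the `javax.tools.JavaCompiler` API:

```java
import java.io.File;
import java.io.FileWriter;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class DynamicCompileSketch {
    public static void main(String[] args) throws Exception {
        // Write a (hypothetical) source file received at run time to disk.
        File src = new File("Hello.java");
        try (FileWriter w = new FileWriter(src)) {
            w.write("public class Hello { public static String greet() { return \"hi\"; } }");
        }
        // Compile it with the JDK's built-in compiler; run() returns 0 on success.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int status = compiler.run(null, null, null, src.getPath());
        System.out.println(status == 0 ? "compiled" : "compile failed");
    }
}
```

The resulting `.class` files could then be packaged into the JAR that the code below points `mapred.jar` at.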
private boolean run_Hadoop_Job(String className){
    try {
        System.out.println("Starting to run the code on Hadoop...");
        String[] argsTemp = { "project_test/input", "project_test/output" };
        // create a configuration
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:54310");
        conf.set("mapred.job.tracker", "localhost:54311");
        conf.set("mapred.jar", jar_Output_Folder + java.io.File.separator
                + className + ".jar");
        conf.set("mapreduce.map.class", "Mapper_Reducer_Classes$Mapper_Class.class");
        conf.set("mapreduce.reduce.class", "Mapper_Reducer_Classes$Reducer_Class.class");
        // create a new job based on the configuration
        Job job = new Job(conf, "Hadoop Example for dynamically and programmatically compiling-running a job");
        job.setJarByClass(Platform.class);
        //job.setMapperClass(Mapper_Class.class);
        //job.setReducerClass(Reducer_Class.class);
        // key/value of your reducer output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(argsTemp[0]));
        // delete a possibly existing output path to prevent job failures
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(argsTemp[1]);
        fs.delete(out, true);
        // finally set the empty output path
        FileOutputFormat.setOutputPath(job, new Path(argsTemp[1]));
        //job.submit();
        boolean success = job.waitForCompletion(true);
        System.out.println("Job Finished!");
        return success;
    } catch (Exception e) {
        return false;
    }
}
Revised: So I revised the code to specify the mapper and reducer using conf.set("mapreduce.map.class", "my mapper.class"). Now the code compiles correctly, but when it is executed it throws the following error:
Dec 24, 2012 6:49:43 AM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Task Id : attempt_201212240511_0006_m_000001_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Mapper_Reducer_Classes$Mapper_Class.class
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
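The failing frame is Configuration.getClass, which resolves the configured string via Class.forName, and Class.forName expects a binary class name with no ".class" suffix. A small standalone illustration of that behavior (using java.lang.String rather than the asker's classes):

```java
public class BinaryNameDemo {
    public static void main(String[] args) {
        // Class.forName takes a binary class name, not a file name.
        try {
            System.out.println("loaded " + Class.forName("java.lang.String").getName());
        } catch (ClassNotFoundException e) {
            System.out.println("unexpected failure");
        }
        // Appending ".class" (as in the configuration above) makes the lookup fail.
        try {
            Class.forName("java.lang.String.class");
            System.out.println("unexpected success");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException for .class-suffixed name");
        }
    }
}
```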
Solution: If you don't have them at compile time, then directly set the names in the configuration like this:
conf.set("mapreduce.map.class", "org.what.ever.ClassName");
conf.set("mapreduce.reduce.class", "org.what.ever.ClassName");
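Note that nested classes such as Mapper_Reducer_Classes$Mapper_Class are looked up by their binary name, which joins the outer and inner names with '$' and must be fully qualified if the class lives in a package. A quick check of that naming rule against a JDK nested type:

```java
import java.util.Map;

public class NestedNameDemo {
    public static void main(String[] args) throws Exception {
        // The binary name of a nested class uses '$', not '.'.
        System.out.println(Map.Entry.class.getName());
        // That is the form Class.forName (and hence Hadoop's Configuration) needs.
        Class<?> entry = Class.forName("java.util.Map$Entry");
        System.out.println("resolved " + entry.getName());
    }
}
```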