Running a Hadoop Job From another Java Program
Problem Description
I am writing a program that receives the source code of the mapper/reducers, dynamically compiles them, and makes a JAR file out of them. It then has to run this JAR file on a Hadoop cluster.
For the last part, I set up all the required parameters dynamically through my code. However, the problem I am facing is that the code requires the compiled mapper and reducer classes at compile time. But at compile time I do not have these classes; they will only be received later, at run time (e.g. through a message received from a remote node). I would appreciate any idea/suggestion on how to get around this problem.
Below is the code for the last part. The problem is that job.setMapperClass(Mapper_Class.class) and job.setReducerClass(Reducer_Class.class) require the class files (Mapper_Class.class and Reducer_Class.class) to be present at compile time:
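For the run-time compilation step, one option (a minimal sketch, not the asker's actual code; the `Hello` class name and source string are invented for illustration, and it assumes the program runs on a JDK rather than a bare JRE) is the `javax.tools.JavaCompiler` API:

```java
import java.io.File;
import java.io.FileWriter;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class DynamicCompileSketch {
    public static void main(String[] args) throws Exception {
        // Write a (hypothetical) source file received at run time to disk.
        File src = new File("Hello.java");
        try (FileWriter w = new FileWriter(src)) {
            w.write("public class Hello { public static String greet() { return \"hi\"; } }");
        }
        // Compile it with the JDK's built-in compiler; run() returns 0 on success.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int status = compiler.run(null, null, null, src.getPath());
        System.out.println(status == 0 ? "compiled" : "compile failed");
    }
}
```

The resulting `.class` files could then be packaged into the JAR that the code below points `mapred.jar` at.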
private boolean run_Hadoop_Job(String className){
    try {
        System.out.println("Starting to run the code on Hadoop...");
        String[] argsTemp = { "project_test/input", "project_test/output" };
        // create a configuration
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:54310");
        conf.set("mapred.job.tracker", "localhost:54311");
        conf.set("mapred.jar", jar_Output_Folder + java.io.File.separator
                + className + ".jar");
        conf.set("mapreduce.map.class", "Mapper_Reducer_Classes$Mapper_Class.class");
        conf.set("mapreduce.reduce.class", "Mapper_Reducer_Classes$Reducer_Class.class");
        // create a new job based on the configuration
        Job job = new Job(conf, "Hadoop Example for dynamically and programmatically compiling-running a job");
        job.setJarByClass(Platform.class);
        //job.setMapperClass(Mapper_Class.class);
        //job.setReducerClass(Reducer_Class.class);
        // key/value of your reducer output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(argsTemp[0]));
        // delete a possibly existing output path to prevent job failures
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(argsTemp[1]);
        fs.delete(out, true);
        // finally set the empty output path
        FileOutputFormat.setOutputPath(job, new Path(argsTemp[1]));
        //job.submit();
        boolean success = job.waitForCompletion(true);
        System.out.println("Job Finished!");
        return success;
    } catch (Exception e) {
        return false;
    }
}
Revised: So I revised the code to specify the mapper and reducer using conf.set("mapreduce.map.class", "my mapper.class"). Now the code compiles correctly, but when it is executed it throws the following error:
Dec 24, 2012 6:49:43 AM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Task Id : attempt_201212240511_0006_m_000001_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Mapper_Reducer_Classes$Mapper_Class.class
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
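The failing frame is Configuration.getClass, which resolves the configured string via Class.forName, and Class.forName expects a binary class name with no ".class" suffix. A small standalone illustration of that behavior (using java.lang.String rather than the asker's classes):

```java
public class BinaryNameDemo {
    public static void main(String[] args) {
        // Class.forName takes a binary class name, not a file name.
        try {
            System.out.println("loaded " + Class.forName("java.lang.String").getName());
        } catch (ClassNotFoundException e) {
            System.out.println("unexpected failure");
        }
        // Appending ".class" (as in the configuration above) makes the lookup fail.
        try {
            Class.forName("java.lang.String.class");
            System.out.println("unexpected success");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException for .class-suffixed name");
        }
    }
}
```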
Solution: If you don't have them at compile time, then directly set the names in the configuration like this:
conf.set("mapreduce.map.class", "org.what.ever.ClassName");
conf.set("mapreduce.reduce.class", "org.what.ever.ClassName");
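Note that nested classes such as Mapper_Reducer_Classes$Mapper_Class are looked up by their binary name, which joins the outer and inner names with '$' and must be fully qualified if the class lives in a package. A quick check of that naming rule against a JDK nested type:

```java
import java.util.Map;

public class NestedNameDemo {
    public static void main(String[] args) throws Exception {
        // The binary name of a nested class uses '$', not '.'.
        System.out.println(Map.Entry.class.getName());
        // That is the form Class.forName (and hence Hadoop's Configuration) needs.
        Class<?> entry = Class.forName("java.util.Map$Entry");
        System.out.println("resolved " + entry.getName());
    }
}
```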