Running Hadoop Job Remotely


Problem Description

I am trying to run a MapReduce job from outside the cluster.

For example, the Hadoop cluster is running on Linux machines, and we have a web application running on a Windows machine. We want to run the Hadoop job from this remote web application, then retrieve the Hadoop output directory and present it as a graph.

We have written the following piece of code:

Configuration conf = new Configuration();

Job job = new Job(conf);

conf.set("mapred.job.tracker", "192.168.56.101:54311"); 

conf.set("fs.default.name", "hdfs://192.168.56.101:54310");

job.setJarByClass(Analysis.class) ;
//job.setOutputKeyClass(Text.class);
//job.setOutputValueClass(IntWritable.class);

job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);



//job.set

job.setInputFormatClass(CustomFileInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);


job.waitForCompletion(true);

And this is the error we get. Even if we shut down the Hadoop 1.1.2 cluster, the error remains the same.

14/03/07 00:23:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/07 00:23:37 ERROR security.UserGroupInformation: PriviledgedActionException as:user cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-user\mapred\staging\user818037780\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-user\mapred\staging\user818037780\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
at LineCounter.main(LineCounter.java:86)

Solution

When running from a remote system, you should run as the remote user. You can do this in your main class as follows:

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.security.UserGroupInformation;

public static void main(String[] args) {
     // Run the client code as the user the cluster expects (here "root").
     UserGroupInformation ugi
     = UserGroupInformation.createRemoteUser("root");

     try {

        ugi.doAs(new PrivilegedExceptionAction<Void>() {

            public Void run() throws Exception {
                 Configuration conf = new Configuration();
                 conf.set("hadoop.job.ugi", "root");

                 // Set all configuration values before constructing the Job:
                 // Job copies the Configuration, so later set() calls are
                 // not seen by the job.
                 Job job = new Job(conf);

                 // write your remaining piece of code here.

                return null;
            }
        });

    } catch (Exception e) {
        e.printStackTrace();
    }

}
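
Note also that in the question's snippet, mapred.job.tracker and fs.default.name are set after new Job(conf) is called. Since Job copies the Configuration, those values never reach the job, so the client falls back to the local job runner and stages files on the local Windows filesystem; that matches the RawLocalFileSystem calls in the stack trace and would explain why the error persists even with the cluster shut down. A minimal, untested sketch that applies the question's settings before the Job is created inside the doAs block might look like this:

UserGroupInformation ugi = UserGroupInformation.createRemoteUser("root");

ugi.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        Configuration conf = new Configuration();
        // Everything must be set before new Job(conf) copies the Configuration.
        conf.set("hadoop.job.ugi", "root");
        conf.set("mapred.job.tracker", "192.168.56.101:54311");
        conf.set("fs.default.name", "hdfs://192.168.56.101:54310");

        Job job = new Job(conf);
        job.setJarByClass(Analysis.class);
        // ... mapper, reducer, formats and paths as in the question ...
        job.waitForCompletion(true);
        return null;
    }
});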

Also, when a MapReduce job is submitted, your Java classes together with their dependent jars must be shipped to the Hadoop cluster, where the job actually executes. You can read more here.

So you need to create a runnable jar of your code (with main class Analysis in your case) with all dependent jar files in its manifest classpath. Then run your jar file from the command line using

java -jar job-jar-with-dependencies.jar arguments
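
For example, with hypothetical HDFS paths standing in for the question's args[0] and args[1]:

java -jar job-jar-with-dependencies.jar /user/root/input /user/root/output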

HTH!
