Running Hadoop Job Remotely

Views: 163 Published: 2018/5/31 19:51:48 hadoop

Problem description

I am trying to run a MapReduce job from outside the cluster.

For example, the Hadoop cluster is running on Linux machines, and we have a web application running on a Windows machine. We want to run the Hadoop job from this remote web application, then retrieve the Hadoop output directory and present it as a graph.

We have written the following piece of code:

    Configuration conf = new Configuration();

    Job job = new Job(conf);

    conf.set("mapred.job.tracker", "192.168.56.101:54311");
    conf.set("fs.default.name", "hdfs://192.168.56.101:54310");

    job.setJarByClass(Analysis.class);
    //job.setOutputKeyClass(Text.class);
    //job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);

    //job.set

    job.setInputFormatClass(CustomFileInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.waitForCompletion(true);

And this is the error we get. Even if we shut down the hadoop 1.1.2 cluster, the error is still the same.

    14/03/07 00:23:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    14/03/07 00:23:37 ERROR security.UserGroupInformation: PriviledgedActionException as:user cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-user\mapred\staging\user818037780\.staging to 0700
    Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-user\mapred\staging\user818037780\.staging to 0700
        at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
        at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
        at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
        at LineCounter.main(LineCounter.java:86)

Solution

When running from a remote system, you should run the job as a remote user. You can do this in your main class as follows:
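
    // Requires: java.security.PrivilegedExceptionAction,
    // org.apache.hadoop.conf.Configuration, org.apache.hadoop.mapreduce.Job,
    // org.apache.hadoop.security.UserGroupInformation
    public static void main(String[] a) {
        UserGroupInformation ugi = UserGroupInformation.createRemoteUser("root");

        try {
            ugi.doAs(new PrivilegedExceptionAction<Void>() {
                public Void run() throws Exception {
                    Configuration conf = new Configuration();

                    // Set properties before creating the Job: Job copies the
                    // Configuration it is given, so later changes to conf are not seen.
                    conf.set("hadoop.job.ugi", "root");

                    Job job = new Job(conf);

                    // write your remaining piece of code here.

                    return null;
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
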
Also, when a MapReduce job is submitted, your Java classes and their dependent jars should be copied to the Hadoop cluster, where the job is executed. You can read more here.

So you need to create a runnable jar of your code (with main class Analysis in your case) with all dependent jar files in its manifest classpath. Then run your jar file from the command line using
java -jar job-jar-with-dependencies.jar arguments
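
As a rough illustration of the manifest classpath mentioned above, the sketch below builds such a manifest with the JDK's java.util.jar API. The main class name Analysis comes from the question; the class name ManifestSketch and the jar names under lib/ are hypothetical placeholders, so substitute whatever jars your build actually produces. In practice a build tool usually writes this file for you.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestSketch {

    // Builds a manifest naming the entry point and the dependent jars.
    // Class-Path entries are space-separated and resolved relative to the job jar.
    public static Manifest buildManifest(String mainClass, String... dependencyJars) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version is a required main attribute, so set it first.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        attrs.put(Attributes.Name.CLASS_PATH, String.join(" ", dependencyJars));
        return manifest;
    }

    // Renders the manifest as the text that would go into META-INF/MANIFEST.MF.
    public static String render(Manifest manifest) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical dependency jar names, for illustration only.
        Manifest manifest = buildManifest("Analysis",
                "lib/hadoop-core-1.1.2.jar", "lib/commons-logging.jar");
        System.out.print(render(manifest));
    }
}
```

With a manifest like this inside the jar, java -jar can locate both the main class and the listed dependency jars without a separate -cp argument.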
HTH!
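
The question also mentions presenting the job output as a graph. A minimal sketch of the parsing step, assuming the output lines have already been fetched from the cluster and that the job keeps TextOutputFormat's default layout of key, tab, value on each line (with Text keys and IntWritable values, as configured in the question). The class name OutputLineParser and the sample lines are made up for illustration; fetching the part files from HDFS and drawing the chart are left out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OutputLineParser {

    // Parses lines of the form "key<TAB>count" into an insertion-ordered map.
    // The last tab is used as the separator because the integer value cannot
    // contain a tab, while a text key could.
    public static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip blank lines or lines without a separator
            }
            String key = line.substring(0, tab);
            int value = Integer.parseInt(line.substring(tab + 1).trim());
            // Sum values in case the same key appears in more than one part file.
            counts.merge(key, value, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for the contents of a part file.
        List<String> sample = Arrays.asList("error\t12", "warn\t7", "info\t30");
        parse(sample).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

The resulting map can be handed to whichever charting library the web application already uses.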
