Launch a mapreduce job from Eclipse


Question

I've written a mapreduce program in Java, which I can submit to a remote cluster running in distributed mode. Currently, I submit the job using the following steps:

  1. Export the mapreduce job as a jar (e.g. myMRjob.jar).
  2. Submit the job to the remote cluster using the following shell command: hadoop jar myMRjob.jar

I would like to submit the job directly from Eclipse when I try to run the program. How can I do this?

I am currently using CDH3, and an abridged version of my conf is:

conf.set("hbase.zookeeper.quorum", getZookeeperServers());
conf.set("fs.default.name","hdfs://namenode/");
conf.set("mapred.job.tracker", "jobtracker:jtPort");
Job job = new Job(conf, "COUNT ROWS");
job.setJarByClass(CountRows.class);

// Set up Mapper
TableMapReduceUtil.initTableMapperJob(inputTable, scan, 
    CountRows.MyMapper.class, ImmutableBytesWritable.class,  
    ImmutableBytesWritable.class, job);  

// Set up Reducer
job.setReducerClass(CountRows.MyReducer.class);
job.setNumReduceTasks(16);

// Setup Overall Output
job.setOutputFormatClass(MultiTableOutputFormat.class);

job.submit();

When I run this directly from Eclipse, the job is launched but Hadoop cannot find the mappers/reducers. I get the following errors:

12/06/27 23:23:29 INFO mapred.JobClient:  map 0% reduce 0%
12/06/27 23:23:37 INFO mapred.JobClient: Task Id : attempt_201206152147_0645_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mypkg.mapreduce.CountRows$MyMapper
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996)
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:212)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:602)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
...

Does anyone know how to get past these errors? If I can fix this, I can integrate more MR jobs into my scripts, which would be awesome!

Answer

If you're submitting the Hadoop job from within the Eclipse project that defines the classes for the job, then you most probably have a classpath problem.

The job.setJarByClass(CountRows.class) call is finding the class file on the build classpath, and not in the CountRows.jar (which may or may not have been built yet, and may not even be on the classpath).

You should be able to verify this by printing the result of job.getJar() after you call job.setJarByClass(..). If it doesn't display a path to a jar file, then it has found the compiled class on the build path rather than the class packaged in the jar.
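The check described above can be illustrated without a cluster. The sketch below is not Hadoop code. It is a plain-JDK approximation of the kind of lookup setJarByClass appears to perform: ask the class loader where a class file came from, and return a jar path only when the resulting URL uses the jar: scheme. The class name JarLocator, its two methods, and the example paths are hypothetical names chosen for illustration.

```java
import java.net.URL;

public class JarLocator {

    /**
     * Extracts the jar file path from a "jar:" URL string such as
     * "jar:file:/some/dir/job.jar!/pkg/Cls.class".
     * Returns null for any other kind of URL (for example "file:").
     */
    public static String jarPathOf(String url) {
        if (!url.startsWith("jar:")) {
            return null; // class file lives in a directory, not in a jar
        }
        String path = url.substring("jar:".length());
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        int bang = path.indexOf('!'); // '!' separates the jar from the entry inside it
        return bang >= 0 ? path.substring(0, bang) : path;
    }

    /**
     * Returns the jar that the given class was loaded from,
     * or null if it was loaded from somewhere other than a jar.
     */
    public static String findContainingJar(Class<?> cls) {
        String resource = "/" + cls.getName().replace('.', '/') + ".class";
        URL url = cls.getResource(resource);
        return url == null ? null : jarPathOf(url.toString());
    }

    public static void main(String[] args) {
        // In the question's setting, CountRows.class would be passed here instead.
        System.out.println("Loaded from jar: " + findContainingJar(JarLocator.class));
    }
}
```

When run from Eclipse without exporting a jar, main would typically print null, because the class is loaded from the project's output directory. When run from an exported jar, it should print that jar's path. A null result corresponds to the situation in which job.getJar() shows no jar path. One possible workaround, which is an assumption and not part of the original answer, is to build the jar first and set the mapred.jar property to its path before creating the Job.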
