Launch a MapReduce job from Eclipse

Problem description

I've written a MapReduce program in Java, which I can submit to a remote cluster running in distributed mode. Currently, I submit the job using the following steps:

  1. Export the MapReduce job as a jar (e.g. myMRjob.jar)
  2. Submit the job to the remote cluster using the following shell command: hadoop jar myMRjob.jar

I would like to submit the job directly from Eclipse when I run the program. How can I do this?

I am currently using CDH3, and an abridged version of my conf is:

conf.set("hbase.zookeeper.quorum", getZookeeperServers());
conf.set("fs.default.name","hdfs://namenode/");
conf.set("mapred.job.tracker", "jobtracker:jtPort");
Job job = new Job(conf, "COUNT ROWS");
job.setJarByClass(CountRows.class);

// Set up Mapper
TableMapReduceUtil.initTableMapperJob(inputTable, scan, 
    CountRows.MyMapper.class, ImmutableBytesWritable.class,  
    ImmutableBytesWritable.class, job);  

// Set up Reducer
job.setReducerClass(CountRows.MyReducer.class);
job.setNumReduceTasks(16);

// Setup Overall Output
job.setOutputFormatClass(MultiTableOutputFormat.class);

job.submit();

When I run this directly from Eclipse, the job is launched, but Hadoop cannot find the mappers/reducers. I get the following errors:

12/06/27 23:23:29 INFO mapred.JobClient:  map 0% reduce 0%  
12/06/27 23:23:37 INFO mapred.JobClient: Task Id :   attempt_201206152147_0645_m_000000_0, Status : FAILED  
java.lang.RuntimeException: java.lang.ClassNotFoundException:   com.mypkg.mapreduce.CountRows$MyMapper  
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996)  
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:212)  
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:602)  
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)   
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)  
    at java.security.AccessController.doPrivileged(Native Method)  
    at javax.security.auth.Subject.doAs(Subject.java:396)  
    at   org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)  
    at org.apache.hadoop.mapred.Child.main(Child.java:264)  
...

Does anyone know how to get past these errors? If I can fix this, I can integrate more MR jobs into my scripts, which would be awesome!

Recommended answer

If you're submitting the Hadoop job from within the Eclipse project that defines the classes for the job, then you most probably have a classpath problem.

The job.setJarByClass(CountRows.class) call is finding the class file on the build classpath, and not in the CountRows.jar (which may not have been built yet, or may not even be on the classpath). In that case no job jar is shipped to the cluster, which would explain the ClassNotFoundException for CountRows$MyMapper on the task nodes.

You should be able to confirm this by printing the result of job.getJar() after you call job.setJarByClass(..). If it doesn't display a jar file path, then it has found the class in the build output rather than the jar'd class.
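To make the jar-versus-build-directory distinction concrete, here is a minimal sketch using only the JDK, with no Hadoop on the classpath. It is not Hadoop's code, just an illustration of the kind of check involved: look up the URL a class file was loaded from and see whether it points inside a jar. The class name JarCheck, both helper methods, and the example paths are hypothetical.

```java
import java.net.URL;

public class JarCheck {

    /**
     * Given the URL of a .class resource (as a string), returns the path of
     * the jar that contains it, or null if the URL does not point into a jar.
     * Assumption: percent-encoded characters (e.g. %20) are not decoded here.
     */
    static String jarPathOf(String classFileUrl) {
        if (classFileUrl == null || !classFileUrl.startsWith("jar:")) {
            return null; // e.g. file:/.../bin/CountRows.class from a build directory
        }
        // jar:file:/tmp/myMRjob.jar!/com/mypkg/X.class -> file:/tmp/myMRjob.jar
        String path = classFileUrl.substring("jar:".length());
        int bang = path.indexOf('!');
        if (bang >= 0) {
            path = path.substring(0, bang);
        }
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        return path;
    }

    /** Finds the jar a class was loaded from, or null if it was not loaded from a jar. */
    static String containingJar(Class<?> clazz) {
        String resource = "/" + clazz.getName().replace('.', '/') + ".class";
        URL url = clazz.getResource(resource);
        return jarPathOf(url == null ? null : url.toString());
    }

    public static void main(String[] args) {
        String jar = containingJar(JarCheck.class);
        System.out.println(jar == null
                ? "Not loaded from a jar (probably a build output directory)"
                : "Loaded from jar: " + jar);
    }
}
```

Run from an Eclipse project without exporting a jar, a check like this would normally report that the class did not come from a jar. That is the same situation in which job.getJar() would not show a jar file path after job.setJarByClass(CountRows.class).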
