How to debug Hadoop MapReduce jobs from Eclipse?


Question

I'm running hadoop in a single-machine, local-only setup, and I'm looking for a nice, painless way to debug mappers and reducers in eclipse. Eclipse has no problem running mapreduce tasks. However, when I go to debug, it gives me this error:

12/03/28 14:03:23 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).

Okay, so I did some research. Apparently, I should use eclipse's remote debugging facility, and add this to my hadoop-env.sh:

-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5000

I did that, and I can step through my code in eclipse. The only problem is that, because of the "suspend=y", I can't use the "hadoop" command from the command line to do things like look at the job queue; it hangs, I imagine because it's waiting for a debugger to attach. I also can't run "hbase shell" when I'm in this mode, probably for the same reason.

So basically, if I want to flip back and forth between "debug mode" and "normal mode", I need to update hadoop-env.sh and restart my machine. Major pain. So I have a few questions:

  1. Is there an easier way to debug mapreduce jobs in eclipse?

  2. How come eclipse can run my mapreduce jobs just fine, but for debugging I need to use remote debugging?

  3. Is there a way to tell hadoop to use remote debugging for mapreduce jobs, but to operate in normal mode for all other tasks (such as "hadoop queue" or "hbase shell")?

  4. Is there an easier way to switch hadoop-env.sh configurations without rebooting my machine? hadoop-env.sh is not executable by default.

  5. This is a more general question: what exactly is happening when I run hadoop in local-only mode? Are there any processes on my machine that are "always on" and executing hadoop jobs? Or does hadoop only do things when I run the "hadoop" command from the command line? What is eclipse doing when I run a mapreduce job from eclipse? I had to reference hadoop-core in my pom.xml in order to make my project work. Is eclipse submitting jobs to my installed hadoop instance, or is it somehow running it all from the hadoop-core-1.0.0.jar in my maven cache?

Here is my main class:

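import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Main {
    public static void main(String[] args) throws Exception {
        // Configure the job and tell hadoop which jar holds the user classes.
        Job job = new Job();
        job.setJarByClass(Main.class);
        job.setJobName("FirstStage");

        // Input and output locations on the local file system.
        FileInputFormat.addInputPath(job, new Path("/home/sangfroid/project/in"));
        FileOutputFormat.setOutputPath(job, new Path("/home/sangfroid/project/out"));

        job.setMapperClass(FirstStageMapper.class);
        job.setReducerClass(FirstStageReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Block until the job finishes; exit 0 on success, 1 on failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}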

Recommended answer

Make the change in the /bin/hadoop (hadoop-env.sh) script: check which command was invoked, and add the remote debug configuration only if the command is jar.

if [ "$COMMAND" = "jar" ] ; then
  exec "$JAVA" -Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=8999 $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
else
  exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
fi
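
A variation on the same idea, as a rough sketch rather than anything Hadoop ships with: the debug options can also depend on an environment variable, so that switching between debug and normal runs does not require editing the script each time. The variable `HADOOP_DEBUG_PORT` and the function `debug_opts_for` are hypothetical names made up for this example. The sketch assumes it sits in bin/hadoop at a point where `$COMMAND` has already been set.

```shell
# Hypothetical helper (not part of Hadoop): print JDWP options only when the
# subcommand is "jar" AND the made-up variable HADOOP_DEBUG_PORT is set.
debug_opts_for() {
  command_name="$1"
  if [ "$command_name" = "jar" ] && [ -n "${HADOOP_DEBUG_PORT:-}" ]; then
    # suspend=y makes the JVM wait until a debugger attaches on the port.
    printf '%s' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${HADOOP_DEBUG_PORT}"
  fi
}

# Append the options (possibly empty) to whatever HADOOP_OPTS already holds.
HADOOP_OPTS="${HADOOP_OPTS:-} $(debug_opts_for "${COMMAND:-}")"
```

With something like this in place, `HADOOP_DEBUG_PORT=8999 hadoop jar ...` would wait for Eclipse to attach on port 8999, while `hadoop queue` and a plain `hadoop jar ...` would start normally.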
