Hadoop is not showing my job in the job tracker even though it is running


Problem description


Problem: When I submit a job to my Hadoop 2.2.0 cluster it does not show up in the job tracker, but the job completes successfully. I can see the output, and the job runs correctly, printing output as it goes.

I have tried multiple options, but the job tracker never sees the job. If I run a streaming job using the 2.2.0 Hadoop distribution it shows up in the task tracker, but when I submit it via the hadoop-client API it does not appear in the job tracker. I am looking at the web UI on port 8088 to verify the job.

Environment: OS X Mavericks, Java 1.6, Hadoop 2.2.0 single-node cluster, Tomcat 7.0.47

Code

    try {
        configuration.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
        configuration.set("mapred.jobtracker.address", "localhost:9001");

        Job job = createJob(configuration);
        job.waitForCompletion(true);
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Unable to execute job", e);
    }

    return null;

etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
    </property> 
</configuration>

etc/hadoop/core-site.xml

<configuration>
     <property>
       <name>hadoop.tmp.dir</name>
       <value>/tmp/hadoop-${user.name}</value>
       <description>A base for other temporary directories.</description>
    </property>

    <property> 
      <name>fs.default.name</name> 
      <value>hdfs://localhost:9000</value> 
    </property>

</configuration>

Solution

The resolution to the issue was to configure the job with the extra configuration options for YARN. I had made the incorrect assumption that the Java hadoop-client API would pick up the configuration options from the configuration directory. I was able to diagnose the problem by turning on verbose logging via log4j.properties for my unit tests, which showed that the jobs were running locally and not being submitted to the YARN resource manager. With a little bit of trial and error I was able to configure the job so that it was submitted to the YARN resource manager.
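The exact log4j.properties used is not shown in the question, but a minimal sketch along these lines is enough to surface the local-vs-YARN decision in the client logs (the appender name and pattern here are illustrative choices, not taken from the original setup):

```
# Minimal log4j 1.x config: send everything at DEBUG to the console.
log4j.rootLogger=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n
```

With DEBUG enabled, a client that is not reaching YARN typically logs lines mentioning `LocalJobRunner`, which is the tell-tale sign that the job never left the local JVM.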

Code

    try {
        configuration.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
        configuration.set("mapreduce.jobtracker.address", "localhost:54311");
        configuration.set("mapreduce.framework.name", "yarn");
        configuration.set("yarn.resourcemanager.address", "localhost:8032");

        Job job = createJob(configuration);
        job.waitForCompletion(true);
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Unable to execute job", e);
    }
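If the directory containing mapred-site.xml and yarn-site.xml is on the client's classpath, the client can pick these values up from the files instead of from code; hard-coding them, as above, works around a classpath that lacks them. A sketch of the equivalent yarn-site.xml entry (8032 is YARN's default resource-manager port; adjust the host for a non-local cluster):

```
<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8032</value>
    </property>
</configuration>
```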
