Hadoop is not showing my job in the job tracker even though it is running




Problem: When I submit a job to my Hadoop 2.2.0 cluster it doesn't show up in the job tracker, but the job completes successfully. I can see the output, and the job runs correctly and prints output as it runs.

I have tried multiple options but the job tracker is not seeing the job. If I run a streaming job using the 2.2.0 hadoop it shows up in the task tracker, but when I submit it via the hadoop-client API it does not show up in the job tracker. I am looking at the web UI on port 8088 to verify the job.

Environment: OS X Mavericks, Java 1.6, Hadoop 2.2.0 single-node cluster, Tomcat 7.0.47

Code

    try {
        configuration.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
        configuration.set("mapred.jobtracker.address", "localhost:9001");

        Job job = createJob(configuration);
        job.waitForCompletion(true);
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Unable to execute job", e);
    }

    return null;

etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
    </property> 
</configuration>

etc/hadoop/core-site.xml

<configuration>
     <property>
       <name>hadoop.tmp.dir</name>
       <value>/tmp/hadoop-${user.name}</value>
       <description>A base for other temporary directories.</description>
    </property>

    <property> 
      <name>fs.default.name</name> 
      <value>hdfs://localhost:9000</value> 
    </property>

</configuration>

Solution

The resolution to the issue was to configure the job with the extra configuration options for YARN. I had made an incorrect assumption that the Java hadoop-client API would pick up the configuration options from the configuration directory. I was able to diagnose the problem by turning on verbose logging via log4j.properties for my unit tests. It showed that the jobs were running locally and not being submitted to the YARN resource manager. With a little bit of trial and error I was able to configure the job so that it was submitted to the YARN resource manager.
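
For reference, a minimal log4j.properties sketch for turning on that kind of verbose client-side logging is shown below. The appender layout and the DEBUG level on the Hadoop packages are illustrative choices, not taken from the original setup:

    # Root logger at INFO, everything to the console
    log4j.rootLogger=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d %-5p [%t] %c: %m%n

    # Verbose output from the Hadoop client classes, which reveals
    # whether the job is submitted locally or to the ResourceManager
    log4j.logger.org.apache.hadoop=DEBUG

With this in place, a locally-run job logs references to the LocalJobRunner rather than a ResourceManager address, which is the telltale sign described above.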

Code

    try {
        configuration.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
        configuration.set("mapreduce.jobtracker.address", "localhost:54311");
        configuration.set("mapreduce.framework.name", "yarn");
        configuration.set("yarn.resourcemanager.address", "localhost:8032");

        Job job = createJob(configuration);
        job.waitForCompletion(true);
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Unable to execute job", e);
    }
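
Equivalently, rather than hardcoding the addresses in code, these values can live in the cluster configuration files, provided those files are actually on the client's classpath. A sketch of the corresponding etc/hadoop/yarn-site.xml entry, assuming the default Hadoop 2.2 ResourceManager port of 8032:

    <configuration>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>localhost:8032</value>
        </property>
    </configuration>

The hardcoded approach above works regardless, but keeping the addresses in the config files avoids duplicating them in every client.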
