Hadoop/MR temporary directory


Problem Description

I've been struggling to get Hadoop and Map/Reduce to use a separate temporary directory instead of /tmp on my root partition.

I've added the following to my core-site.xml config file:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/data/tmp</value>
</property>

I've added the following to my mapreduce-site.xml config file:

<property>
    <name>mapreduce.cluster.local.dir</name>
    <value>${hadoop.tmp.dir}/mapred/local</value>
</property>
<property>
    <name>mapreduce.jobtracker.system.dir</name>
    <value>${hadoop.tmp.dir}/mapred/system</value>
</property>
<property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>${hadoop.tmp.dir}/mapred/staging</value>
</property>
<property>
    <name>mapreduce.cluster.temp.dir</name>
    <value>${hadoop.tmp.dir}/mapred/temp</value>
</property>

No matter what job I run, though, it still does all of its intermediate work in the /tmp directory. I've been watching it via df -h, and when I go in there I can see all of the temporary files it creates.

Am I missing something from the config?

This is on a 10-node Linux CentOS cluster running Hadoop/YARN MapReduce 2.1.0.2.0.6.0.

After some further research, the settings seem to be working on my management and namenode/secondary namenode boxes. It is only on the data nodes that this is not working, and it is only the MapReduce temporary output files that are still going to /tmp on my root drive instead of the data mount I set in the configuration files.

Recommended Answer

If you are running Hadoop 2.0, then the proper name of the config file you need to change is mapred-site.xml, not mapreduce-site.xml.

An example can be found on the Apache site: http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

It uses the mapreduce.cluster.local.dir property name, with a default value of ${hadoop.tmp.dir}/mapred/local.

Try renaming your mapreduce-site.xml file to mapred-site.xml in your /etc/hadoop/conf/ directories and see if that fixes it.

If you are using Ambari, you should be able to just use the "Add Property" button in the MapReduce2 / Custom mapred-site.xml section, enter 'mapreduce.cluster.local.dir' for the property name, and give a comma-separated list of the directories you want to use.
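
For reference, the resulting entry in mapred-site.xml would look something like the following; the two paths are illustrative (only /data/tmp appears in the original question):

<property>
    <name>mapreduce.cluster.local.dir</name>
    <!-- May be a comma-separated list of directories on different devices;
         Hadoop spreads intermediate files across them. Paths are examples. -->
    <value>/data/tmp/mapred/local,/data2/tmp/mapred/local</value>
</property>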
