Hadoop/MR temporary directory
Question
I've been struggling to get Hadoop and Map/Reduce to use a separate temporary directory instead of /tmp on my root partition.
I've added the following to my core-site.xml config file:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/tmp</value>
</property>
I've added the following to my mapreduce-site.xml config file:
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
</property>
<property>
  <name>mapreduce.jobtracker.system.dir</name>
  <value>${hadoop.tmp.dir}/mapred/system</value>
</property>
<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <value>${hadoop.tmp.dir}/mapred/staging</value>
</property>
<property>
  <name>mapreduce.cluster.temp.dir</name>
  <value>${hadoop.tmp.dir}/mapred/temp</value>
</property>
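The `${hadoop.tmp.dir}` references in the values above are expanded by Hadoop at load time, so every property listed here ultimately resolves against whatever `hadoop.tmp.dir` is set to. As a rough illustration of that substitution behavior (a simplified sketch, not Hadoop's actual `Configuration` implementation):

```python
import re

def resolve(props):
    """Expand ${name} references in property values, roughly mimicking
    how Hadoop's Configuration class substitutes variables (simplified
    sketch for illustration only)."""
    def expand(value, seen):
        def sub(match):
            key = match.group(1)
            if key in seen or key not in props:
                return match.group(0)  # leave unresolvable references as-is
            return expand(props[key], seen | {key})
        return re.sub(r"\$\{([^}]+)\}", sub, value)
    return {k: expand(v, {k}) for k, v in props.items()}

conf = {
    "hadoop.tmp.dir": "/data/tmp",
    "mapreduce.cluster.local.dir": "${hadoop.tmp.dir}/mapred/local",
}
print(resolve(conf)["mapreduce.cluster.local.dir"])  # /data/tmp/mapred/local
```

The practical point is that a daemon which never loads your `hadoop.tmp.dir` override will silently resolve these values against the default (`/tmp/hadoop-${user.name}`) instead.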
No matter what job I run, though, it still does all of its intermediate work in /tmp. I've been watching it via df -h, and when I look in that directory I can see all of the temporary files it creates.
Am I missing something in the config?
This is on a 10-node Linux CentOS cluster running Hadoop/YARN MapReduce 2.1.0.2.0.6.0.
After some further research, the settings appear to be working on my management box and on the namenode/secondary-namenode boxes. It is only on the data nodes that this is not working, and only the MapReduce temporary output files are still going to /tmp on my root drive rather than the data mount I specified in the configuration files.
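One way to narrow down a node-by-node discrepancy like this is to ask each host what configuration value it actually resolves, rather than what the file on disk says. A hedged sketch (assumes passwordless SSH to the data nodes and that the `hdfs` client is on the PATH there; `dn1`/`dn2` are placeholder hostnames):

```shell
# Compare the effective value of hadoop.tmp.dir across nodes.
# A node still reporting the default (/tmp/hadoop-<user>) is not
# reading the edited config file.
for host in dn1 dn2; do
  echo -n "$host: "
  ssh "$host" 'hdfs getconf -confKey hadoop.tmp.dir'
done
```

If a data node reports the default, the likely culprits are a stale or differently named config file in that node's conf directory, or daemons that were not restarted after the change.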
Answer
If you are running Hadoop 2.0, the proper name of the config file you need to change is mapred-site.xml, not mapreduce-site.xml.
An example can be found on the Apache site: http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml — it uses the mapreduce.cluster.local.dir property name, with a default value of ${hadoop.tmp.dir}/mapred/local.
Try renaming your mapreduce-site.xml file to mapred-site.xml in your /etc/hadoop/conf/ directories and see if that fixes it.
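Since the misnamed file has to be fixed on every data node, something along these lines could do the rename in one pass (a sketch only: assumes passwordless sudo over SSH and a hypothetical `datanode-hosts.txt` listing your data-node hostnames; adjust the conf path to your distribution):

```shell
# Rename the misnamed config file on each data node, then restart
# the NodeManager so the new file is picked up.
while read -r host; do
  ssh "$host" 'sudo mv /etc/hadoop/conf/mapreduce-site.xml /etc/hadoop/conf/mapred-site.xml'
done < datanode-hosts.txt
```

A restart of the YARN NodeManager (and any MapReduce daemons) on each node is needed afterward, since configs are only read at daemon startup.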
If you are using Ambari, you should be able to use the "Add Property" button in the MapReduce2 / Custom mapred-site.xml section, enter mapreduce.cluster.local.dir as the property name, and supply a comma-separated list of the directories you want to use.
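Whether set through Ambari or by hand, the resulting entry in mapred-site.xml would look something like this (the second path is an illustrative example of spreading local dirs across multiple mounts, not something required by the question):

```xml
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>/data/tmp/mapred/local,/data2/tmp/mapred/local</value>
</property>
```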