Hadoop removes MapReduce history when it is restarted


Problem description

I am carrying out several Hadoop tests using the TestDFSIO and TeraSort benchmark tools. I am testing with different numbers of datanodes in order to assess how linearly the processing capacity scales with the number of datanodes.

During this process I have had to restart the whole Hadoop environment several times. Every time I restart Hadoop, all MapReduce jobs are removed and the job counter starts again from "job_2013*_0001". For comparison purposes, it is very important for me to keep all the MapReduce jobs that I have previously launched. So, my questions are:

How can I prevent Hadoop from removing all MapReduce job history after it is restarted? Is there a property that controls job removal after the Hadoop environment restarts?

Thanks!

Solution

The MR job history logs are not deleted right away after you restart Hadoop. However, the job counter restarts from *_0001, and only jobs started after the restart are displayed on the ResourceManager web portal. In fact, there are two log-related settings in the YARN defaults (yarn-default.xml):

# this is where you can find the MR job history logs
yarn.nodemanager.log-dirs = ${yarn.log.dir}/userlogs 

# this is how long the history logs will be retained, in seconds (10800 s = 3 hours)
yarn.nodemanager.log.retain-seconds = 10800

The default ${yarn.log.dir} is defined in $HADOOP_HOME/etc/hadoop/yarn-env.sh:

YARN_LOG_DIR="$HADOOP_YARN_HOME/logs"
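To check what these settings resolve to on a given node, something like the following Python sketch may help. It is only an illustration under stated assumptions: the function names and the sample yarn-site.xml content are hypothetical, the defaults are the two values quoted above, and the `${yarn.log.dir}` substitution is a simplified stand-in for what Hadoop does internally (it only consults the `YARN_LOG_DIR` and `HADOOP_YARN_HOME` environment variables).

```python
import os
import posixpath
import xml.etree.ElementTree as ET

LOG_DIRS = "yarn.nodemanager.log-dirs"
RETAIN = "yarn.nodemanager.log.retain-seconds"

# Defaults as quoted in the answer above.
DEFAULTS = {
    LOG_DIRS: "${yarn.log.dir}/userlogs",
    RETAIN: "10800",
}

def read_site_overrides(xml_text):
    """Return {name: value} for each <property> in a Hadoop-style site file."""
    root = ET.fromstring(xml_text)
    overrides = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            overrides[name.strip()] = value.strip()
    return overrides

def resolve_yarn_log_dir(env):
    """Mimic yarn-env.sh: use YARN_LOG_DIR, else $HADOOP_YARN_HOME/logs."""
    log_dir = env.get("YARN_LOG_DIR")
    if log_dir:
        return log_dir
    yarn_home = env.get("HADOOP_YARN_HOME")
    return posixpath.join(yarn_home, "logs") if yarn_home else None

def effective_log_settings(xml_text, env):
    """Merge site overrides over the defaults and expand ${yarn.log.dir}.

    Returns (settings dict, retention in hours).
    """
    settings = dict(DEFAULTS)
    for key, value in read_site_overrides(xml_text).items():
        if key in DEFAULTS:
            settings[key] = value
    log_dir = resolve_yarn_log_dir(env)
    if log_dir is not None:
        settings[LOG_DIRS] = settings[LOG_DIRS].replace("${yarn.log.dir}", log_dir)
    return settings, int(settings[RETAIN]) / 3600

# Hypothetical yarn-site.xml that extends retention to 7 days (604800 s).
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    settings, hours = effective_log_settings(SAMPLE_YARN_SITE, dict(os.environ))
    for key, value in settings.items():
        print(key, "=", value)
    print("retention in hours:", hours)
```

Overriding `yarn.nodemanager.log.retain-seconds` in yarn-site.xml, as the sample does, is one way to keep the logs around longer than the three-hour default between benchmark runs.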

BTW, similar settings can also be found in mapred-env.sh if you are using Hadoop 1.X.
