What is the /tmp directory in Hadoop HDFS?


Question

I have a cluster of 4 datanodes, and the HDFS structure on each node is as below.

I am facing a disk space issue: as you can see, the /tmp folder in HDFS is taking up a lot of space (217 GB). I investigated the data under /tmp and found the following temp files. Each of these temp folders contains part files of 10 GB to 20 GB in size. I want to clear this /tmp directory. Can anyone tell me the consequences of deleting these temp folders or part files? Will it affect my cluster?

Solution

The HDFS /tmp directory is mainly used as temporary storage during MapReduce operations. MapReduce artifacts, intermediate data, and similar files are kept under this directory. These files are cleared out automatically when the MapReduce job completes normally. If you delete these temporary files, you can affect MapReduce jobs that are currently running.
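Because deleting files that a running job still needs can break that job, one cautious approach is to remove only entries that have not been modified for some time. The sketch below is an illustration, not part of the original answer: it parses the text printed by `hdfs dfs -ls /tmp` and returns the paths older than a cutoff. The function names, the 7-day threshold, and the sample listing are assumptions made for the example. Modification time is only a heuristic, so it is still worth confirming that no jobs are running before deleting anything.

```python
from datetime import datetime, timedelta


def parse_ls_line(line):
    """Parse one line of `hdfs dfs -ls` output.

    Returns (path, modified) or None for lines that are not entries,
    such as the "Found N items" header.
    """
    # Expected fields: permissions, replication, owner, group, size,
    # date, time, path. The path is kept whole even if it has spaces.
    parts = line.split(None, 7)
    if len(parts) != 8:
        return None
    date, time, path = parts[5], parts[6], parts[7]
    try:
        modified = datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M")
    except ValueError:
        return None
    return path, modified


def stale_paths(ls_output, now, max_age_days=7):
    """Return paths whose modification time is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for line in ls_output.splitlines():
        parsed = parse_ls_line(line)
        if parsed is not None and parsed[1] < cutoff:
            stale.append(parsed[0])
    return stale


# Hypothetical listing, made up for this example.
SAMPLE_LISTING = """Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2015-03-01 10:22 /tmp/temp-old
drwxr-xr-x   - hadoop supergroup          0 2015-03-20 09:00 /tmp/temp-new
"""

print(stale_paths(SAMPLE_LISTING, now=datetime(2015, 3, 21)))
# prints ['/tmp/temp-old']
```

The returned paths could then be reviewed and removed with `hdfs dfs -rm -r <path>`. If the HDFS trash feature is enabled, deleted data is moved to the user's trash directory and the space is not reclaimed until the trash is emptied, unless `-skipTrash` is passed.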

The temporary files are created by Pig. Pig deletes its temporary files at the end of a script run, but it does not delete them if the script fails or is killed. You have to handle that situation yourself, preferably by doing the temporary file cleanup in the script itself.

The following article gives a good explanation of how Pig handles its temp files:

http://www.lopakalogic.com/articles/hadoop-articles/pig-keeps-temp-files/
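To illustrate the advice about handling cleanup yourself when a Pig script fails or is killed, here is a minimal sketch of a wrapper that runs a script and always removes its temp directory afterwards. It is not taken from the answer or the linked article. It assumes that the `pig` and `hdfs` commands are on the PATH, that Pig honors the `pig.temp.dir` property for its temp location, and that each run gets its own dedicated directory; the function names and the example paths are hypothetical.

```python
import subprocess


def build_pig_command(script, temp_dir):
    """Build a pig command line that points Pig's temp files at temp_dir.

    Assumes the pig.temp.dir property is supported by the installed Pig.
    """
    return ["pig", "-Dpig.temp.dir=" + temp_dir, "-f", script]


def run_with_cleanup(script, temp_dir, runner=subprocess.call):
    """Run a Pig script, then remove temp_dir whether or not it succeeded.

    `runner` takes a command list and returns an exit code; it is a
    parameter so the wrapper can be exercised without a cluster.
    """
    # Guard against wiping the shared /tmp or the HDFS root by mistake.
    if temp_dir.rstrip("/") in ("", "/tmp"):
        raise ValueError("temp_dir must be a dedicated subdirectory")
    try:
        return runner(build_pig_command(script, temp_dir))
    finally:
        # -f suppresses the error if Pig already removed the directory.
        runner(["hdfs", "dfs", "-rm", "-r", "-f", "-skipTrash", temp_dir])
```

A call such as `run_with_cleanup("job.pig", "/tmp/pig-job-42")` would return Pig's exit code and issue the delete even when the script fails. A job killed from outside the wrapper's control (for example, by killing the wrapper process itself) would still leave files behind, so a periodic sweep of old entries remains useful.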
