How to delete files from HDFS?

Problem description

I just downloaded the Hortonworks sandbox VM, which ships with Hadoop 2.7.1. I added some files using the

hadoop fs -put /hw1/* /hw1

...command. After that, I deleted the added files with the

hadoop fs -rm /hw1/*

...command, and then emptied the trash with the

hadoop fs -expunge

...command. But the DFS Remaining space did not change after the trash was emptied, even though I can see that the data really was deleted from /hw1/ and from the trash. I have the fs.trash.interval parameter set to 1.

In fact, I can still find all my data, split into chunks, in the /hadoop/hdfs/data/current/BP-2048114545-10.0.2.15-1445949559569/current/finalized/subdir0/subdir2 folder. This really surprises me, because I expected the chunks to be deleted.

So my question is: how do I delete the data so that it is really deleted? After a few rounds of adding and deleting files, I ran out of free space.

Solution

Your problem comes down to the fundamentals of HDFS. In HDFS (and in many other file systems), physically deleting files is not the fastest operation. HDFS is a distributed file system that usually keeps 3 replicas of each file (the default replication factor) on different servers, so after your request to delete a file, every replica (which may consist of many blocks on different hard drives) must be deleted in the background.

The official Hadoop documentation says the following:

The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.
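
To watch that delay in practice, one option is to record the free space before the delete and poll until it grows. The sketch below is only an illustration: the helper names `dfs_remaining_bytes` and `wait_for_space`, the 10-second interval, and the 30-attempt limit are assumptions, not part of Hadoop. It relies on `hdfs dfsadmin -report` printing a cluster-wide `DFS Remaining: <bytes> (...)` line, and on `-skipTrash`, which makes `hadoop fs -rm` bypass the trash so that no separate `-expunge` is needed. Running `dfsadmin` may require HDFS superuser rights (for example, the `hdfs` user on the sandbox).

```shell
#!/bin/sh
# Read `hdfs dfsadmin -report` output on stdin and print the first
# (cluster-wide) "DFS Remaining" value in bytes.
dfs_remaining_bytes() {
  awk -F': ' '/^DFS Remaining:/ { split($2, parts, " "); print parts[1]; exit }'
}

# Poll until the reported free space exceeds the starting value.
# $1 = free bytes before the delete
# $2 = seconds between polls (assumed default: 10)
# $3 = maximum number of polls (assumed default: 30)
wait_for_space() {
  before=$1
  interval=${2:-10}
  tries=${3:-30}
  while [ "$tries" -gt 0 ]; do
    now=$(hdfs dfsadmin -report | dfs_remaining_bytes)
    if [ -n "$now" ] && [ "$now" -gt "$before" ]; then
      echo "DFS Remaining grew from $before to $now bytes"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  echo "DFS Remaining did not grow within the polling window" >&2
  return 1
}

# Example session (commented out because it needs a running cluster):
# before=$(hdfs dfsadmin -report | dfs_remaining_bytes)
# hadoop fs -rm -skipTrash '/hw1/*'   # quoted so HDFS, not the local shell, expands the glob
# wait_for_space "$before"
```

If the free space still has not grown after several minutes, the delay described in the documentation is probably not the whole explanation, and the NameNode and DataNode logs are the next place to look.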
