Delete files older than 10 days on HDFS


Problem description





Is there a way to delete files older than 10 days on HDFS?

In Linux I would use:

find /path/to/directory/ -type f -mtime +10 -name '*.txt' -execdir rm -- {} \;

Is there a way to do this on HDFS? (Deletion to be done based on file creation date)

Solution

Solution 1: Using multiple commands as answered by daemon12

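hdfs dfs -ls /file/Path | tr -s " " | cut -d' ' -f6-8 | grep "^[0-9]" | awk '
BEGIN { MIN = 14400; LAST = 60 * MIN; "date +%s" | getline NOW }   # LAST is 10 days in seconds
{
    # $1 and $2 are the modification date and time, $3 is the path
    cmd = "date -d'\''" $1 " " $2 "'\'' +%s"
    cmd | getline WHEN; close(cmd)   # close the pipe so long listings do not exhaust file handles
    DIFF = NOW - WHEN
    if (DIFF > LAST) { print "Deleting: " $3; system("hdfs dfs -rm -r " $3) }
}'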

Solution 2: Using Shell script

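today=$(date +%s)
# grep "^d" keeps directory entries only; use grep "^-" to match regular files instead
hdfs dfs -ls /file/Path/ | grep "^d" | while read -r line; do
    # Column 6 of the listing is the modification date, column 8 is the path
    dir_date=$(echo "${line}" | awk '{print $6}')
    difference=$(( (today - $(date -d "${dir_date}" +%s)) / (24*60*60) ))
    filePath=$(echo "${line}" | awk '{print $8}')

    if [ "${difference}" -gt 10 ]; then
        hdfs dfs -rm -r "$filePath"
    fi
done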
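
Both solutions parse the listing and delete in a single pass, which makes them hard to try out safely. A minimal sketch of a variation is shown here, assuming GNU `date` (for `-d`) and the usual eight-column `hdfs dfs -ls` output: it only prints the paths that qualify, so the list can be reviewed before anything is removed. The function name `list_older_than` and its optional second argument (a reference time in epoch seconds) are hypothetical and not part of the original answers.

```shell
# list_older_than DAYS [NOW_EPOCH]   (hypothetical helper, not from the answers)
# Reads `hdfs dfs -ls`-style lines on stdin and prints the path of every
# regular file modified more than DAYS whole days before NOW_EPOCH.
# Assumes GNU date and the eight-column listing format.
# The body runs in a subshell, so its variables do not leak to the caller.
list_older_than() (
    days=$1
    now=${2:-$(date +%s)}
    # The last variable of `read` takes the rest of the line,
    # so paths containing spaces stay intact.
    while read -r perms repl owner group size mdate mtime path; do
        case $perms in
            -*) ;;            # regular file: keep going
            *)  continue ;;   # "Found N items" header, directories, anything else
        esac
        when=$(date -d "$mdate $mtime" +%s) || continue
        if [ $(( (now - when) / 86400 )) -gt "$days" ]; then
            printf '%s\n' "$path"
        fi
    done
)

# Possible usage: review the list first, then delete.
#   hdfs dfs -ls /file/Path | list_older_than 10
#   hdfs dfs -ls /file/Path | list_older_than 10 |
#       while IFS= read -r p; do hdfs dfs -rm "$p"; done
```

The comparison uses whole days and `-gt`, like Solution 2, so a file must be at least 11 days old to be listed, which matches `find -mtime +10`. The timestamp in the listing is the modification time.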
