Get files created in the last 5 minutes in Hadoop using a shell script

Problem description

I have files in HDFS as:

drwxrwx---   - root supergroup          0 2016-08-19 06:21 /tmp/logs/root/logs/application_1464962104018_1639064
drwxrwx---   - root supergroup          0 2016-08-19 06:21 /tmp/logs/root/logs/application_1464962104018_1639065

The /tmp/logs/root/logs/ directory continuously receives new files. I want to get the files created in the last five minutes, relative to the current time, and then copy them to my local machine.

Solution

How about this:

hdfs dfs -ls /tmp | tr -s " " | cut -d' ' -f6-8 | grep "^[0-9]" | awk 'BEGIN{ MIN=5; LAST=60*MIN; "date +%s" | getline NOW } { cmd="date -d'\''"$1" "$2"'\'' +%s"; cmd | getline WHEN; DIFF=NOW-WHEN; if(DIFF < LAST){ print $3 }}'

Explanation:

List all the files:

hdfs dfs -ls /tmp

Squeeze runs of spaces into a single space:

tr -s " "

Get the required columns:

cut -d' ' -f6-8

Remove lines that do not start with a date (such as the "Found N items" header):

grep "^[0-9]"
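To see what these first three stages produce, here is a small sketch that wraps them in a function and feeds it one listing line from the question. The function name extract_listing_fields is made up for illustration, and the sketch assumes paths contain no spaces, since cut splits on single spaces.

```shell
# Reduce `hdfs dfs -ls` style output on stdin to "DATE TIME PATH" lines.
extract_listing_fields() {
    tr -s " " |            # squeeze runs of spaces so fields are single-space separated
    cut -d' ' -f6-8 |      # keep date, time and path
    grep "^[0-9]"          # drop lines that do not start with a date
}

# Sample input: the header line plus one entry copied from the question.
printf '%s\n' \
    'Found 1 items' \
    'drwxrwx---   - root supergroup          0 2016-08-19 06:21 /tmp/logs/root/logs/application_1464962104018_1639064' |
    extract_listing_fields
# prints: 2016-08-19 06:21 /tmp/logs/root/logs/application_1464962104018_1639064
```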

Processing using awk:

awk

Initialize the time window (LAST, in seconds) and the current time (NOW):

MIN=5; LAST=60*MIN; "date +%s" | getline NOW

Build a command that converts the file's timestamp on HDFS to an epoch value:

cmd="date -d'\''"$1" "$2"'\'' +%s";

Run the command and read the epoch value for the HDFS file into WHEN:

cmd | getline WHEN;
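The command-building and getline steps can be tried in isolation, without a Hadoop cluster. The sketch below is only an illustration: add_epoch is a hypothetical helper name, it assumes a date that accepts -d with a "YYYY-MM-DD HH:MM" string (as GNU date does), and the close(cmd) call is an addition that is not in the one-liner above. It releases each pipe after use, which may matter on long listings.

```shell
# Read "DATE TIME PATH" lines on stdin and print "EPOCH PATH" for each.
add_epoch() {
    awk '{
        cmd = "date -d \"" $1 " " $2 "\" +%s"   # shell command that prints the epoch value
        cmd | getline when                      # run it and read its single output line
        close(cmd)                              # assumption: added to release the pipe
        print when, $3
    }'
}

# Two sample timestamps one minute apart; the printed epochs differ by 60.
printf '%s\n' '2016-08-19 06:21 /tmp/a' '2016-08-19 06:22 /tmp/b' | add_epoch
```

The absolute epoch values depend on the local time zone, because date interprets the timestamp as local time.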

Get the time difference:

DIFF=NOW-WHEN;

Print the path if the difference is within the window:

if(DIFF < LAST){ print $3 }

You only need to change the value of MIN to suit your requirement (here it is 5 minutes). HTH
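The one-liner stops at printing the paths, while the question also asks to copy them locally. A possible way to put the pieces together is sketched below, under several assumptions: recent_paths is a hypothetical function name, /local/dir is a placeholder destination, GNU date is available, and paths contain no spaces. Note also that hdfs dfs -ls reports modification time, so "created" here really means "last modified".

```shell
# recent_paths MINUTES [NOW_EPOCH]
# Reads `hdfs dfs -ls` output on stdin and prints the paths whose timestamp
# is less than MINUTES old. NOW_EPOCH defaults to the current time and is
# only a parameter so the function can be tried with fixed sample data.
recent_paths() {
    tr -s " " | cut -d' ' -f6-8 | grep "^[0-9]" |
    awk -v last="$(( $1 * 60 ))" -v now="${2:-$(date +%s)}" '{
        cmd = "date -d \"" $1 " " $2 "\" +%s"
        if ((cmd | getline when) > 0 && now - when < last) print $3
        close(cmd)   # release the pipe before handling the next line
    }'
}

# Usage sketch (needs a Hadoop client; /local/dir is a placeholder):
#   hdfs dfs -ls /tmp/logs/root/logs | recent_paths 5 |
#       while IFS= read -r path; do hdfs dfs -get "$path" /local/dir/; done
```

Listing /tmp/logs/root/logs rather than /tmp matches the directory named in the question; adjust the path and the window as needed.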
