Parsing lines from a log file with a date-time later than a given time
Problem description
I have log files several hundred MB in size, containing lines like this, with the date-time information at the beginning:
[Tue Oct 4 11:55:19 2016] [hphp] [25376:7f5d57bff700:279809:000001] []
Fatal error: syntax error, unexpected T_ENCAPSED_AND_WHITESPACE, expecting ')' in /var/cake_1.2.0.6311-beta/app/webroot/openx/www/delivery/postGetAd.php(12479)(62110d90541a84df30dd077ee953e47c) : eval()'d code on line 1
I have a plugin (nagios check_logwarn) that prints only the lines containing certain error strings. The following is the command to run it:
/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161003.log "^.*Fatal error*"
I want to filter further based on the date-time, i.e., keep only the lines that come after, say, 11:55:10.
I am not sure whether to use a regex for this. Here is what I have so far:
/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161003.log "^.*Fatal error*" | grep "15:19:1*"
But this only matches logs whose time falls in minute 19 of hour 15 (15:19).
Update
I am now able to compare the time part of the date-time.
/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161004.log "^.*Fatal error*" | awk '$4 > "14:22:11"'
How do I compare the day part?
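To illustrate why comparing $4 alone is not enough: with awk's default whitespace splitting, $4 is the HH:MM:SS part, which compares correctly as a string because it is fixed-width and zero-padded, but the day is never looked at. A minimal sketch with two made-up lines in the layout shown above:

```shell
# Two made-up lines: an earlier day with a later time, and a later day with an earlier time.
sample='[Mon Oct 3 15:00:00 2016] [hphp] earlier day, later time
[Tue Oct 4 11:55:19 2016] [hphp] later day, earlier time'

# Default field splitting gives $1="[Mon", $2="Oct", $3="3", $4="15:00:00".
# The right-hand side is a string constant, so awk compares character by character.
printf '%s\n' "$sample" | awk '$4 > "14:22:11"'
# prints only the Oct 3 line, although the Oct 4 line is later
```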
Update 2 - Bounty opened
I have had to open a bounty because I do not have much expertise with shell and I need a solution soon.
I am stuck at the part of comparing the dates. With the solution https://stackoverflow.com/a/39856560/351903, I am facing this problem. If that is fixed, I would be happy.
I am also open to an enhancement of this (I don't mind if the output has the logs in a jumbled order):
/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161004.log "^.*Fatal error*" | awk '$4 > "14:22:11"'
I looked for some date-time to timestamp comparison, but couldn't find anything that works.
I am not able to proceed from what is given in this question. I cannot see the timestamp value using this:
echo date -d '06/12/2012 07:21:22' +"%s"
Not sure what I am missing.
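Presumably the leading echo is what hides the value: echo prints its arguments as text, so date never runs. A minimal sketch, assuming GNU date (whose -d option reads 06/12/2012 as month/day/year); TZ=UTC is added here only to make the printed number reproducible:

```shell
# With echo in front, the shell just prints the words; date is never executed.
echo date -d '06/12/2012 07:21:22' +"%s"
# prints: date -d 06/12/2012 07:21:22 +%s

# Without echo, date runs and prints seconds since the epoch.
TZ=UTC date -d '06/12/2012 07:21:22' +"%s"
# prints: 1339485682

# To echo or store the value, use command substitution.
ts=$(TZ=UTC date -d '06/12/2012 07:21:22' +"%s")
echo "$ts"
```

Without TZ=UTC the number is computed in the local time zone, which is what the answer below relies on as well.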
Recommended answer
This uses a reference timestamp and compares the timestamp from the log file to it; if the log file's timestamp is more recent, the line gets printed:
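awk -v refdate="$(date +'%s' -d 'Mon Oct 3 10:00:00 2016')" -F "[][]" '
{
    # $2 is the bracketed timestamp, e.g. "Tue Oct 4 11:55:19 2016";
    # \047 is an octal escape for a single quote inside this single-quoted program
    cmd = "date +\047%s\047 -d \"" $2 "\""
    # Run date and read its output (epoch seconds) into val
    if ((cmd | getline val) > 0) {
        if (val > refdate)
            print
    }
    # Close the pipe: avoids exhausting file descriptors and lets an identical timestamp be converted again
    close(cmd)
}
' infile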
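Here is how it works:
-v refdate="$(date +'%s' -d 'Mon Oct 3 10:00:00 2016')"
converts the given date (our reference date) to seconds since the epoch.
-F "[][]"
sets the field separator to square brackets, so the timestamp we want is simply $2.
"date +\047%s\047 -d \"" $2 "\""
is the shell command we want to execute; it becomes date +'%s' -d "$2" (with $2 replaced by the actual timestamp), i.e., it converts the log file's timestamp to seconds since the epoch.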
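Starting one date process per line can be slow on logs of several hundred MB. A possible alternative, not part of the answer above, is to skip date entirely and compare a sortable YYYYMMDDHHMMSS key built inside awk. This is a sketch under stated assumptions: lines start with the "[Day Mon D HH:MM:SS YYYY]" prefix shown in the question, month names are English abbreviations, and filter_after and the sample lines are hypothetical names made up for illustration.

```shell
# Hypothetical sample lines in the layout shown in the question.
sample='[Mon Oct 3 09:59:59 2016] [hphp] before the reference time
[Mon Oct 3 10:00:01 2016] [hphp] one second after the reference time
[Tue Oct 4 11:55:19 2016] [hphp] the next day
Fatal error: a line without a leading timestamp'

# filter_after REF (hypothetical helper): print lines whose leading timestamp is
# later than REF, where REF is a 14-digit YYYYMMDDHHMMSS string.
filter_after() {
  awk -v ref="$1" '
    BEGIN {
      # Month name -> month number
      n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= n; i++) mon[names[i]] = i
    }
    # Only lines that start with "[" can carry the timestamp; others are dropped.
    substr($0, 1, 1) == "[" {
      ts = substr($0, 2, index($0, "]") - 2)   # e.g. "Tue Oct 4 11:55:19 2016"
      if (split(ts, f, " ") != 5 || !(f[2] in mon)) next
      if (split(f[4], hms, ":") != 3) next
      # Fixed-width, zero-padded key, so string order equals time order.
      key = sprintf("%04d%02d%02d%02d%02d%02d", f[5], mon[f[2]], f[3], hms[1], hms[2], hms[3])
      if (key > ref) print
    }'
}

# Usage: keep lines after Mon Oct 3 10:00:00 2016.
printf '%s\n' "$sample" | filter_after 20161003100000
```

With GNU date, the reference key could be produced with date -d 'Mon Oct 3 10:00:00 2016' +%Y%m%d%H%M%S and the check_logwarn output piped into filter_after. The trade-off: this compares wall-clock timestamps as written in the log and ignores time zones and DST shifts, whereas the epoch-based answer goes through date.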