How to filter logs easily with awk?

Problem description

Suppose I have a log file mylog like this:

[01/Oct/2015:16:12:56 +0200] error number 1
[01/Oct/2015:17:12:56 +0200] error number 2
[01/Oct/2015:18:07:56 +0200] error number 3
[01/Oct/2015:18:12:56 +0200] error number 4
[02/Oct/2015:16:12:56 +0200] error number 5
[10/Oct/2015:16:12:58 +0200] error number 6
[10/Oct/2015:16:13:00 +0200] error number 7
[01/Nov/2015:00:10:00 +0200] error number 8
[01/Nov/2015:01:02:00 +0200] error number 9
[01/Jan/2016:01:02:00 +0200] error number 10

And I want to find those lines that occur between 1 Oct at 18.00 and 1 Nov at 1.00. That is, the expected output would be:

[01/Oct/2015:18:07:56 +0200] error number 3
[01/Oct/2015:18:12:56 +0200] error number 4
[02/Oct/2015:16:12:56 +0200] error number 5
[10/Oct/2015:16:12:58 +0200] error number 6
[10/Oct/2015:16:13:00 +0200] error number 7
[01/Nov/2015:00:10:00 +0200] error number 8

I have managed to convert the times to timestamps by using match() and then mktime(). The first one finds the specified pattern and stores the captured groups in the array a[], so they can be accessed. The three-argument form of match() is a GNU awk extension; glenn jackman's answer to "access captured group from line pattern" is a good example of it. Since mktime() requires the format YYYY MM DD HH MM SS[ DST], I also have to convert the month from the form Xxx into a number. For that I use an answer by Ed Morton to "convert month from Aaa to xx": awk '{printf "%02d ",(match("JanFebMarAprMayJunJulAugSepOctNovDec",$0)+2)/3}'.
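
The month trick works because each three-letter abbreviation starts at position 1, 4, 7, ..., 34 of the lookup string, so (position + 2) / 3 gives 1 to 12. Below is a small sketch wrapping it in a shell function. The name month_num is only illustrative. It uses index() instead of match(), because index() does a plain substring search instead of treating the month name as a regex. It assumes the name has already been checked to be one capital letter plus two lowercase letters, as the question's regex does. An unknown name yields position 0, which prints as 00.

```shell
#!/bin/sh
# month_num: hypothetical helper mapping Jan..Dec to 01..12.
# index() returns 1, 4, 7, ..., 34 for the twelve abbreviations, so
# (pos + 2) / 3 is the month number; a name not found gives pos 0 -> 00.
month_num() {
    awk -v m="$1" 'BEGIN {
        pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", m)
        printf "%02d\n", (pos + 2) / 3
    }'
}

month_num Jan   # 01
month_num Oct   # 10
month_num Dec   # 12
```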

Putting it all together, I get the timestamp in the variable mytimestamp:

awk 'match($0, /([0-9]+)\/([A-Z][a-z]{2})\/([0-9]{4}):([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}) ([+-][0-9]{4})/, a) {
        day=a[1]; month=a[2]; year=a[3];
        hour=a[4]; min=a[5]; sec=a[6]; utc=a[7];
        # "Oct" -> "10": each abbreviation starts at position 3*n-2 of the lookup string
        month=sprintf("%02d",(match("JanFebMarAprMayJunJulAugSepOctNovDec",month)+2)/3);
        mydate=sprintf("%s %s %s %s %s %s %s", year,month,day,hour,min,sec,utc);
        mytimestamp=mktime(mydate)
        print mytimestamp
    }' mylog

Returns:

1443708776
1443712376
1443715676

etc.

So now I am ready to compare against the given dates. Since handling such a format in awk takes a lot of work, I prefer to compute the bounds in the shell and pass them in. I use date -d"my date" +"%s" to print the timestamp:

start="$(date -d"1 Oct 2015 18:00 +0200" +"%s")"
end="$(date -d"1 Nov 2015 01:00 +0200" +"%s")"
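
Here is a quick check of what those two date calls produce, and of how to hand the values to awk. The -v option assigns a variable before the program starts, so the value is visible everywhere, including in BEGIN. This assumes GNU date, for -d.

```shell
#!/bin/sh
# GNU date assumed. The explicit +0200 makes the result independent
# of the machine's time zone.
start="$(date -d "1 Oct 2015 18:00 +0200" +%s)"
end="$(date -d "1 Nov 2015 01:00 +0200" +%s)"

# -v assigns before the program runs, so BEGIN already sees the values.
awk -v start="$start" -v end="$end" 'BEGIN { print start, end }'
# prints: 1443715200 1446332400
```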

All together, this works:

awk -v start="$(date -d"1 Oct 2015 18:00 +0200" +"%s")" -v end="$(date -d"1 Nov 2015 01:00 +0200" +"%s")" '
    match($0, /([0-9]+)\/([A-Z][a-z]{2})\/([0-9]{4}):([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}) ([+-][0-9]{4})/, a) {
        day=a[1]; month=a[2]; year=a[3]; hour=a[4]; min=a[5]; sec=a[6]; utc=a[7];
        month=sprintf("%02d",(match("JanFebMarAprMayJunJulAugSepOctNovDec",month)+2)/3);
        mydate=sprintf("%s %s %s %s %s %s %s", year,month,day,hour,min,sec,utc);
        mytimestamp=mktime(mydate);
        if (start<=mytimestamp && mytimestamp<=end) print
    }' mylog
[01/Oct/2015:18:07:56 +0200] error number 3
[01/Oct/2015:18:12:56 +0200] error number 4
[02/Oct/2015:16:12:56 +0200] error number 5
[10/Oct/2015:16:12:58 +0200] error number 6
[10/Oct/2015:16:13:00 +0200] error number 7
[01/Nov/2015:00:10:00 +0200] error number 8

However, this seems to be quite a bit of work for something that should be more straightforward. Yet the introduction to the "Time functions" section of man gawk says:

Since one of the primary uses of AWK programs is processing log files that contain time stamp information, gawk provides the following functions for obtaining time stamps and formatting them.

So I wonder: is there a better way to do this? For example, what if the format were something like dd Mmm YYYY HH:MM:ss instead of dd/Mmm/YYYY:HH:MM:ss? Couldn't the match pattern be provided externally, instead of having to change it every time this happens? Do I really have to use match() and then process its output to feed mktime()? Doesn't gawk provide a simpler way to do this?

Solution

Use ISO 8601 time format!

"However, this seems to be quite a bit of work for something that should be more straightforward."

Yes, this should be straightforward, and the reason it is not is that the logs do not use ISO 8601. Application logs should display times in ISO format and UTC; any other setting should be considered broken and be fixed.

Your request should be split into two parts. The first normalises the logs by converting the dates to ISO format, and the second performs the search:

awk '
match($0, /([0-9]+)\/([A-Z][a-z]{2})\/([0-9]{4}):([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}) ([+-][0-9]{4})/, a) {
  day=a[1]
  month=a[2]
  year=a[3]
  hour=a[4]
  min=a[5]
  sec=a[6]
  utc=a[7]
  month=sprintf("%02d", (match("JanFebMarAprMayJunJulAugSepOctNovDec",month)+2)/3)
  # %02d zero-pads each field (%2d would pad with spaces)
  myisodate=sprintf("%04d-%02d-%02dT%02d:%02d:%02d%s", year,month,day,hour,min,sec,utc)
  # replace the leading "[dd/Mmm/YYYY:HH:MM:SS +hhmm]" with the ISO date
  sub(/^\[[^]]*\]/, myisodate)
  print
}' mylog

The nice thing about ISO 8601 dates, besides their being a standard, is that chronological order coincides with lexicographic order. This holds provided all timestamps share the same UTC offset, which is another reason to log in UTC. Plain string comparison is therefore enough to extract the dates you are interested in. For instance, to find what happened between 1 Oct 2015 18:00 +0200 and 1 Nov 2015 01:00 +0200, pipe the output of the previous, standardising filter into:

awk '$1 >= "2015-10-01T18:00:00+0200" && $1 <= "2015-11-01T01:00:00+0200"'
