How to select date range in awk


Problem description

We are building a utility that connects to different servers over SSH, collects all the error logs, and sends them to the relevant teams. The utility will cat each log file and filter it with awk, e.g.:

cat /app1/apache/tomcat7/logs/catalina.out | awk '$0>=from&&$0<=to' from="2019-02-01 12:00" to="2019-11-19 04:50"

We save the date of the last load in the database and use it as the from date on the next run.

The awk date range shown above seems to work only with the yyyy-mm-dd HH:MM date format. Our log files use different date formats, e.g.:

EEE MMM dd yy HH:mm
EEE MMM dd HH:mm
yyyy-MM-dd hh:mm
dd MMM yyyy HH:mm:ss



Question

How can we write an awk date filter that works with any date format used in the log files?

We cannot use perl or python on the servers. The requirement is to use only cat, awk, and grep.

Sample input:

Sat Nov 02 13:07:48.005 2019 NA for id 536870914 in form Request
Tue Nov 05 13:07:48.009 2019 NA for id 536870914 in form Request
Sun Nov 10 16:29:22.122 2019 ERROR (1587): Unknown field ;  at position 177 (category)
Mon Nov 11 16:29:22.125 2019 ERROR (1587): Unknown field ;  at position 174 (category)
Tue Nov 12 07:59:48.751 2019 ERROR (1587): Unknown field ;  at position 177 (category)
Thu Nov 14 10:07:41.792 2019 ERROR (1587): Unknown field ;  at position 177 (category)
Sun Nov 17 08:45:22.210 2019 ERROR (1587): Unknown field ;  at position 174 (category)

Command and filter:

cat error.log |awk '$0>=from&&$0<=to' from="Nov 16 10:58" to="Nov 19 04:50"

Expected output:

Sun Nov 17 08:45:22.210 2019 ERROR (1587): Unknown field ;  at position 174 (category)


Recommended answer

The short answer is that awk has no notion of what a date is. Awk knows numbers and strings and can only compare those. So to select by date and time, you must ensure that the date format you compare is sortable, and there are many formats out there:

| type       | example                   | sortable |
|------------+---------------------------+----------|
| ISO-8601   | 2019-11-19T10:05:15       | string   |
| RFC-2822   | Tue, 19 Nov 2019 10:05:15 | not      |
| RFC-3339   | 2019-11-19 10:05:15       | string   |
| Unix epoch | 1574157915                | numeric  |
| AM/PM      | 2019-11-19 10:05:15 am    | not      |
| MM/DD/YYYY | 11/19/2019 10:05:15       | not      |
| DD/MM/YYYY | 19/11/2019 10:05:15       | not      |
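
To make the "sortable" column concrete, here is a minimal sketch of a filter that relies on nothing but string comparison. The function name in_range and the sample lines are made up for illustration, and the sketch assumes every record starts with a zero-padded "YYYY-MM-DD HH:MM" timestamp. Because the fields run from most to least significant, string order and chronological order coincide; the same comparison applied to MM/DD/YYYY would sort by month before year.

```shell
# in_range FROM TO: print the stdin lines whose leading 16 characters
# (assumed to be "YYYY-MM-DD HH:MM") fall within [FROM, TO].
in_range() {
    awk -v from="$1" -v to="$2" '
        { ts = substr($0, 1, 16) }   # leading timestamp of the record
        ts >= from && ts <= to       # plain string comparison, default action prints
    '
}

# Hypothetical sample records: only the two middle lines are printed.
printf '%s\n' \
    '2019-01-31 23:59 too early' \
    '2019-02-01 12:00 first match' \
    '2019-11-19 04:50 last match' \
    '2019-11-19 04:51 too late' |
    in_range '2019-02-01 12:00' '2019-11-19 04:50'
```

Comparing only the timestamp prefix, rather than all of $0, matters at the upper bound: a full line that begins with the "to" value is longer than it and therefore compares as greater.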

So you have to convert the non-sortable formats into a sortable one, mainly through string manipulation. A template awk program that does this looks as follows:

# function to convert a string into a sortable format
function convert_date(str) {
    return sortable_date
}
# function to extract the date from the record
function extract_date(str) {
    return extracted_date
}
# convert the range
(FNR==1) { t1 = convert_date(begin); t2 = convert_date(end) }
# extract the date from the record
{ date_string = extract_date($0) }
# convert the date of the record
{ t = convert_date(date_string) }
# make the selection
(t1 <= t && t < t2) { print }

Most of the time, this program can be reduced considerably. If the above is stored in extract_date_range.awk, you could run it as:

$ awk -f extract_date_range.awk begin="date-in-known-format" end="date-in-known-format" logfile
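
As an illustration of such a reduction, the sketch below fills in the template for the sample input from the question, whose records start with "EEE MMM dd HH:mm:ss.SSS yyyy". It is not part of the original answer: the wrapper name filter_log, the variable names from and to, and the choice to pass the range as "YYYY-MM-DD HH:MM" are all assumptions made for this example.

```shell
# filter_log FROM TO FILE: FROM and TO are given as "YYYY-MM-DD HH:MM"
# (an assumption of this sketch); FILE holds records that start with
# "EEE MMM dd HH:mm:ss.SSS yyyy".
filter_log() {
    awk -v from="$1" -v to="$2" '
        # "YYYY-MM-DD HH:MM" -> "YYYYMMDDhhmm00"
        function convert_range(str) {
            gsub(/[^0-9]/, "", str)
            return str "00"
        }
        # Jan -> 01, Feb -> 02, ... Dec -> 12
        function get_month(str) {
            return sprintf("%02d", (index("JanFebMarAprMayJunJulAugSepOctNovDec", str) + 2) / 3)
        }
        BEGIN { t1 = convert_range(from); t2 = convert_range(to) }
        # consider only records whose 4th and 5th fields look like time and year
        $4 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ && $5 ~ /^[0-9][0-9][0-9][0-9]$/ {
            hms = substr($4, 1, 8)          # drop the milliseconds
            gsub(/:/, "", hms)
            t = $5 get_month($2) sprintf("%02d", $3) hms
            if (t1 <= t && t < t2) print    # half-open range, as in the template
        }
    ' "$3"
}
```

Called as filter_log '2019-11-16 10:58' '2019-11-19 04:50' error.log on the sample input, this should print only the record dated Sun Nov 17. Both keys are 14-digit strings, so the string comparison agrees with chronological order.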

Note: the above assumes single-line log entries. With a minor adaptation, you can process multi-line log entries.
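
One possible shape of that adaptation, sketched here under the assumption that each entry begins with a "YYYY-MM-DD HH:MM" timestamp and that continuation lines (a stack trace, for instance) do not: remember the decision made for the most recent timestamped line and apply it to the lines that follow. The function name filter_multiline is hypothetical.

```shell
# filter_multiline FROM TO: reads a log on stdin; FROM and TO are
# "YYYY-MM-DD HH:MM" strings (an assumption of this sketch).
filter_multiline() {
    awk -v from="$1" -v to="$2" '
        # a line that starts with a timestamp opens a new entry
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]/ {
            ts = substr($0, 1, 16)
            keep = (from <= ts && ts < to)
        }
        # the entry line and its continuation lines share the same decision
        keep
    '
}
```

Lines that appear before the first timestamp are dropped, because keep is still unset at that point.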

The original problem presented the following formats:

EEE MMM dd yy HH:mm         # not sortable
EEE MMM dd HH:mm            # not sortable
yyyy-MM-dd hh:mm            # sortable
dd MMM yyyy HH:mm:ss        # not sortable

Of these, all but the second format can be easily converted to a sortable format. The second format lacks the year, so we would have to infer it through an elaborate check that makes use of the day of the week. This is extremely difficult and never 100% bulletproof.
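
If the year is known from elsewhere (for example, because a log file only ever covers one year), a much simpler workaround is to supply it from outside. The sketch below does exactly that and deliberately skips the day-of-week check described above; the function name convert_no_year and the idea of passing the year as an argument are assumptions of this example, not part of the original answer.

```shell
# convert_no_year YEAR: reads "EEE MMM dd HH:mm" lines on stdin and prints
# a sortable YYYYMMDDhhmmss key for each, using the caller-supplied YEAR.
convert_no_year() {
    awk -v year="$1" '
        # Jan -> 01, Feb -> 02, ... Dec -> 12
        function get_month(str) {
            return sprintf("%02d", (index("JanFebMarAprMayJunJulAugSepOctNovDec", str) + 2) / 3)
        }
        {
            hm = $4
            gsub(/:/, "", hm)               # "08:45" -> "0845"
            key = year get_month($2) sprintf("%02d", $3) hm "00"
            print key
        }
    '
}

echo 'Sun Nov 17 08:45' | convert_no_year 2019   # prints 20191117084500
```

This breaks down as soon as a file spans a year boundary, which is the case the day-of-week check is meant to cover.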

Excluding the second format, we can write the following functions:

BEGIN {
    # [A-Za-z] is used instead of [a-Z], which is not a valid range in many locales
    datefmt1="^[A-Za-z][A-Za-z][A-Za-z] [A-Za-z][A-Za-z][A-Za-z] [0-9][0-9] [0-9][0-9] [0-9][0-9]:[0-9][0-9]"
    datefmt3="^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]"
    datefmt4="^[0-9][0-9] [A-Za-z][A-Za-z][A-Za-z] [0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]"
}
# convert the range
(FNR==1) { t1 = convert_date(begin); t2 = convert_date(end) }
# extract the date from the record
{ date_string = extract_date($0) }
# skip if date string is empty
(date_string == "") { next }
# convert the date of the record
{ t = convert_date(date_string) }
# make the selection
(t1 <= t && t < t2) { print }

# function to extract the date from the record
# note: match() takes the string first and the regex second
function extract_date(str,    date_string) {
    date_string=""
    if (match(str,datefmt1)) { date_string=substr(str,RSTART,RLENGTH) }
    else if (match(str,datefmt3)) { date_string=substr(str,RSTART,RLENGTH) }
    else if (match(str,datefmt4)) { date_string=substr(str,RSTART,RLENGTH) }
    return date_string
}
# function to convert a string into a sortable format
# converts it in the format YYYYMMDDhhmmss
function convert_date(str,    a,YYYY,MM,DD,T,sortable_date) {
    sortable_date=""
    if (match(str,datefmt1)) {
        split(str,a,"[ ]")
        # two-digit year: 70-99 -> 19yy, 00-69 -> 20yy
        YYYY=(a[4] < 70 ? "20" : "19") a[4]
        MM=get_month(a[2]); DD=a[3]
        # strip the colon and append the missing seconds
        T=a[5]; gsub(/[^0-9]/,"",T); T=T "00"
        sortable_date = YYYY MM DD T
    }
    else if (match(str,datefmt3)) {
        # append the missing seconds, then strip everything but digits
        sortable_date = str "00"
        gsub(/[^0-9]/,"",sortable_date)
    }
    else if (match(str,datefmt4)) {
        split(str,a,"[ ]")
        YYYY=a[3]
        MM=get_month(a[2]); DD=a[1]
        # this format already carries seconds, so nothing is appended
        T=a[4]; gsub(/[^0-9]/,"",T)
        sortable_date = YYYY MM DD T
    }
    return sortable_date
}
# function to convert Jan->01, Feb->02, Mar->03 ... Dec->12
function get_month(str) {
   return sprintf("%02d",(match("JanFebMarAprMayJunJulAugSepOctNovDec",str)+2)/3)
}
