日志文件中的正则表达式匹配,返回匹配上方和下方的动态内容 [英] regexp match within a log file, return dynamic content above and below match

查看:35
本文介绍了日志文件中的正则表达式匹配,返回匹配上方和下方的动态内容的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一些通用的日志文件,格式如下:

I have some catchall log files in a format as follows:

timestamp event summary
foo details
account name: userA
bar more details
timestamp event summary
baz details
account name: userB
qux more details
timestamp etc.

我想在日志文件中搜索 userB,如果找到,则从前面的时间戳回显到(但不包括)下面的时间戳.可能会有几个事件与我的搜索相匹配.最好在每个匹配项周围回显某种 --- start ------ end --- .

I would like to search the log file for userB, and if found, echo from the preceding timestamp down to (but not including) the following timestamp. There will likely be several events matching my search. It would be nice to echo some sort of --- start --- and --- end --- surrounding each match.

这对 pcregrep -M 来说是完美的,对吧?问题是,GnuWin32 的 pcregrep 因多行正则表达式搜索大文件而崩溃,而这些包罗万象的日志可能是 100 兆或更多.

This would be perfect for pcregrep -M, right? Problem is, GnuWin32's pcregrep crashes with multiline regexps searching large files, and these catch-all logs can be 100 megs or more.

我的尝试

到目前为止,我的 hackish 解决方法是使用 grep -B15 -A30 来查找匹配的行并打印周围的内容,然后将现在更易于管理的块通过管道传输到 pcregrep 中进行润色.问题是有些事件少于 10 行,而有些则是 30 行或更多;在遇到较短的事件时,我得到了一些意想不到的结果.

My hackish workaround thus far involves using grep -B15 -A30 to find matching lines and print surrounding content, then piping the now more manageable chunk into pcregrep for polishing. Problem is that some events are less than ten lines, while others are 30 or more; and I'm getting some unexpected results where the shorter events are encountered.

:parselog <username> <logfile>

set silent=1
set count=0
set deez=20dd-dd-dd dd:dd:dd
echo Searching %~2 for records containing %~1...

for /f "delims=" %%I in (
    'grep -P -i -B15 -A30 ":s+%~1(@mydomain.ext)?$" "%~2" ^| pcregrep -M -i "^%deez%(.|
)+?%~1(@mydomain.ext|
?
)(.|
)+?
%deez%" 2^>NUL'
) do (
    echo(%%I| findstr "^20[0-9][0-9]-[0-9][0-9]-[0-9][0-9].[0-9][0-9]:[0-9][0-9]:[0-9][0-9]" >NUL && (
        if defined silent (
            set silent=
            set found=1
            set /a "count+=1"
            echo;
            echo ---------------start of record !count!-------------
        ) else (
            set silent=1
            echo ----------------end of record !count!--------------
            echo;
        )
    )
    if not defined silent echo(%%I
)

goto :EOF

有没有更好的方法来做到这一点?我遇到了一个看起来很有趣的 awk 命令,例如:

Is there a better way to do this? I've come across an awk command that looked interesting, something like:

awk "/start pattern/,/end pattern/" logfile

...但它也需要匹配一个中间模式.不幸的是,我不太熟悉 awk 语法.有什么建议吗?

... but it would need to match a middle pattern as well. Unfortunately, I'm not that familiar with awk syntax. Any suggestions?

Ed Morton 建议我提供一些示例日志记录和预期输出.

Ed Morton suggested that I supply some example logging and expected output.

示例包罗万象

2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security    11730158    Mon Mar 25 08:02:28 2013    529 Security    NT AUTHORITYSYSTEM N/A Audit Failure   dc3 2   Logon Failure:

    Reason:     Unknown user name or bad password

    User Name:  user5f

    Domain:     MYDOMAIN

    Logon Type: 3

    Logon Process:  Advapi  

    Authentication Package: Negotiate

    Workstation Name:   dc3

    Caller User Name:   dc3$

    Caller Domain:  MYDOMAIN

    Caller Logon ID:    (0x0,0x3E7)

    Caller Process ID:  400

    Transited Services: -

    Source Network Address: 169.254.7.86

    Source Port:    40838
2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security    11730159    Mon Mar 25 08:02:29 2013    680 Security    NT AUTHORITYSYSTEM N/A Audit Failure   dc3 9   Logon attempt by:   MICROSOFT_AUTHENTICATION_PACKAGE_V1_0

Logon account:  USER6Q

Source Workstation: dc3

Error Code: 0xC0000234
2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security    11730160    Mon Mar 25 08:02:29 2013    539 Security    NT AUTHORITYSYSTEM N/A Audit Failure   dc3 2   Logon Failure:

    Reason:     Account locked out

    User Name:  USER6Q@MYDOMAIN.TLD

    Domain: MYDOMAIN

    Logon Type: 3

    Logon Process:  Advapi  

    Authentication Package: Negotiate

    Workstation Name:   dc3

    Caller User Name:   dc3$

    Caller Domain:  MYDOMAIN

    Caller Logon ID:    (0x0,0x3E7)

    Caller Process ID: 400

    Transited Services: -

    Source Network Address: 169.254.7.89

    Source Port:    55314
2013-03-25 08:02:32 Auth.Notice 169.254.5.62    Mar 25 08:36:38 DC4.mydomain.tld MSWinEventLog  5   Security    201326798   Mon Mar 25 08:36:37 2013    4624    Microsoft-Windows-Security-Auditing     N/A Audit Success   DC4.mydomain.tld    12544   An account was successfully logged on.

Subject:
    Security ID:        S-1-0-0
    Account Name:       -
    Account Domain:     -
    Logon ID:       0x0

Logon Type:         3

New Logon:
    Security ID:        S-1-5-21-606747145-1409082233-725345543-160838
    Account Name:       DEPTACCT16$
    Account Domain:     MYDOMAIN
    Logon ID:       0x1158e6012c
    Logon GUID:     {BCC72986-82A0-4EE9-3729-847BA6FA3A98}

Process Information:
    Process ID:     0x0
    Process Name:       -

Network Information:
    Workstation Name:   
    Source Network Address: 169.254.114.62
    Source Port:        42183

Detailed Authentication Information:
    Logon Process:      Kerberos
    Authentication Package: Kerberos
    Transited Services: -
    Package Name (NTLM only):   -
    Key Length:     0

This event is generated when a logon session is created. It is generated on the computer that was accessed.

The subject fields indicate...
2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security    11730162    Mon Mar 25 08:02:30 2013    675 Security    NT AUTHORITYSYSTEM N/A Audit Failure   dc3 9   Pre-authentication failed:

    User Name:  USER8Y

    User ID:        %{S-1-5-21-606747145-1409082233-725345543-3904}

    Service Name:   krbtgt/MYDOMAIN

    Pre-Authentication Type:    0x0

    Failure Code:   0x19

    Client Address: 169.254.87.158
2013-03-25 08:02:32 Auth.Critical   etc.

示例命令

call :parselog user6q \path	ocatch-all.log

预期结果

---------------start of record 1-------------
2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security    11730159    Mon Mar 25 08:02:29 2013    680 Security    NT AUTHORITYSYSTEM N/A Audit Failure   dc3 9   Logon attempt by:   MICROSOFT_AUTHENTICATION_PACKAGE_V1_0

Logon account:  USER6Q

Source Workstation: dc3

Error Code: 0xC0000234
---------------end of record 1-------------


---------------start of record 2-------------
2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security    11730160    Mon Mar 25 08:02:29 2013    539 Security    NT AUTHORITYSYSTEM N/A Audit Failure   dc3 2   Logon Failure:

    Reason:     Account locked out

    User Name:  USER6Q@MYDOMAIN.TLD

    Domain: MYDOMAIN

    Logon Type: 3

    Logon Process:  Advapi  

    Authentication Package: Negotiate

    Workstation Name:   dc3

    Caller User Name:   dc3$

    Caller Domain:  MYDOMAIN

    Caller Logon ID:    (0x0,0x3E7)

    Caller Process ID: 400

    Transited Services: -

    Source Network Address: 169.254.7.89

    Source Port:    55314
---------------end of record 2-------------

推荐答案

这就是您使用 GNU awk(用于 IGNORECASE)所需的一切:

This is all you need with GNU awk (for IGNORECASE):

$ cat tst.awk
function prtRecord() {
    if (record ~ regexp) {
        printf "-------- start of record %d --------%s", ++numRecords, ORS
        printf "%s", record
        printf "--------- end of record %d ---------%s%s", numRecords, ORS, ORS
    }
    record = ""
}
BEGIN{ IGNORECASE=1 }
/^[[:digit:]]+-[[:digit:]]+-[[:digit:]]+/ { prtRecord() }
{ record = record $0 ORS }
END { prtRecord() }

或任何 awk:

$ cat tst.awk
function prtRecord() {
    if (tolower(record) ~ tolower(regexp)) {
        printf "-------- start of record %d --------%s", ++numRecords, ORS
        printf "%s", record
        printf "--------- end of record %d ---------%s%s", numRecords, ORS, ORS
    }
    record = ""
}
/^[[:digit:]]+-[[:digit:]]+-[[:digit:]]+/ { prtRecord() }
{ record = record $0 ORS }
END { prtRecord() }

无论您在 UNIX 上以何种方式运行它:

Either way you'd run it on UNIX as:

$ awk -v regexp=user6q -f tst.awk file

我不知道 Windows 语法,但我希望它非常相似,如果不完全相同.

I don't know the Windows syntax but I expect it's very similar if not identical.

注意在脚本中使用 tolower() 使比较的两边都小写,因此匹配不区分大小写.如果您可以改为传入正确情况的搜索正则表达式,那么您无需在比较的任一侧调用 tolower().nbd,它可能只是稍微加快了脚本速度.

Note the use of tolower() in the script to make both sides of the comparison lower case so the match is case-insensitive. If you can instead pass in a search regexp that's the correct case, then you don't need to call tolower() on either side of the comparison. nbd, it might just speed the script up slightly.

$ awk -v regexp=user6q -f tst.awk file
-------- start of record 1 --------
2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security
    11730159    Mon Mar 25 08:02:29 2013    680 Security    NT AUTHORITYSYSTEM N/A Audit Failure
dc3 9   Logon attempt by:   MICROSOFT_AUTHENTICATION_PACKAGE_V1_0

Logon account:  USER6Q

Source Workstation: dc3

Error Code: 0xC0000234
--------- end of record 1 ---------

-------- start of record 2 --------
2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security
    11730160    Mon Mar 25 08:02:29 2013    539 Security    NT AUTHORITYSYSTEM N/A Audit Failure
dc3 2   Logon Failure:

    Reason:     Account locked out

    User Name:  USER6Q@MYDOMAIN.TLD

    Domain: MYDOMAIN

    Logon Type: 3

    Logon Process:  Advapi

    Authentication Package: Negotiate

    Workstation Name:   dc3

    Caller User Name:   dc3$

    Caller Domain:  MYDOMAIN

    Caller Logon ID:    (0x0,0x3E7)

    Caller Process ID: 400

    Transited Services: -

    Source Network Address: 169.254.7.89

    Source Port:    55314
--------- end of record 2 ---------

这篇关于日志文件中的正则表达式匹配,返回匹配上方和下方的动态内容的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆