Bash: How to efficiently manipulate a grep -Poz multiline output?

Problem description

This is my first post on Stack Overflow. \0/ I hope it's not too long. I'm writing a Bash script to regularly read, filter and output data from thousands of log files. Performance is important, which is why I'm mainly using grep instead of awk or sed.

grep -Poz does exactly what I want: it captures the (multiline) data with patterns that are relevant for further processing. But I'm stuck turning that data into, for example, an XML file or an SQLite3 batch query for further analysis.

#!/bin/bash
# Regex:
# (?s) DOTALL mode: . also matches newlines, so a match can span several lines
# Capture group 1 = date
# Capture group 2 = time
# Capture group 3 = error type (ERROR, WARN or DEBUG)
# Capture group 4 = error details
# Positive lookahead: stop before a newline (Windows or Linux line ending) that is
# followed by a date or, if this is the last matching entry, at the end of the input (\z).
#
REGEX_MULTILINE="(?s)([0-9]{4}-[0-9]{2}-[0-9]{2})[[:space:]]([0-9]{2}:[0-9]{2}:[0-9]{2}[,.][0-9]{3})[[:space:]]+(ERROR|WARN|DEBUG)(.*?)(?=(?:\r\n|[\r\n])[0-9]{4}-[0-9]{2}-[0-9]{2}|\z)"
LOGFILE="test.log"

# Writing to a file gives exactly the info I want
# (grep -z ends every match with a NUL byte; tr turns those into newlines)
write_log(){
    grep -Pzo "$REGEX_MULTILINE" "$LOGFILE" | tr '\0' '\n' > output_grep1.txt
}

# I'm stuck in this part: generating, for example, an XML file
write_xml(){
    local LOGDATE=""
    local LOGTIME=""
    local LOGTYPE=""
    local LOGINFO=""
    while IFS= read -r LINE ; do
        # For testing purposes: see whether the brackets contain the full
        # string or just one line of it
        printf '%s\n' "[$LINE]"
        # Processing logic goes here; I didn't get this far yet
        if [[ $LINE =~ $REGEX_MULTILINE ]] ; then
            # regex capture groups
            LOGDATE=${BASH_REMATCH[1]}
            LOGTIME=${BASH_REMATCH[2]}
            LOGTYPE=${BASH_REMATCH[3]}
            LOGINFO=${BASH_REMATCH[4]}
            # send vars to a function for output
            # write_xml_function "$LOGDATE" "$LOGTIME" "$LOGTYPE" "$LOGINFO"
            # for testing purposes
            echo -e "log entry:\n\t 1: $LOGDATE \n\t 2: $LOGTIME \n\t 3: $LOGTYPE \n\t 4: $LOGINFO \n"
        fi
    done < <(grep -Pzo "$REGEX_MULTILINE" "$LOGFILE")
}

A logfile may look something like this:

2017-01-01 11:09:42,439 INFO  server.service.function.property.PropertyService - Props (re)loaded.
2017-01-01 11:15:46,155 DEBUG server.service.ApiController - api/start called! params:
${params}
2017-01-01 13:01:29,675 ERROR server.service.util.base.FtpClient - Error retrieving file. Directory does not exist.
2017-01-01 13:15:12,803 DEBUG server.service.ApiController - api/start called! params:
${params}
2017-01-01 13:15:13,932 INFO server.service.ControllerService - Filter:server.service.model.Filters
2017-01-01 15:36:04,914 INFO server.service.ControllerService - Filter:server.service.model.Filters
2017-01-01 15:55:50,279 ERROR server.service.WebClient - server API failed: [(someError.java:12345)]
{"someId":"etc","otherId":123,"token":{}}
2017-01-01 15:55:50,366 ERROR server.service.controller.Search - Server error for [/service/search/load]: java.lang.NullPointerException stack[etc]
java.lang.NullPointerException
    at server.common.stack(SomeApi.java:123)
    at server.service.trace(SomeService.java:456)
    at java.lang.Thread.run(Thread.java:789)
    etc.
    etc.
2017-01-01 16:17:55,175 DEBUG server.config.app - 

STARTING...


2017-01-01 16:18:00,040 INFO  server.common.service.base.property - Props (re)loaded.
2017-01-01 17:44:43,959 DEBUG server.service.controller - api/start called! params:
${params}

The result I expect when reading the grep multiline output is this:

[2017-01-01 13:15:13,932 INFO server.service.ControllerService - Filter:server.service.model.Filters]
[2017-01-01 15:36:04,914 INFO server.service.ControllerService - Filter:server.service.model.Filters]
[2017-01-01 15:55:50,279 ERROR server.service.WebClient - server API failed: [(someError.java:12345)]
{"someId":"etc","otherId":123,"token":{}}]
[2017-01-01 15:55:50,366 ERROR server.service.controller.Search - Server error for [/service/search/load]: java.lang.NullPointerException stack[etc]
java.lang.NullPointerException
    at server.common.stack(SomeApi.java:123)
    at server.service.trace(SomeService.java:456)
    at java.lang.Thread.run(Thread.java:789)
    etc.
    etc.]

Instead, I got:

[2017-01-01 13:15:13,932 INFO server.service.ControllerService - Filter:server.service.model.Filters]
[2017-01-01 15:36:04,914 INFO server.service.ControllerService - Filter:server.service.model.Filters]
[2017-01-01 15:55:50,279 ERROR server.service.WebClient - server API failed: [(someError.java:12345)]
{"someId":"etc","otherId":123,"token":{}}]
[2017-01-01 15:55:50,366 ERROR server.service.controller.Search - Server error for [/service/search/load]: java.lang.NullPointerException stack[etc]]
[java.lang.NullPointerException]
[   at server.common.stack(SomeApi.java:123)]
[   at server.service.trace(SomeService.java:456)]
[   at java.lang.Thread.run(Thread.java:789)]
[   etc.]
[   etc.]

What did I overlook? Can it be done this way?

Recommended answer

The problem is your read command. By default, read reads until a newline, but grep -z ends each match with a NUL byte, so what you are trying to process are NUL-separated strings.
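
To illustrate the explanation above, here is a minimal sketch that feeds read the same kind of input grep -z -o produces: every record terminated by a NUL byte, with one record spanning two lines. The sample strings are made up for the demonstration; read -d '' (an empty delimiter) tells Bash to stop at NUL instead of newline.

```bash
#!/usr/bin/env bash
# Collect NUL-terminated records into an array. The first sample record
# spans two lines, like a log entry followed by a stack-trace line.
records=()
while IFS= read -r -d '' rec; do
    records+=("$rec")
done < <(printf '%s\0' $'2017-01-01 13:01:29,675 ERROR first entry\n    at stack line' \
                      '2017-01-01 16:17:55,175 DEBUG second entry')

printf 'records read: %d\n' "${#records[@]}"
for rec in "${records[@]}"; do
    printf '[%s]\n' "$rec"    # brackets show where each record starts and ends
done
```

This prints "records read: 2", and the first bracketed record keeps its embedded newline. Dropping -d '' makes read stop at that newline again, which is the splitting seen in the question.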

You should be able to use:

while IFS= read -r -d '' LINE ; do
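
Putting that together, here is a sketch of the question's write_xml loop with the NUL-delimited read. One caveat goes beyond the read fix: Bash's [[ =~ ]] uses POSIX extended regular expressions, not PCRE, so the (?s) flag, the lazy .*?, the lookahead and \z in REGEX_MULTILINE cannot be reused inside the loop. The sketch therefore splits each record at its first newline and matches only the header line against a separate ERE. The names HEADER_ERE and parse_records, and the printed layout, are illustrative assumptions rather than part of the original answer.

```bash
#!/usr/bin/env bash
# Hypothetical ERE for the first line of a record: date, time, level, rest of line.
# [[ =~ ]] uses POSIX ERE, so PCRE-only syntax like (?s), .*? or (?=...) is avoided.
HEADER_ERE='^([0-9]{4}-[0-9]{2}-[0-9]{2})[[:space:]]+([0-9]{2}:[0-9]{2}:[0-9]{2}[,.][0-9]{3})[[:space:]]+(ERROR|WARN|DEBUG)(.*)$'

# Read NUL-terminated records (the format grep -z -o writes) from stdin
# and print the four fields of each matching record.
parse_records() {
    local record header details nl=$'\n'
    while IFS= read -r -d '' record; do
        header=${record%%"$nl"*}             # first line of the record
        details=""
        if [[ $record == *"$nl"* ]]; then
            details=${record#*"$nl"}         # remaining lines (stack trace, JSON, ...)
        fi
        header=${header%$'\r'}               # tolerate Windows (CRLF) line endings
        if [[ $header =~ $HEADER_ERE ]]; then
            # Field 4 = rest of the header line plus any following lines
            printf 'log entry:\n\t1: %s\n\t2: %s\n\t3: %s\n\t4: %s\n' \
                "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" \
                "${BASH_REMATCH[4]}${details:+$nl$details}"
        fi
    done
}

# Usage with the question's variables (assumes GNU grep built with PCRE support):
# parse_records < <(grep -Pzo "$REGEX_MULTILINE" "$LOGFILE")
```

Field 4 is rebuilt from the rest of the first line plus the following lines, which approximates what capture group 4 holds in the PCRE version; the four values can then be handed to whatever XML or SQLite3 writer you use instead of printf.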
