Effective grep of a log file

Question

I have a log file with many lines in this format:

10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:43:51.008Z] "POST /page/sub2.php?id=alice&jw_token=07e876afdc2245b53214fff0d4763730 HTTP/1.1" 200 275 "-" "alice/7.61.1"

My objective is simple: I want to output Alice's jw_token, and that's it.

So, my logic is that I need to find the lines that include id=alice and a status code of 200, then return the value of jw_token.

I actually managed to do this, but only with this absolute monstrosity of a line:

$ grep "id=alice" main.log | grep 200 | grep -o "n=.* " | sed "s/.*=//g" | sed "s/ .*$//g" | uniq
07e876afdc2245b53214fff0d4763730

This looks horrible, and may also break on a number of things (for instance if "200" happens to appear anywhere else on the line). I know grep -P could have cleaned it up somewhat, but unfortunately that flag isn't available on my Mac.

I also did it with Python, like this:

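cat << 'EOF' > analyzer.py
import re

# Token value: everything after "jw_token=" up to the next whitespace.
TOKEN_RE = re.compile(r'(?<=jw_token=)\S+')

with open('main.log') as f:
    for line in f:
        if "id=alice" in line and " 200 " in line:
            match = TOKEN_RE.search(line)
            if match:  # skip matching lines that carry no token
                print(match.group())
                break
EOF
python3 analyzer.py && rm analyzer.py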

(This was actually MUCH (orders of magnitude) faster than the previous line with grep and sed. Why?)

Surely there are ways to make this a lot cleaner and prettier. How?

Recommended answer

You can achieve this with just one grep and one sed, using this command:

grep -E 'id=alice&jw_token=.* HTTP\/1.1" 200' main.log|sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/'|uniq

Here the first part, grep -E 'id=alice&jw_token=.* HTTP\/1.1" 200' main.log, keeps only the lines where id=alice is immediately followed by a jw_token and the request ends with status 200. The second part, sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/', captures the token in group 1 and replaces the whole line with just the token.
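
For comparison, the same filter-and-capture idea can be sketched in Python with a single regular expression. This is only an illustration under assumptions, not part of the answer above. It assumes every line follows the layout of the sample lines, with the status code directly after the quoted request. The names LINE_RE, find_tokens and sample are made up for this sketch, and the third sample line is hypothetical, added to show that a "200" in the size column is not mistaken for the status.

```python
import re

# Assumed layout (from the sample lines): "METHOD /path?query HTTP/x.y" STATUS SIZE ...
# The status is matched only in the position right after the quoted request.
LINE_RE = re.compile(
    r'"[A-Z]+ \S*[?&]id=alice&jw_token=([A-Za-z0-9]+)\S* HTTP/[\d.]+" 200 '
)


def find_tokens(lines):
    """Return distinct jw_token values for alice on status-200 lines, in first-seen order."""
    seen = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    '10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"',
    '10.87.113.12 - - [2019-12-09T11:43:51.008Z] "POST /page/sub2.php?id=alice&jw_token=07e876afdc2245b53214fff0d4763730 HTTP/1.1" 200 275 "-" "alice/7.61.1"',
    # Hypothetical line: status 401 with a response size of 200, so it must not match.
    '10.87.113.12 - - [2019-12-09T11:44:00.000Z] "POST /page/sub2.php?id=alice&jw_token=deadbeef HTTP/1.1" 401 200 "-" "alice/7.61.1"',
]

print(find_tokens(sample))
```

To run it against the real file, pass the open file object instead of the list, for example find_tokens(open('main.log')). Like the grep pattern above, this requires jw_token to come directly after id=alice in the query string.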
