Effective grep of log file
Question
I have a log file with a lot of lines in this format:
10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"
10.87.113.12 - - [2019-12-09T11:43:51.008Z] "POST /page/sub2.php?id=alice&jw_token=07e876afdc2245b53214fff0d4763730 HTTP/1.1" 200 275 "-" "alice/7.61.1"
My objective is simple: I want to output Alice's jw_token, and that's it.
So, my logic is that I need to find the lines that include id=alice and a status code of 200, then return the value of jw_token.
I actually managed to do this, but only with this absolute monstrosity of a line:
$ grep "id=alice" main.log | grep 200 | grep -o "n=.* " | sed "s/.*=//g" | sed "s/ .*$//g" | uniq
07e876afdc2245b53214fff0d4763730
This looks horrible, and it may also break in a number of ways (for instance, if "200" happens to appear anywhere else on the line). I know grep -P could have cleaned it up somewhat, but unfortunately that flag isn't available on my Mac.
I also did it by bringing in Python, like this:
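cat << 'EOF' > analyzer.py
import re

with open('main.log') as f:
    for line in f:
        # Look for a successful (200) request made with id=alice
        if "id=alice" in line and " 200 " in line:
            # Print the text between "jw_token=" and the next whitespace
            print(re.search(r'(?<=jw_token=).*?(?=\s)', line).group())
            break  # stop at the first match
EOF
python3 analyzer.py && rm analyzer.py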
(This was actually MUCH faster, by orders of magnitude, than the previous line with grep and sed. Why?)
Surely there are ways to make this a lot cleaner and prettier. How?
Recommended answer
You can achieve this with just one grep and one sed, using this command:
grep -E 'id=alice&jw_token=.* HTTP\/1.1" 200' main.log|sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/'|uniq
The first part, grep -E 'id=alice&jw_token=.* HTTP\/1.1" 200' main.log, keeps only the lines where id=alice is followed by a jw_token and the request returned status 200. The second part, sed -E 's/.*id=alice&jw_token=([a-zA-Z0-9]+).*/\1/', captures the token in group 1 and replaces the whole line with just the token. The final uniq collapses adjacent duplicate tokens.
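For comparison, the same idea can be sketched in Python with a single regular expression that requires 200 to sit in the status field, directly after the closing quote of the request line, so a "200" elsewhere on the line does not count. This is only an illustrative sketch: the find_tokens helper and its user parameter are hypothetical names, and it assumes, as in the sample lines, that jw_token directly follows id in the query string and contains only letters and digits.

```python
import re


def find_tokens(lines, user="alice"):
    """Yield each distinct jw_token from requests by `user` that returned 200.

    Hypothetical helper for illustration; `lines` is any iterable of log lines.
    """
    pattern = re.compile(
        r'[?&]id=' + re.escape(user) + r'&jw_token=([A-Za-z0-9]+)'  # id, then the token
        r'[^"]*" 200 '  # rest of the quoted request line, then the status field
    )
    seen = set()
    for line in lines:
        match = pattern.search(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            yield match.group(1)


# Sample lines copied from the question.
sample = [
    '10.87.113.12 - - [2019-12-09T11:41:07.197Z] "DELETE /page/sub1.php?id=alice HTTP/1.1" 401 275 "-" "alice/7.61.1"',
    '10.87.113.12 - - [2019-12-09T11:43:51.008Z] "POST /page/sub2.php?id=alice&jw_token=07e876afdc2245b53214fff0d4763730 HTTP/1.1" 200 275 "-" "alice/7.61.1"',
]
print(list(find_tokens(sample)))  # ['07e876afdc2245b53214fff0d4763730']
```

To run it against the real file, pass the open file object instead of the list, for example next(find_tokens(f), None) inside a with open('main.log') as f: block. Because find_tokens is a generator, next() stops reading at the first match. That early exit is one plausible reason the Python script in the question felt so much faster: the shell pipeline always scans the whole file.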