How to parse a text file for a string pattern and count unique entries?
Problem description
I have a logfile that contains login data, and I need to generate a report that summarizes all of the failed login attempts and organize it by the user. A line from the file looks like:
Jan 21 19:22:23 localhost sshd[1234]: Failed password for USER from 127.0.0.1 port 12345 ssh2 #IPs and such obscured, obviously
And it's the USER from the line that I need to count and summarize. The pattern is always Failed password for USER, so that helps, but I can't do awk -F or other string splitting stuff due to the amount of other junk on the line.
How can I count each failed login and total them up per user?
Recommended answer
With GNU grep, try this:
grep -Po "Failed password for \K.*?(?= from)" logfile.log | sort | uniq -c
-P enables Perl regexes, allowing for things like \K.
-o prints only the matched part, instead of whole lines that contain a match.
\K makes grep forget the part it matched before, so that it won't appear in the output.
.*? matches USER lazily, that is, as few characters as possible. Only this part will be printed.
(?= from) is a lookahead needed to determine when USER ends. It requires " from" to follow but does not include it in the match.
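To see what \K and the lookahead each contribute, here is a small sketch that runs the pattern on a few made-up log lines. The user names, PIDs, and ports are hypothetical, and it assumes a GNU grep built with -P support.

```shell
# Hypothetical sample log; user names, PIDs, and ports are invented for illustration.
log='Jan 21 19:22:23 localhost sshd[1234]: Failed password for adam from 127.0.0.1 port 12345 ssh2
Jan 21 19:22:25 localhost sshd[1234]: Accepted password for bob from 127.0.0.1 port 12346 ssh2
Jan 21 19:22:30 localhost sshd[1235]: Failed password for bob from 127.0.0.1 port 12347 ssh2'

# Without \K, -o prints the whole match, including the fixed prefix.
with_prefix=$(printf '%s\n' "$log" | grep -Po 'Failed password for .*?(?= from)')

# With \K, everything matched before it is dropped from the reported match,
# so only the user name remains. The "Accepted" line does not match at all.
users=$(printf '%s\n' "$log" | grep -Po 'Failed password for \K.*?(?= from)')

printf '%s\n' "$with_prefix"
printf '%s\n' "$users"
```

The first command prints the prefix together with each name, while the second prints only the names, one per failed attempt.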
The grep part prints USER for every failed login attempt of USER. Now we only need to count the occurrences for each user. This is done with the idiom sort | uniq -c.
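The sort step is there because uniq only collapses adjacent identical lines. A minimal sketch with a hypothetical list of extracted names shows the difference:

```shell
# Hypothetical list of extracted user names, one line per failed attempt.
users='bob
adam
bob
adam
adam'

# uniq only merges adjacent duplicates, so unsorted input
# produces several separate entries for the same user.
unsorted_counts=$(printf '%s\n' "$users" | uniq -c)

# Sorting first groups identical names together, giving one count per user.
sorted_counts=$(printf '%s\n' "$users" | sort | uniq -c)

printf '%s\n' "$unsorted_counts"
printf '%s\n' "$sorted_counts"
```

The exact padding in front of each count depends on the uniq implementation, but the sorted variant reports each user exactly once.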
The final output looks like this:
7 adam
2 bob
14 claire
The output is sorted by user names. To sort by the number of failed attempts, append | sort -nr to the command.
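As a quick illustration of that last step, the sketch below feeds the example counts shown earlier through sort -nr. The leading padding imitates typical uniq -c output and is an assumption.

```shell
# Counts taken from the example output; in practice they come from uniq -c.
counts='      7 adam
      2 bob
     14 claire'

# -n compares the leading count numerically (so 14 ranks above 7);
# -r reverses the order so the largest count comes first.
by_count=$(printf '%s\n' "$counts" | sort -nr)

printf '%s\n' "$by_count"
```

With these counts, claire comes first, followed by adam and then bob.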