Parsing a log file with regular expressions
Problem description
I'm currently working on a parser for our internal log files (generated by log4php, log4net and log4j). So far I have a nice regular expression to parse the logs, except for one annoying bit: Some log messages span multiple lines, which I can't get to match properly. The regex I have now is this:
(?<date>\d{2}/\d{2}/\d{2})\s(?<time>\d{2}:\d{2}:\d{2},\d{3})\s(?<message>.+)
The log format (which I use for testing the parser) is this:
07/23/08 14:17:31,321 log
message
spanning
multiple
lines
07/23/08 14:17:31,321 log message on one line
When I run the parser right now, I get only the line the log starts on. If I change it to span multiple lines, I get only one result (the whole log file).
Help please ;-)
@samjudson:
You need to pass the RegexOptions.Singleline flag in to the regular expression, so that "." matches all characters, not just all characters except new lines (which is the default).
I tried that, but then it matches the whole file. I also tried to set the message-group to .+? (non-greedy), but then it matches a single character (which isn't what I'm looking for either).
The problem is that the pattern for the message matches on the date-group as well, so when it doesn't break on a new-line it just goes on and on and on.
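The three behaviours described above can be reproduced in a few lines. This is only a sketch, and it uses Python rather than .NET, so it is an analogue and not the parser from the question: Python writes named groups as (?P<name>...), and re.DOTALL is assumed here to stand in for RegexOptions.Singleline. LOG is the test log shown earlier; HEAD, default, greedy and lazy are names made up for the illustration.

```python
import re

# The test log from the question (no trailing newline).
LOG = "\n".join([
    "07/23/08 14:17:31,321 log",
    "message",
    "spanning",
    "multiple",
    "lines",
    "07/23/08 14:17:31,321 log message on one line",
])

# Date and time prefix shared by all three attempts (hypothetical helper name).
HEAD = r"(?P<date>\d{2}/\d{2}/\d{2})\s(?P<time>\d{2}:\d{2}:\d{2},\d{3})\s"


def messages(message_pattern, flags=0):
    """Return the 'message' group of every match in LOG."""
    regex = re.compile(HEAD + message_pattern, flags)
    return [m.group("message") for m in regex.finditer(LOG)]


# Default flags: "." stops at a newline, so only the first line is captured.
default = messages(r"(?P<message>.+)")

# DOTALL with a greedy ".+": the first match runs to the end of the file.
greedy = messages(r"(?P<message>.+)", re.DOTALL)

# DOTALL with a lazy ".+?": nothing follows the group, so one character suffices.
lazy = messages(r"(?P<message>.+?)", re.DOTALL)

print(default)
print(len(greedy))
print(lazy)
```

None of the three variants knows where the next entry starts, which is why the message group needs some way to stop in front of the next date.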
I use this regex for the message group now. It works, unless there's a pattern IN the log message which is the same as the start of the log message.
(?<message>(.(?!\d{2}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\s\[\d{4}\]))+)
Solution
This will only work if the log message doesn't contain a date at the beginning of a line, but you could try adding a negative look-ahead assertion for a date in the "message" group:
(?<date>\d{2}/\d{2}/\d{2})\s(?<time>\d{2}:\d{2}:\d{2},\d{3})\s(?<message>(.(?!^\d{2}/\d{2}/\d{2}))+)
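Note that this requires the RegexOptions.Multiline flag, so that ^ matches at the start of every line and not only at the start of the input. RegexOptions.Singleline is still needed as well, so that "." can match the newlines inside a message.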
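As a rough check of the look-ahead idea, here is a minimal sketch that runs it against the test log from the question. It is written in Python rather than .NET, so the syntax is an analogue: named groups are spelled (?P<name>...), and re.DOTALL and re.MULTILINE are assumed to play the roles of RegexOptions.Singleline and RegexOptions.Multiline. The names LOG, ENTRY and entries are invented for the example.

```python
import re

# The test log from the question (no trailing newline).
LOG = "\n".join([
    "07/23/08 14:17:31,321 log",
    "message",
    "spanning",
    "multiple",
    "lines",
    "07/23/08 14:17:31,321 log message on one line",
])

ENTRY = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{2})\s"
    r"(?P<time>\d{2}:\d{2}:\d{2},\d{3})\s"
    # Consume a character only if it is NOT immediately followed by a
    # date at the start of a line; the newline before the next entry
    # therefore ends the message.
    r"(?P<message>(?:.(?!^\d{2}/\d{2}/\d{2}))+)",
    # DOTALL: "." matches newlines. MULTILINE: "^" matches at each line start.
    re.DOTALL | re.MULTILINE,
)

entries = [
    (m.group("date"), m.group("time"), m.group("message"))
    for m in ENTRY.finditer(LOG)
]

for date, time, message in entries:
    print(date, time, repr(message))
```

With this input the sketch yields two entries, the first carrying all five lines of the message. The limitation mentioned above still applies: a continuation line that itself begins with something shaped like a date would be split off as a new entry.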