Parsing a log file with regular expressions


Problem description

I'm currently working on a parser for our internal log files (generated by log4php, log4net and log4j). So far I have a nice regular expression to parse the logs, except for one annoying bit: Some log messages span multiple lines, which I can't get to match properly. The regex I have now is this:

(?<date>\d{2}/\d{2}/\d{2})\s(?<time>\d{2}:\d{2}:\d{2},\d{3})\s(?<message>.+)

The log format (which I use for testing the parser) is this:

07/23/08 14:17:31,321 log 
message
spanning
multiple
lines
07/23/08 14:17:31,321 log message on one line

When I run the parser right now, I get only the line the log starts on. If I change it to span multiple lines, I get only one result (the whole log file).

Help please ;-)


@samjudson:

You need to pass the RegexOptions.Singleline flag in to the regular expression, so that "." matches all characters, not just all characters except new lines (which is the default).

I tried that, but then it matches the whole file. I also tried to set the message-group to .+? (non-greedy), but then it matches a single character (which isn't what I'm looking for either).

The problem is that the pattern for the message matches on the date-group as well, so when it doesn't break on a new-line it just goes on and on and on.
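
The three behaviours described above can be reproduced with a small sketch. It is written in Python rather than .NET, so treat it as an approximation: Python spells named groups `(?P<name>...)`, and `re.DOTALL` stands in for `RegexOptions.Singleline`. The names `LOG` and `HEAD` are made up for this illustration.

```python
import re

# The sample log from the question.
LOG = (
    "07/23/08 14:17:31,321 log \n"
    "message\n"
    "spanning\n"
    "multiple\n"
    "lines\n"
    "07/23/08 14:17:31,321 log message on one line\n"
)

# Date and time prefix shared by all three variants (Python named-group syntax).
HEAD = r"(?P<date>\d{2}/\d{2}/\d{2})\s(?P<time>\d{2}:\d{2}:\d{2},\d{3})\s"

# Default: "." stops at a newline, so only the first line of each entry is captured.
default = [m.group("message") for m in re.finditer(HEAD + r"(?P<message>.+)", LOG)]

# re.DOTALL: the greedy ".+" now crosses newlines and runs to the end of the input.
dotall = [m.group("message")
          for m in re.finditer(HEAD + r"(?P<message>.+)", LOG, re.DOTALL)]

# Non-greedy ".+?" with nothing after it is satisfied by a single character.
lazy = [m.group("message")
        for m in re.finditer(HEAD + r"(?P<message>.+?)", LOG, re.DOTALL)]

print(default)      # first line of each entry only
print(len(dotall))  # a single match covering the rest of the file
print(lazy)         # one character per entry
```

In each case the message group has no notion of where the next entry begins, which is why a look-ahead for the next timestamp is needed.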


I use this regex for the message group now. It works, unless there's a pattern IN the log message which is the same as the start of the log message.

(?<message>(.(?!\d{2}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\s\[\d{4}\]))+)

Solution

This will only work if the log message doesn't contain a date at the beginning of the line, but you could try adding a negative look-ahead assertion for a date in the "message" group:

(?<date>\d{2}/\d{2}/\d{2})\s(?<time>\d{2}:\d{2}:\d{2},\d{3})\s(?<message>(.(?!^\d{2}/\d{2}/\d{2}))+)

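Note that this requires the RegexOptions.Multiline flag, so that ^ matches at the start of every line. Combine it with RegexOptions.Singleline if the message has to cross line breaks, since "." does not match a newline by default.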
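
As a rough illustration, here is the same idea ported to Python. This is a sketch, not the .NET code itself: `re.MULTILINE` and `re.DOTALL` play the roles of `RegexOptions.Multiline` and `RegexOptions.Singleline`, named groups use Python's `(?P<name>...)` syntax, and `ENTRY`, `parse_log` and `SAMPLE` are hypothetical names chosen for the example.

```python
import re

# Python port of the pattern above.
# MULTILINE lets "^" match at the start of every line;
# DOTALL lets "." match newlines so a message can span lines.
ENTRY = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{2})\s"
    r"(?P<time>\d{2}:\d{2}:\d{2},\d{3})\s"
    # Take one character at a time, as long as it is not
    # immediately followed by a line that starts with a date.
    r"(?P<message>(?:.(?!^\d{2}/\d{2}/\d{2}))+)",
    re.MULTILINE | re.DOTALL,
)


def parse_log(text):
    """Return a (date, time, message) tuple for every entry found in text."""
    return [
        # rstrip() drops the trailing newline the last entry would otherwise keep.
        (m.group("date"), m.group("time"), m.group("message").rstrip())
        for m in ENTRY.finditer(text)
    ]


SAMPLE = (
    "07/23/08 14:17:31,321 log \n"
    "message\n"
    "spanning\n"
    "multiple\n"
    "lines\n"
    "07/23/08 14:17:31,321 log message on one line\n"
)

for date, time, message in parse_log(SAMPLE):
    print(date, time, repr(message))
```

On the sample log this yields two entries, the first with all five lines of its message. The caveat from the answer still applies: a continuation line that itself begins with something shaped like a date ends the message early.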
