Parse log with regular expression

Question

I'm looking for a way to parse a Varnish log file. A line looks like this:

178.232.38.87 - - [23/May/2012:14:01:05 +0200] "GET http://static.vg.no/iphone/js/front-min.js?20120509-1 HTTP/1.1" 200 2013 "http://touch.vg.no/" "Mozilla/5.0 (Linux; U; Android 2.3.3; en-no; HTC Nexus One Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1"

The following elements can be distinguished:

%h %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i"

but I still have no idea how to do this. A simple String.split(" "); won't work, because the timestamp and the quoted fields contain spaces themselves.

I know regular expressions follow general rules, but a Java solution would be the most suitable.

Thanks

Recommended answer

I'd build the regular expression from chunks, each matching one field according to its possible/expected values.

    // Needs java.util.regex.Pattern and java.util.regex.Matcher;
    // String.join requires Java 8 or later.
    String rexa = "(\\d+(?:\\.\\d+){3})";  // an IP address
    String rexs = "(\\S+)";                // a single token (no spaces)
    String rexdt = "\\[([^\\]]+)\\]";      // something between [ and ]
    String rexstr = "\"([^\"]*?)\"";       // a quoted string
    String rexi = "(\\d+)";                // unsigned integer

    // One chunk per field, separated by single spaces
    String rex = String.join( " ", rexa, rexs, rexs, rexdt, rexstr,
                              rexi, rexi, rexstr, rexstr );

    Pattern pat = Pattern.compile( rex );
    Matcher mat = pat.matcher( h );        // h holds one line of the log
    if( mat.matches() ){
        // groups 1..9 are the fields in log order
        for( int ig = 1; ig <= mat.groupCount(); ig++ ){
            System.out.println( mat.group( ig ) );
        }
    }

It is, of course, possible to make do with rexs in place of rexa or rexi; the pattern then simply validates those fields less strictly.
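
For completeness, here is a minimal self-contained sketch of the same approach, run against the sample line from the question. The class name `LogLineParser`, the constant names and the `parse` method are made up for illustration; only the regular-expression chunks come from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; the names are illustrative only.
public class LogLineParser {
    // Field-level chunks, equivalent to the ones in the answer.
    private static final String ADDR = "(\\d+(?:\\.\\d+){3})"; // dotted IPv4 address
    private static final String TOKEN = "(\\S+)";               // single token, no spaces
    private static final String BRACKETED = "\\[([^\\]]+)\\]";  // text between [ and ]
    private static final String QUOTED = "\"([^\"]*)\"";        // text between double quotes
    private static final String UINT = "(\\d+)";                // unsigned integer

    // Compile once and reuse for every line of the log.
    private static final Pattern LINE = Pattern.compile(String.join(" ",
            ADDR, TOKEN, TOKEN, BRACKETED, QUOTED, UINT, UINT, QUOTED, QUOTED));

    /** Returns the nine captured fields, or an empty list if the line does not match. */
    public static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "178.232.38.87 - - [23/May/2012:14:01:05 +0200] "
                + "\"GET http://static.vg.no/iphone/js/front-min.js?20120509-1 HTTP/1.1\" "
                + "200 2013 \"http://touch.vg.no/\" "
                + "\"Mozilla/5.0 (Linux; U; Android 2.3.3; en-no; HTC Nexus One Build/GRI40) "
                + "AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1\"";
        // Prints one field per line, in log order.
        for (String field : parse(sample)) {
            System.out.println(field);
        }
    }
}
```

Two limitations of this sketch: the quoted-string chunk does not handle escaped quotes inside a field, and a line whose byte count is logged as "-" rather than a number would not match the unsigned-integer chunk. Swapping in the single-token chunk for that field, as suggested above, is one way around the latter.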
