Fluentd Apache log format with multiple host IPs


Question

I have a small issue with the Fluentd log parser. I have a Varnish server on which I set the X-Forwarded-For header to contain the list of IPs of every host an HTTP request passes through. I use this to get that information into the varnishncsa logs. Here is an example log line:

"192.168.79.16, 192.22.10.22, 10.2.2.22 - - [13/Aug/2015:09:50:45 +0000] \"GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1\" 401 0 \"-\" \"Java/1.8.0_45\""

On the other hand, I would like to aggregate these logs in Fluentd. Since varnishncsa logs use the Apache format, I use Fluentd's apache2 format for input parsing, as in this configuration:

<source>
  type tail
  format apache2
  path /var/log/varnish/varnishncsa.log
  pos_file /var/log/td-agent/tmp/access.log.pos
  tag "apache2.varnish.mydomain.com.access"
</source>

The problem is that this works when there is only one host IP in the log line, but when there are multiple IPs, the Fluentd aggregator reports a "pattern not match" warning. That is:

This matches:

"192.168.79.16 - - [13/Aug/2015:09:50:45 +0000] \"GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1\" 401 0 \"-\" \"Java/1.8.0_45\""

This does not match:

"192.168.79.16, 192.22.10.22, 10.2.2.22 - - [13/Aug/2015:09:50:45 +0000] \"GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1\" 401 0 \"-\" \"Java/1.8.0_45\""

The Fluentd apache2 regex is:

^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$

with this time format:

%d/%b/%Y:%H:%M:%S %z

I have tried to find and test the right regex for this, but have not found one yet.

I tried this, but it doesn't work:

<source>
  type tail
  format format /^(?<host>\,*[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
  time_format %d/%b/%Y:%H:%M:%S %z
  path /var/log/varnish/varnishncsa.log
  pos_file /var/log/td-agent/tmp/access.log.pos
  tag "apache2.varnish.mydomain.com.access"
</source>

Can someone help? Could you also point me to good documentation on Fluentd parser pattern capturing, and to a good way to test a Fluentd regex? The Fluentd regular expression editor doesn't really help.

It always generates a configuration without giving a test result.

Thanks.

Recommended answer

Here is a regex you can use when there are multiple IPs:

^(?<host>[^ ]*(?:,\s+[^ ]+)*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
              ^^^^^^^^^^^^^^

See the demo in an online regex tester.

The (?:,\s+[^ ]+)* pattern matches 0 or more (*) sequences of a comma (,), 1 or more whitespace characters (\s+), and 1 or more characters other than a space ([^ ]+).
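As a rough offline check of the explanation above, the stock apache2 pattern and the relaxed pattern can be run against the sample line from the question. This is only a sketch: it uses Python's re module rather than the Ruby regex engine Fluentd runs on, so the hypothetical helper ruby_to_python rewrites the (?<name>...) groups into Python's (?P<name>...) form, and the names STOCK, RELAXED, SINGLE_IP, MULTI_IP and parse are made up for illustration.

```python
import re

# Patterns copied from the question (STOCK) and the answer (RELAXED).
STOCK = r'^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$'
RELAXED = r'^(?<host>[^ ]*(?:,\s+[^ ]+)*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$'

# Sample log lines from the question, with the JSON-style escaping removed.
REQUEST = ('[13/Aug/2015:09:50:45 +0000] "GET http://poc.mydomain.com/panier/payment/payline'
           '?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1" '
           '401 0 "-" "Java/1.8.0_45"')
SINGLE_IP = '192.168.79.16 - - ' + REQUEST
MULTI_IP = '192.168.79.16, 192.22.10.22, 10.2.2.22 - - ' + REQUEST


def ruby_to_python(pattern):
    """Rewrite Ruby-style named groups (?<name>...) as Python's (?P<name>...)."""
    return re.sub(r"\(\?<([A-Za-z_]\w*)>", r"(?P<\1>", pattern)


def parse(pattern, line):
    """Return the named captures as a dict, or None when the line does not match."""
    match = re.match(ruby_to_python(pattern), line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name, pattern in (("stock", STOCK), ("relaxed", RELAXED)):
        for label, line in (("single IP", SINGLE_IP), ("multiple IPs", MULTI_IP)):
            result = parse(pattern, line)
            print(name, "|", label, "|", result["host"] if result else "pattern not match")
```

With the stock pattern, the second and third addresses end up consumed by the ident and user fields, so the "[" that should open the timestamp is never reached. The relaxed pattern keeps the whole comma-separated list in host.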

A slightly safer expression looks like this:

^(?<host>(?:\d+\.){3}\d+(?:,\s*(?:\d+\.){3}\d+)*|-) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$

See demo 2.

The (?:\d+\.){3}\d+(?:,\s*(?:\d+\.){3}\d+)* part matches number + . + number + . + number + . + number, optionally followed by more groups of the same shape separated by commas. The |- alternative lets the field also be a lone hyphen.
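The stricter pattern can be checked offline in the same way. The sketch below again uses Python's re module as a stand-in for Fluentd's Ruby regex engine, so the named groups are rewritten to (?P<name>...) first. The function parse_line and the extra hosts key are hypothetical additions for illustration, not something Fluentd produces. The time format is the one quoted in the question.

```python
import re
from datetime import datetime

# Stricter pattern from the answer: host must be dotted numbers, a comma list of them, or "-".
SAFER = r'^(?<host>(?:\d+\.){3}\d+(?:,\s*(?:\d+\.){3}\d+)*|-) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$'
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Python spells named groups (?P<name>...), so convert the Ruby-style groups once.
SAFER_PY = re.compile(re.sub(r"\(\?<([A-Za-z_]\w*)>", r"(?P<\1>", SAFER))

# Request part of the sample line from the question, without JSON-style escaping.
REQUEST = ('[13/Aug/2015:09:50:45 +0000] "GET http://poc.mydomain.com/panier/payment/payline'
           '?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1" '
           '401 0 "-" "Java/1.8.0_45"')


def parse_line(line):
    """Return the captures plus a split host list, or None if the line does not match."""
    match = SAFER_PY.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Hypothetical post-processing: split the forwarded chain into separate addresses.
    record["hosts"] = [] if record["host"] == "-" else re.split(r",\s*", record["host"])
    record["time"] = datetime.strptime(record["time"], TIME_FORMAT)
    return record


if __name__ == "__main__":
    for prefix in ("192.168.79.16, 192.22.10.22, 10.2.2.22", "-", "localhost"):
        record = parse_line(prefix + " - - " + REQUEST)
        print(repr(prefix), "->", record["hosts"] if record else "pattern not match")
```

Note that \d+ does not check that each number is in the 0 to 255 range, and a non-numeric first field (a hostname, for example) is rejected by this pattern where the relaxed one would accept it.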
