Fluentd apache log format with multiple host IPs
Question
I have a small issue with the Fluentd log parser. I have a Varnish server on which I have set up the X-Forwarded-For header to contain the list of IPs of all the hosts an HTTP request goes through, and I use it to get that information into the varnishncsa logs. Here is an example log line:
"192.168.79.16, 192.22.10.22, 10.2.2.22 - - [13/Aug/2015:09:50:45 +0000] \"GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1\" 401 0 \"-\" \"Java/1.8.0_45\""
I would like to aggregate these logs with Fluentd. Since varnishncsa logs use the Apache format, I use the Fluentd apache2 format for input parsing, as in this configuration:
<source>
type tail
format apache2
path /var/log/varnish/varnishncsa.log
pos_file /var/log/td-agent/tmp/access.log.pos
tag "apache2.varnish.mydomain.com.access"
</source>
The problem is that this works when there is only one host IP in the log, but when there are multiple IPs, the Fluentd aggregator reports a "pattern not match" warning. In other words, this matches:
"192.168.79.16 - - [13/Aug/2015:09:50:45 +0000] \"GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1\" 401 0 \"-\" \"Java/1.8.0_45\""
This does not match:
"192.168.79.16, 192.22.10.22, 10.2.2.22 - - [13/Aug/2015:09:50:45 +0000] \"GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1\" 401 0 \"-\" \"Java/1.8.0_45\""
The Fluentd apache2 regex is:
^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
with this time format:
%d/%b/%Y:%H:%M:%S %z
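One way to reproduce the warning outside Fluentd is to run the same pattern in plain Ruby, since Fluentd's regexp parsing is built on Ruby regular expressions. The sketch below assumes the log lines are stored in the file without the outer quotes and backslash escapes shown above; the constant names are only for illustration. It checks the stock apache2 pattern against both lines and parses the time capture with the same format string.

```ruby
require 'time'

# The stock apache2 pattern quoted above, as a Ruby regex literal.
APACHE2 = /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/

# Sample lines, assumed to appear in the file without the outer quotes/escapes.
REST = ' - - [13/Aug/2015:09:50:45 +0000] "GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1" 401 0 "-" "Java/1.8.0_45"'
SINGLE = '192.168.79.16' + REST
MULTI = '192.168.79.16, 192.22.10.22, 10.2.2.22' + REST

single = APACHE2.match(SINGLE)
multi = APACHE2.match(MULTI)

# host is [^ ]*, so it stops at the first space: the extra IPs shift every
# following field and the "\[" before the timestamp no longer lines up.
puts "single IP: #{single ? 'match' : 'no match'}"
puts "multiple IPs: #{multi ? 'match' : 'no match'}"

# The time capture parses with the strptime-style format from the config.
t = Time.strptime(single[:time], '%d/%b/%Y:%H:%M:%S %z')
puts t.utc.iso8601
```

Running it prints "match" for the single-IP line, "no match" for the multi-IP line, and 2015-08-13T09:50:45Z for the parsed time.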
I tried to find and test the right regex for that, but haven't found one yet.
I tried this, but it doesn't work:
<source>
type tail
format /^(?<host>\,*[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
time_format %d/%b/%Y:%H:%M:%S %z
path /var/log/varnish/varnishncsa.log
pos_file /var/log/td-agent/tmp/access.log.pos
tag "apache2.varnish.mydomain.com.access"
</source>
Can someone help? Could you also point me to good documentation on Fluentd parser pattern capturing, and a good way to test Fluentd regexes? This Fluentd regular expression editor doesn't really help.
It always generates a configuration without showing a test result.
Thanks.
Answer
Here is the regex you can use in case you have multiple IPs:
^(?<host>[^ ]*(?:,\s+[^ ]+)*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
The (?:,\s+[^ ]+)* pattern matches 0 or more (*) sequences of a comma (,), 1 or more whitespace symbols (\s+), and 1 or more characters other than a space ([^ ]+).
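To check this pattern before putting it into the td-agent config, it can be run in plain Ruby against both sample lines. This is a minimal sketch; the sample lines are assumed to be stored without the outer quotes and escapes shown in the question, and MULTI_IP is just an illustrative name.

```ruby
# Pattern from the answer: the host group also accepts repeated ", <token>" parts.
MULTI_IP = /^(?<host>[^ ]*(?:,\s+[^ ]+)*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/

rest = ' - - [13/Aug/2015:09:50:45 +0000] "GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1" 401 0 "-" "Java/1.8.0_45"'
lines = ['192.168.79.16' + rest, '192.168.79.16, 192.22.10.22, 10.2.2.22' + rest]

# Match each line and print the captured host and status code (or "no match").
results = lines.map { |line| MULTI_IP.match(line) }
results.each { |m| puts(m ? "host=#{m[:host]} code=#{m[:code]}" : 'no match') }
```

Both lines match, and for the second one the whole comma-separated list ends up in the host field. Because [^ ]* and [^ ]+ also accept commas, the engine gets there by backtracking; it still works, but the stricter pattern below is easier to reason about.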
A slightly safer expression looks like this:
^(?<host>(?:\d+\.){3}\d+(?:,\s*(?:\d+\.){3}\d+)*|-) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
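The (?:\d+\.){3}\d+(?:,\s*(?:\d+\.){3}\d+)* part matches number + . + number + . + number + . + number (a dotted IPv4-style address), optionally followed by more of the same pattern, each preceded by a comma. The |- alternative accepts a lone hyphen instead of an address list.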
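The stricter pattern can be checked the same way. The sketch below, again plain Ruby with illustrative names and a shortened request line, confirms that it accepts one or more dotted addresses or a lone hyphen and rejects a hostname. The split into individual addresses at the end is an extra post-processing step assumed here for illustration, not something the regex or Fluentd does by itself.

```ruby
# Stricter pattern from the answer: comma-separated IPv4-style hosts, or "-".
STRICT = /^(?<host>(?:\d+\.){3}\d+(?:,\s*(?:\d+\.){3}\d+)*|-) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/

# Shortened request line; only the host field varies between samples.
rest = ' - - [13/Aug/2015:09:50:45 +0000] "GET /panier/payment/payline HTTP/1.1" 401 0 "-" "Java/1.8.0_45"'

hosts = ['192.168.79.16', '192.168.79.16, 192.22.10.22, 10.2.2.22', '-', 'myhost']
matches = hosts.map { |h| STRICT.match(h + rest) }

hosts.zip(matches).each do |h, m|
  puts "#{h.inspect}: #{m ? 'match' : 'no match'}"
end

# Hypothetical post-processing: turn the captured list into separate addresses.
ips = matches[1][:host].split(/,\s*/)
p ips
```

Note that \d+ only checks the dotted-number shape, so an out-of-range value such as 999.1.1.1 would still be accepted.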