Logstash Grok filter getting multiple values per match


Problem description

I have a server that sends access logs to Logstash in a custom log format, and I am using Logstash to filter these logs and send them to Elasticsearch.

A log line looks like this:

0.0.0.0 - GET / 200 - 29771 3 ms ELB-HealthChecker/1.0\n

It gets parsed using this grok filter:

grok {
  match => [ 
    "message", "%{IP:remote_host} %{USER:remote_user} %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - %{NUMBER:content_length} %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}",
    "message", "%{IP:remote_host} - %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - %{NUMBER:content_length} %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}",
    "message", "%{IP:remote_host} %{USER:remote_user} %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - - %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}",
    "message", "%{IP:remote_host} - %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - - %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}"
  ]
  add_field => { 
    "protocol" => "HTTP"
  }
}

The final log gets parsed into this object (with real IPs stubbed out and other fields removed):

{
  "_source": {
    "message": " 0.0.0.0 - GET / 200 - 29771 3 ms ELB-HealthChecker/1.0\n",
    "tags": [
      "bunyan"
    ],
    "@version": "1",
    "host": "0.0.0.0:0000",
    "remote_host": [
      "0.0.0.0",
      "0.0.0.0"
    ],
    "remote_user": [
      "-",
      "-"
    ],
    "method": [
      "GET",
      "GET"
    ],
    "requested_uri": [
      "/",
      "/"
    ],
    "status_code": [
      "200",
      "200"
    ],
    "content_length": [
      "29771",
      "29771"
    ],
    "elapsed_time": [
      "3",
      3
    ],
    "user_agent": [
      "ELB-HealthChecker/1.0",
      "ELB-HealthChecker/1.0"
    ],
    "protocol": [
      "HTTP",
      "HTTP"
    ]
  }
}

Any ideas why I am getting multiple matches per log? Shouldn't grok stop at the first pattern that successfully parses the line?

Recommended answer

Chances are that multiple config files are being loaded. Look at the output: elapsed_time shows up as both a string ("3") and an integer (3). The config you've provided cannot produce that on its own, since every pattern that captures elapsed_time uses :int. Logstash merges all the config files it loads into a single pipeline, so a second grok filter without :int would run on the same event and append its captures to the same fields.
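
As a rough illustration of that scenario, assume a config directory that holds two files. Both file names below are hypothetical, and so is the idea that the first file is a stale copy of the second; the original question does not show what else is being loaded. Because the files are concatenated into one pipeline, both grok filters run on every event:

```
# Hypothetical file: conf.d/10-access-old.conf (assumed stale copy, for illustration only)
filter {
  grok {
    # No :int suffix here, so elapsed_time is captured as the string "3"
    match => [ "message", "%{IP:remote_host} %{USER:remote_user} %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - %{NUMBER:content_length} %{NUMBER:elapsed_time} ms %{GREEDYDATA:user_agent}" ]
    add_field => { "protocol" => "HTTP" }
  }
}

# Hypothetical file: conf.d/20-access.conf (the filter from the question, reduced to one pattern)
filter {
  grok {
    # The fields already exist after the first grok, so each capture is
    # appended, turning every field into a two-element array: ["3", 3]
    match => [ "message", "%{IP:remote_host} %{USER:remote_user} %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - %{NUMBER:content_length} %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}" ]
    add_field => { "protocol" => "HTTP" }
  }
}
```

If this is indeed the cause, removing the extra file from the config directory, or wrapping each filter in mutually exclusive conditionals (for example on [type] or [tags]) so that only one grok applies to a given event, should leave each field with a single value.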
