logstash grok filter for custom logs


Question

I have two related questions. The first is how best to grok logs that have "messy" spacing and so on. The second, which I'll ask separately, is how to deal with logs that have arbitrary attribute-value pairs. (See: logstash grok filter for logs with arbitrary attribute-value pairs)

So for the first question, I have a log line that looks like this:

14:46:16.603 [http-nio-8080-exec-4] INFO  METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92

Using http://grokdebug.herokuapp.com/ I eventually came up with the following grok pattern, which works for this line:

%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}job=%{NOTSPACE:job}%{SPACE}data=%{NOTSPACE:data}

With the following config file:

input {
        file {
                path => "/home/robyn/testlogs/trimmed_logs.txt"
                start_position => beginning
                sincedb_path => "/dev/null" # for testing; allows reparsing
        }
}
filter {
        grok {
                match => {"message" => "%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}job=%{NOTSPACE:job}%{SPACE}data=%{NOTSPACE:data}" }
        }
}
output {
        file {
                path => "/home/robyn/filteredlogs/trimmed_logs.out.txt"
        }
}

I get the following output:

{"message":"14:46:16.603 [http-nio-8080-exec-4] INFO  METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92","@version":"1","@timestamp":"2015-08-07T17:55:16.529Z","host":"hlt-dev","path":"/home/robyn/testlogs/trimmed_logs.txt","timestamp":"14:46:16.603","http":"[http-nio-8080-exec-4]","loglevel":"INFO","logtype":"METERING","msg":"93e6dd5e-c009-46b3-b9eb-f753ee3b889a","action":"CREATE_JOB","job":"a820018e-7ad7-481a-97b0-bd705c3280ad","data":"71b1652e-16c8-4b33-9a57-f5fcb3d5de92"}

That's pretty much what I want, but the pattern feels really kludgy, particularly because it leans on %{SPACE} and %{NOTSPACE} so much. This suggests to me that I'm not doing this the best possible way. Should I be creating a more specific pattern for the hex ids? I think I need the %{SPACE} between loglevel and logtype because of the extra space between INFO and METERING in the log, but that also feels kludgy.

Also, how do I get the log's own timestamp to replace @timestamp? The latter seems to be the time logstash ingested the log, which we don't want or need.

Obviously I'm just getting started with ELK and grok, so pointers to useful resources are also appreciated.

Recommended answer

Instead of NOTSPACE, you can use an existing pattern for the ids: UUID. Also, where there is exactly one space there's no need for the SPACE pattern; you can leave it out and write a literal space. I'm also using the USERNAME pattern (perhaps misleadingly named for this purpose) just to capture the http field.

So it would look like this, with only a single SPACE pattern left to absorb the multiple spaces.

Sample log line:

14:46:16.603 [http-nio-8080-exec-4] INFO  METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92

Grok pattern:

%{TIME:timestamp} \[%{USERNAME:http}\] %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{UUID:msg} %{WORD:action} job=%{UUID:job} data=%{UUID:data}

Grok will spit out:

{
  "timestamp": [
    [
      "14:46:16.603"
    ]
  ],
  "HOUR": [
    [
      "14"
    ]
  ],
  "MINUTE": [
    [
      "46"
    ]
  ],
  "SECOND": [
    [
      "16.603"
    ]
  ],
  "http": [
    [
      "http-nio-8080-exec-4"
    ]
  ],
  "loglevel": [
    [
      "INFO"
    ]
  ],
  "SPACE": [
    [
      "  "
    ]
  ],
  "logtype": [
    [
      "METERING"
    ]
  ],
  "msg": [
    [
      "93e6dd5e-c009-46b3-b9eb-f753ee3b889a"
    ]
  ],
  "action": [
    [
      "CREATE_JOB"
    ]
  ],
  "job": [
    [
      "a820018e-7ad7-481a-97b0-bd705c3280ad"
    ]
  ],
  "data": [
    [
      "71b1652e-16c8-4b33-9a57-f5fcb3d5de92"
    ]
  ]
}
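
Dropped into the filter block from the question, the pattern above might look like the following. This is only a sketch: it swaps the match string and assumes the input and output sections of the original config stay as they were.

```
filter {
        grok {
                # Field names are the same as in the question; only the pattern changes.
                # The brackets around the thread name are matched literally (\[ and \]),
                # so the http field is captured without them.
                match => { "message" => "%{TIME:timestamp} \[%{USERNAME:http}\] %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{UUID:msg} %{WORD:action} job=%{UUID:job} data=%{UUID:data}" }
        }
}
```

The debugger output above also lists unnamed captures such as HOUR, MINUTE, SECOND and SPACE. In Logstash itself, grok's named_captures_only option defaults to true, so only the fields you named should end up on the event.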
