Ignore and move to next pattern if log contains a specific word


Question

I have a log file produced by a Spring application. It contains three formats. The first two are single-line formats: a line that contains the keyword app-info was printed by our own developers, and a line without it was printed by the Spring framework. We may want to treat developers' messages differently from Spring framework ones. The third format is a multiline stack trace.

Here is an example of our own format:

2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  - app-info - injectip ip 192.168.16.89

The line above contains the keyword app-info, so it was printed by our own developers.

2018-04-27 10:42:23 [RMI TCP Connection(10)-127.0.0.1] - INFO  - org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring FrameworkServlet 'dispatcherServlet'

The line above does not contain the keyword app-info, so it was printed by the Spring framework.

In my grok filter, the first pattern is for messages printed by the Spring framework, the second is for developers' messages, and the third is for the multiline stack trace. I want the first regex to state explicitly that the Spring framework pattern must not contain the keyword app-info, so that it fails to parse such lines and they fall through to the second pattern, which is our developers' own format. I tried the following in a regex tool, but got a compile error. My regex is as follows:

(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s+(?<systemmsg>[^((?app-info).)*\s\.\w\-\'\:\d\[\]\/]+)

In the grok filter, I use the form given in the instructions at this link:

filter {
   grok {
     match => [ "message", "PATTERN1", "PATTERN2" , "PATTERN3" ]
    }
}

My current Logstash configuration is as follows; its first pattern does not clearly exclude app-info:

filter {
  grok {
    match => [
      "message",
        '(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s+(?<systemmsg>[\s\.\w\-\'\:\d\[\]\/^[app-info]]+)',
        '(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s(?<appinfo>app-info)\s-\s(?<systemmsg>[\w\d\:\{\}\,\-\(\)\s\"]+)',
        '(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\w\-\d]+)\]\s-\s(?<loglevel>[\w]+)\s\-\s(?<appinfo>app-info)\s-\s(?<params>params):(?<jsonstr>[\"\w\d\,\:\.\{\}]+)\s(?<exceptionname>[\w\d\.]+Exception):\s(?<exceptiondetail>[\w\d\.]+)\n\t(?<extralines>at[\s\w\.\d\~\?\n\t\(\)\_\[\]\/\:\-]+)\n\d'

    ]      
  }

}

With the log line

2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  - app-info - injectip ip 192.168.16.89

the first pattern (the Spring framework pattern) already matches, so the line never falls through to the second pattern, which is our own developers' format. It was parsed successfully as follows:

{
  "timestamp": [
    [
      "2018-04-27 10:42:49"
    ]
  ],
  "threadname": [
    [
      "http-nio-8088-exec-1"
    ]
  ],
  "loglevel": [
    [
      "INFO"
    ]
  ],
  "systemmsg": [
    [
      "app-info - injectip ip 192.168.16.89\n\n"
    ]
  ]
}

Any hints on how to make the first pattern state explicitly that systemmsg must not contain the keyword "app-info"?

My goal: if the keyword app-info is absent, pattern 1 handles the log line; if it is present, pattern 2 handles it.

Take the following log line, which does not contain the keyword app-info, so pattern 1 should match:

2018-04-27 10:42:23 [RMI TCP Connection(10)-127.0.0.1] - INFO  - org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring FrameworkServlet 'dispatcherServlet'

After modifying the first pattern as you suggested, I got no match, which is not my goal:

(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s+(?<systemmsg>[^(?:(?!app\-info).)*\s\.\w\-\'\:\d\[\]\/]+)

See demo. My goal is to extract the timestamp, thread name, log level, and system message, but the first pattern does not give the expected result: the tool says there is no match.

If I remove ^(?:(?!app-info).)*, the pattern parses the log line above (the one without the keyword app-info). See demo. But now it also matches log lines that contain the keyword app-info, which is not expected: for those I want to extract timestamp, threadname, loglevel, app-info (as its own field, whether present or not), and then systemmsg. The expectation is that the first pattern fails so that the second pattern handles the line. In the demo you can see that the pattern also matches the line with the keyword app-info, and systemmsg absorbs app-info into its value, which is not expected.

So I want pattern 1 to handle log lines without the keyword app-info and pattern 2 to handle log lines with it. In other words, pattern 1 should explicitly fail to parse when the line contains the keyword app-info.

Recommended answer

My goal is to let pattern 1 handle log lines without the keyword app-info. If app-info is present, the first pattern should fail to parse, so that the second pattern can handle the line.

You can use the following as your first pattern:

(?<data>^(?!.*app-info).*)%{LOGLEVEL:log}%{DATA:other_data}%{IP:ip}$

If app-info appears anywhere in the log line, the negative lookahead (?!.*app-info) at the start of the pattern makes the match fail, so grok skips this pattern and moves on to the second PATTERN.
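The fall-through behaviour can be tried outside Logstash. The following is only a rough sketch in Python, whose re module stands in here for grok's regex engine (note the (?P<name>...) group syntax instead of (?<name>...)). The simplified field sub-patterns and the names FRAMEWORK, APP, and parse are assumptions made for illustration, not the patterns from the question. The point it shows is that the lookahead sits at the start of the pattern, outside any [...] bracket expression; inside brackets, (?!...) would be read as a set of literal characters rather than as a lookahead.

```python
import re

# Pattern 1 (framework lines). The negative lookahead at the very start
# rejects any line that contains "app-info" anywhere.
# Field sub-patterns are simplified assumptions, not the asker's originals.
FRAMEWORK = re.compile(
    r"^(?!.*app-info)"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s"
    r"\[(?P<threadname>[^\]]+)\]\s-\s"
    r"(?P<loglevel>\w+)\s+-\s+"
    r"(?P<systemmsg>.*)$"
)

# Pattern 2 (developer lines). Requires the literal keyword as its own field.
APP = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s"
    r"\[(?P<threadname>[^\]]+)\]\s-\s"
    r"(?P<loglevel>\w+)\s+-\s+"
    r"(?P<appinfo>app-info)\s-\s"
    r"(?P<systemmsg>.*)$"
)


def parse(line):
    """Try the patterns in order and return (pattern_name, fields).

    Returns None if no pattern matches. Trying patterns in sequence and
    stopping at the first hit mimics a grok match list.
    """
    for name, pattern in (("framework", FRAMEWORK), ("app", APP)):
        match = pattern.match(line)
        if match:
            return name, match.groupdict()
    return None


if __name__ == "__main__":
    samples = [
        "2018-04-27 10:42:23 [RMI TCP Connection(10)-127.0.0.1] - INFO  - "
        "org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - "
        "Initializing Spring FrameworkServlet 'dispatcherServlet'",
        "2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  - app-info - "
        "injectip ip 192.168.16.89",
    ]
    for sample in samples:
        print(parse(sample))
```

With the two sample lines from the question, the framework line is taken by the first pattern, while the app-info line is rejected by the lookahead and picked up by the second pattern with app-info as a separate field.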

Log line without app-info:

2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  injectip ip 192.168.16.89

You can filter it further according to your requirements.

Output

{
  "data": [
    [
      "2018-04-27 10:42:49 [http-nio-8088-exec-1] - "
    ]
  ],
  "log": [
    [
      "INFO"
    ]
  ],
  "other_data": [
    [
      "  injectip ip "
    ]
  ],
  "ip": [
    [
      "192.168.16.89"
    ]
  ]
}

Now a log line with app-info:

2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO app-info  injectip ip 192.168.16.89

Output

No Matches

Test it here.

If you set PATTERN1 to (?<data>^(?!.*app-info).*), you will get:

{
  "data": [
    [
      "2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  injectip ip 192.168.16.89"
    ]
  ]
}

You can then add a second grok filter for the data field, as follows:

grok {
  match => {"data" => "DEFINE PATTERN HERE"}
}
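As a rough illustration of this two-stage idea, here is a Python sketch (Python's re is used as a stand-in for grok). Stage 1 is the lookahead pattern from above; the stage 2 field layout and the names STAGE1, STAGE2, and two_stage are assumptions chosen to fit the sample line in this answer, so adapt them to your own format.

```python
import re

# Stage 1: capture the whole line into "data", but only when the line
# does not contain "app-info" anywhere.
STAGE1 = re.compile(r"(?P<data>^(?!.*app-info).*)")

# Stage 2 (hypothetical field layout): split "data" into individual fields.
STAGE2 = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s"
    r"\[(?P<threadname>[^\]]+)\]\s-\s"
    r"(?P<loglevel>\w+)\s+"
    r"(?P<systemmsg>.*)$"
)


def two_stage(line):
    """Return the stage 2 fields, or None if either stage fails to match."""
    first = STAGE1.search(line)
    if first is None:
        # The line contains "app-info"; leave it to another pattern.
        return None
    second = STAGE2.match(first.group("data"))
    return second.groupdict() if second else None


if __name__ == "__main__":
    print(two_stage(
        "2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  "
        "injectip ip 192.168.16.89"
    ))
    print(two_stage(
        "2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO app-info  "
        "injectip ip 192.168.16.89"
    ))
```

Keeping the app-info check in stage 1 and the field extraction in stage 2 means the exclusion rule stays a single short pattern, and the field pattern can change without touching it.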
