Regexp in Grok sometimes catches a value, sometimes not

Question

I have a Grok filter that captures messages and adds a tag to those that meet a given criterion.

My problem is that this filter sometimes works in testing and sometimes does not. The regexp in question is the following:

^(?!(?:\d\d\d\d-\d\d-\d\d.\d\d:\d\d:\d\d)).*$

This pattern checks that the message does not begin with the given timestamp format. In other words, a message that does not start with such a timestamp gets a tag.
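To make the intended behaviour concrete, here is a minimal sketch that applies the same pattern with Python's `re` module. Grok uses the Oniguruma engine, so this only approximates the real filter; the helper name `needs_tag` is hypothetical and not part of Grok or Logstash.

```python
import re

# The pattern from the question, unchanged. The negative lookahead (?!...)
# rejects any line whose first characters look like "YYYY-MM-DD hh:mm:ss".
NOT_TIMESTAMP = re.compile(r"^(?!(?:\d\d\d\d-\d\d-\d\d.\d\d:\d\d:\d\d)).*$")


def needs_tag(line: str) -> bool:
    """Return True when the line does NOT start with a timestamp (hypothetical helper)."""
    return NOT_TIMESTAMP.match(line) is not None


lines = [
    "test",
    "2016-09-23 18:26:49,714",
    "2016-09-23 18:26:40,244",
    "test",
]

# Only the lines that do not begin with a timestamp are kept.
tagged = [line for line in lines if needs_tag(line)]
print(tagged)  # ['test', 'test']
```

With clean input, only the two "test" lines are selected, which is the behaviour the filter is expected to have.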

You can test it yourself with this online application: http://grokconstructor.appspot.com/do/match#result

For these test values, the regexp matches every message that meets the criterion, so the two lines containing "test" are highlighted in green:

test
2016-09-23 18:26:49,714
2016-09-23 18:26:40,244
test

However, it also matches the first date line when the input looks like this:

2016-09-23 18:26:49,714
2016-09-23 18:26:40,244
test

I would like to understand the reason behind this behaviour and how I can prevent it.

Recommended answer

It turned out that some messages began with a BOM (byte order mark). With those invisible bytes in front of the timestamp, the line no longer starts with a digit, so the negative lookahead succeeds and the message is tagged. I could match the BOM with the following regexp in Grok:

^(?:\xEF\xBB\xBF).*$

I could keep this mark on the clipboard, but it looks like Stack Overflow strips it out, which is why my example did not reproduce the problem for everyone.
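The effect of a leading BOM can be reproduced outside Grok. The sketch below again uses Python's `re` module as a stand-in for Grok's regex engine, so treat it as an illustration under that assumption. The helpers `needs_tag` and `strip_bom` are hypothetical names introduced here, and stripping the BOM before matching is one possible way to prevent the false match, not something the original answer prescribes.

```python
import re

# The pattern from the question, unchanged.
NOT_TIMESTAMP = re.compile(r"^(?!(?:\d\d\d\d-\d\d-\d\d.\d\d:\d\d:\d\d)).*$")

# U+FEFF is the byte order mark; in UTF-8 it is encoded as EF BB BF.
BOM = "\ufeff"


def needs_tag(line: str) -> bool:
    """Return True when the line does NOT start with a timestamp (hypothetical helper)."""
    return NOT_TIMESTAMP.match(line) is not None


def strip_bom(line: str) -> str:
    """Drop a single leading BOM, if present (hypothetical helper)."""
    return line[len(BOM):] if line.startswith(BOM) else line


# The first line of a file often carries the BOM; it is invisible when printed.
first_line = BOM + "2016-09-23 18:26:49,714"

print(BOM.encode("utf-8"))               # b'\xef\xbb\xbf'
print(needs_tag(first_line))             # True: the BOM hides the timestamp
print(needs_tag(strip_bom(first_line)))  # False: the timestamp is seen again
```

This also explains why only the first date line was affected in the second test input: a BOM normally appears once, at the very start of the pasted text.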
