Regexp in Grok sometimes catches a value, sometimes not
Question
I have a grok filter that captures messages and adds a tag to those that meet a given criterion.
My problem is that this filter sometimes works during testing and sometimes does not. The regexp in question is the following:
^(?!(?:\d\d\d\d-\d\d-\d\d.\d\d:\d\d:\d\d)).*$
This pattern uses a negative lookahead to check that the message does not begin with the given timestamp format. In other words: if a message does not begin with this timestamp, it gets a tag.
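To make the intended behaviour concrete, here is a minimal sketch in Python. It is an approximation only: Grok uses the Oniguruma regex engine, not Python's `re` module, and the helper name `needs_tag` is invented for illustration. For a pattern this simple the two engines should agree.

```python
import re

# Same pattern as in the question: it matches only lines that do NOT
# start with a "YYYY-MM-DD hh:mm:ss" style timestamp.
NOT_TIMESTAMP = re.compile(r"^(?!(?:\d\d\d\d-\d\d-\d\d.\d\d:\d\d:\d\d)).*$")


def needs_tag(line: str) -> bool:
    """Hypothetical helper: True when the line does not begin with the timestamp."""
    return NOT_TIMESTAMP.match(line) is not None


lines = [
    "test",
    "2016-09-23 18:26:49,714",
    "2016-09-23 18:26:40,244",
    "test",
]

# Keep only the lines that would receive the tag.
tagged = [line for line in lines if needs_tag(line)]
print(tagged)
```

With clean input, only the two "test" lines are selected, which is the behaviour the filter is meant to have.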
You can test it yourself with this online application: http://grokconstructor.appspot.com/do/match#result
For these test values, the regexp captures all messages that meet the criterion, so the two lines with "test" are highlighted in green:
test
2016-09-23 18:26:49,714
2016-09-23 18:26:40,244
test
However, it also captures the first date when the input looks like this:
2016-09-23 18:26:49,714
2016-09-23 18:26:40,244
test
I would like to understand the reason behind this behaviour and how I can prevent it.
Recommended answer
It turned out that some messages began with a BOM (byte order mark). Because the BOM sits in front of the timestamp, the negative lookahead no longer sees a timestamp at the very start of the line, so the line matches and gets tagged. I could capture the BOM with the following regexp in Grok:
^(?:\xEF\xBB\xBF).*$
I could keep this mark on the clipboard, but it looks like Stack Overflow strips it out, which is why my example did not reproduce the problem for everyone.
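The effect of the BOM can be reproduced with a short Python sketch. This is an illustration under assumptions, not the Grok setup itself: Python's `re` stands in for Grok's Oniguruma engine, the text is handled as a decoded string (where the UTF-8 bytes EF BB BF appear as the single character U+FEFF), and `strip_bom` is a hypothetical helper.

```python
import re

# The pattern from the question: match lines that do NOT start with a timestamp.
NOT_TIMESTAMP = re.compile(r"^(?!(?:\d\d\d\d-\d\d-\d\d.\d\d:\d\d:\d\d)).*$")

# U+FEFF is the byte order mark; its UTF-8 encoding is the bytes EF BB BF.
BOM = "\ufeff"


def strip_bom(line: str) -> str:
    """Hypothetical helper: drop a single leading BOM if one is present."""
    return line[len(BOM):] if line.startswith(BOM) else line


clean = "2016-09-23 18:26:49,714"
with_bom = BOM + clean  # looks identical when printed, but is one character longer

matches_clean = NOT_TIMESTAMP.match(clean) is not None
matches_bom = NOT_TIMESTAMP.match(with_bom) is not None
matches_stripped = NOT_TIMESTAMP.match(strip_bom(with_bom)) is not None

print(matches_clean)     # False: the line starts with a timestamp
print(matches_bom)       # True: the invisible BOM comes first, so the lookahead succeeds
print(matches_stripped)  # False: removing the BOM restores the expected behaviour
print(with_bom.encode("utf-8")[:3])  # b'\xef\xbb\xbf'
```

Stripping the BOM before the pattern is applied, or accounting for it in the pattern, is one way to keep timestamped lines from being tagged by mistake.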