Regex - Match the nth number of char, then stop (non-greedy)


Problem description

I am trying to capture the timestamp in this log event (for Splunk):

172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv | 2020-06-10 17:43:18,951 | "POST /rest/build-status/latest/commits/stats HTTP/1.1" | "http://bitbucket.my.com/projects/WF/repos/klp-libs/compare/commits" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36" | 200 | 345 | 431 | - | 5 | 3dk4qm | 

With the TIME_PREFIX setting, Splunk software uses the specified regular expression to look for a match before attempting to extract a timestamp.

TIME_PREFIX = <regular expression>  

By default, Splunk would try to read the timestamp from the start of the line, but here the line starts with an IP address. The regex therefore needs to match the first four pipes, which together form the time prefix.

I am using the following regex:

(?:[^\|]*(\|)){4}

I want the regex to match up to the fourth occurrence of '|' and then stop; non-greedy, I guess.

Recommended answer

There are two things to consider:

  • Anchor the pattern at the start of the string. Otherwise, the environment may trigger a regex search at every position inside the string, and you may get many more matches than you expect.

  • When you do not need to create captures, i.e. when you do not need to save part of the regex match to a separate memory buffer (in Splunk, this is equal to creating a separate field), use a non-capturing group rather than a capturing one to group a sequence of patterns.
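Both points can be checked outside Splunk. The following is a minimal sketch in Python, assuming its `re` module treats these particular constructs the same way as the regex engine Splunk uses; the sample string `line` and all variable names are made up for illustration.

```python
import re

# Hypothetical sample with eight pipes, so an unanchored {4} can match twice.
line = "a | b | c | d | e | f | g | h | i"

unanchored = re.compile(r"(?:[^|]*\|){4}")
anchored = re.compile(r"^(?:[^|]*\|){4}")
capturing = re.compile(r"^(?:[^|]*(\|)){4}")

# finditer keeps searching after each match, so the unanchored pattern
# also matches the second run of four pipe-delimited fields.
unanchored_matches = [m.group(0) for m in unanchored.finditer(line)]

# With ^ the pattern can only match at the start of the string.
anchored_matches = [m.group(0) for m in anchored.finditer(line)]

print(len(unanchored_matches), len(anchored_matches))

# .groups reports how many capturing groups a pattern defines.
print(capturing.groups, anchored.groups)
```

The capturing variant from the question stores the last `|` it matched in group 1, which is not needed when the pattern only serves as a prefix.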

So, you need

^(?:[^|]*\|){4}\s*

See the regex demo, which shows that the match extends up to the datetime substring without matching it.

Details

  • ^ - start of string anchor
  • (?:[^|]*\|){4} - a non-capturing group ((?:...)) that matches four repetitions ({4}) of any 0 or more chars other than | ([^|]*) followed by a | char (\|)
  • \s* - 0 or more whitespace characters.
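As a quick check of the pattern described above, here is a small Python sketch that applies it to a shortened copy of the sample event and parses what follows the prefix. This is not how Splunk itself extracts timestamps; the variable names, the shortened event, and the `strptime` format string are assumptions made for the illustration.

```python
import re
from datetime import datetime

# Shortened copy of the sample event (fields after the request are dropped).
event = (
    '172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv | '
    '2020-06-10 17:43:18,951 | '
    '"POST /rest/build-status/latest/commits/stats HTTP/1.1" | 200'
)

# Anchored, non-capturing prefix: four pipe-terminated fields plus whitespace.
prefix = re.compile(r"^(?:[^|]*\|){4}\s*")

m = prefix.match(event)
if m is None:
    raise ValueError("event does not contain four pipe-delimited fields")

# Everything after the prefix starts with the timestamp.
rest = event[m.end():]

# "YYYY-MM-DD HH:MM:SS,mmm" is 23 characters long.
stamp_text = rest[:23]
stamp = datetime.strptime(stamp_text, "%Y-%m-%d %H:%M:%S,%f")

print(stamp_text)
print(stamp.isoformat())
```

Because `[^|]*` cannot cross a pipe, each repetition consumes exactly one field, so the match stops right after the fourth `|` and no lazy quantifier is needed.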
