Word boundary in Lucene regex

Views: 28 | Published: 2022/1/15 12:29:08 | Tags: regex, elasticsearch, lucene

Question

I'd like to make a regex query in Elasticsearch with word boundaries, but it looks like the Lucene regex engine doesn't support \b. What workarounds can I use?

Recommended answer

In the Elasticsearch regex flavor, there is no direct equivalent of a word boundary. If the word starts with a word character, the leading boundary is roughly (^|[^A-Za-z0-9_]); if the word ends with a word character, the trailing boundary is roughly ($|[^A-Za-z0-9_]).
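To see why those two groups stand in for a word boundary, here is a minimal sketch in Python. It is only an illustration: Python's re module is not the Lucene engine, and the sample strings are made up for this example. It compares the emulated boundaries with Python's native \b, restricted to ASCII word characters.

```python
import re

# Emulated boundaries: start of string or a non-word char before the word,
# end of string or a non-word char after it.
EMULATED = re.compile(r"(^|[^A-Za-z0-9_])word($|[^A-Za-z0-9_])")

# Native word boundary; re.ASCII limits \b to [A-Za-z0-9_] so both
# patterns use the same definition of a word character.
NATIVE = re.compile(r"\bword\b", re.ASCII)

# Hypothetical sample inputs, chosen for illustration only.
samples = ["word", "a word here", "word.", "pre-word",
           "swordfish", "words", "my_word"]

for text in samples:
    emulated = EMULATED.search(text) is not None
    native = NATIVE.search(text) is not None
    print(f"{text!r}: emulated={emulated}, native={native}")
```

For these inputs the two patterns agree, which holds as long as the searched word itself begins and ends with a word character.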

Thus, we need to make sure that word is preceded by either a non-word character or the start of the string, and followed by either a non-word character or the end of the string. Since a Lucene regex is anchored by default (it must match the whole string), all we need to do to make [^A-Za-z0-9_] optional at the start/end of the string is put .* beside it and wrap both in an optional group:

(.*[^A-Za-z0-9_])?word([^A-Za-z0-9_].*)?
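The pattern can be tried out locally before sending it to Elasticsearch. The sketch below is an approximation, not the Lucene engine: it uses Python's re.fullmatch to imitate the implicit whole-string anchoring, and the sample strings are hypothetical.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = r"(.*[^A-Za-z0-9_])?word([^A-Za-z0-9_].*)?"


def matches_whole_word(text: str) -> bool:
    """Return True if the pattern matches the entire string.

    re.fullmatch stands in for Lucene's behaviour of always matching
    the whole value rather than a substring.
    """
    return re.fullmatch(PATTERN, text) is not None


# Hypothetical sample inputs, chosen for illustration only.
samples = ["word", "a word here", "word.", "pre-word",
           "swordfish", "words", "my_word"]

for text in samples:
    print(f"{text!r}: {matches_whole_word(text)}")
```

The first four samples match and the last three do not, because in those "word" is directly adjacent to another word character (a letter or an underscore).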

Details

(.*[^A-Za-z0-9_])? - either the start of the string, or any 0+ chars (other than a line break char; otherwise use (.|\n)*) followed by any char that is not a word char (basically, the start of the string followed by 1 or 0 occurrences of the pattern inside the group)
word - the word itself
([^A-Za-z0-9_].*)? - an optional sequence of one non-word char followed by any 0+ chars, followed by the end of the string (implicit in Lucene regex).
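The pieces described above can be assembled into an Elasticsearch regexp query body. The following is a sketch under stated assumptions: the helper names and the field name title.keyword are hypothetical, the word is assumed to start and end with a word character, and the code only builds the JSON body without contacting a cluster. Reserved characters are escaped with a backslash, following the list given in the Elasticsearch regular expression syntax documentation.

```python
import json

# Characters the Elasticsearch docs list as reserved in Lucene regex
# (always, or when optional operators are enabled).
LUCENE_RESERVED = set('.?+*|{}[]()"\\#@&<>~')

NON_WORD = "[^A-Za-z0-9_]"


def escape_lucene(text: str) -> str:
    """Backslash-escape reserved characters so they match literally."""
    return "".join("\\" + ch if ch in LUCENE_RESERVED else ch for ch in text)


def whole_word_pattern(word: str) -> str:
    """Wrap the word in the optional leading and trailing groups.

    Assumes the word starts and ends with a word character.
    """
    return f"(.*{NON_WORD})?{escape_lucene(word)}({NON_WORD}.*)?"


def build_regexp_query(field: str, word: str) -> dict:
    """Build a regexp query body for the given (hypothetical) field."""
    return {"query": {"regexp": {field: {"value": whole_word_pattern(word)}}}}


if __name__ == "__main__":
    # "title.keyword" is an assumed field name, used for illustration.
    print(json.dumps(build_regexp_query("title.keyword", "word"), indent=2))
```

Because a regexp query is matched against individual indexed terms, this pattern is mainly useful on fields that keep the whole value as one term (for example keyword fields); on an analyzed text field the terms are usually already single tokens.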
