How to write a regex pattern in Lucene?
Question
I want to match a string using a regexp query in Lucene.
Test string:
program-id. acinstal.
Regex pattern in Java:
^[a-z0-9 ]{6}[^*]\s*(program-id)\.
How would I write this regex specifically for a Lucene regexp query so that it matches the string?
Recommended answer
There are two problems with your regex (assuming, based on your previous questions, that the test string is indexed without any tokenization, for instance as a StringField):
- The regex must match a whole term. Without any analysis, as assumed here, that means it must match the whole field value, so you need to append .* to match the rest of the field.
- Because the regex always has to match the whole field, anchors are unnecessary and are not supported, so remove the ^ at the beginning.
So the regex that should work is:
[a-z0-9 ]{6}[^*]\s*(program-id)\..*
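The whole-term requirement behaves much like Matcher.matches() in java.util.regex, as opposed to Matcher.find(). The sketch below uses only the JDK, not Lucene, so it illustrates the matching semantics rather than Lucene's own regexp syntax. The class name, helper methods, and sample value are hypothetical; in particular, the sample assumes the indexed value starts with six sequence characters and one indicator character before program-id, which the pattern requires.

```java
import java.util.regex.Pattern;

class WholeTermMatchDemo {
    // Assumed sample value (not taken from the question): six sequence
    // characters, one indicator character, then the statement.
    static final String SAMPLE = "000100 program-id. acinstal.";

    // The original pattern with the ^ anchor removed but no trailing .*
    static final Pattern PREFIX_ONLY =
            Pattern.compile("[a-z0-9 ]{6}[^*]\\s*(program-id)\\.");

    // The same pattern with .* appended so it can cover the rest of the value.
    static final Pattern WHOLE_VALUE =
            Pattern.compile("[a-z0-9 ]{6}[^*]\\s*(program-id)\\..*");

    // Whole-input matching, analogous to how a regexp query must accept a whole term.
    static boolean matchesWhole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    // Substring matching, which is what many regex users expect by default.
    static boolean findsAnywhere(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println("find, no .*    : " + findsAnywhere(PREFIX_ONLY, SAMPLE));
        System.out.println("matches, no .* : " + matchesWhole(PREFIX_ONLY, SAMPLE));
        System.out.println("matches, .*    : " + matchesWhole(WHOLE_VALUE, SAMPLE));
    }
}
```

The pattern without .* is found as a substring but fails as a whole-input match, which is why the trailing .* is needed when the regex has to cover the entire field value.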