How to write a regex pattern in Lucene?


Question

I want to match a string using a regexp query in Lucene.

Test string:

       program-id.  acinstal.

Regex pattern in Java:

^[a-z0-9 ]{6}[^*]\s*(program-id)\.

How would I write this regex specifically for a Lucene regexp query so that it matches the string?

Recommended answer

There are two problems with your regex (assuming, based on your previous questions, that the test string is indexed without any tokenization, for instance as a StringField):

  1. The regex must match a whole term. Without any analysis, as we're assuming, the whole field value is a single term, so the regex must match the entire field. In this case, you need to append .* to match the rest of the field.

  2. Because the pattern always has to match the whole term, it is implicitly anchored at both ends. Lucene's regexp syntax does not support anchors, so remove the ^ at the beginning.

So the regex that should work is:

[a-z0-9 ]{6}[^*]\s*(program-id)\..*
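
The whole-term rule can be approximated in plain Java, since Pattern.matches() also succeeds only when the pattern consumes the entire input. The sketch below is only an illustration under that assumption: it uses java.util.regex rather than Lucene itself, the class and method names are hypothetical, and the test string is assumed to be indexed as one untokenized term with seven leading spaces. Lucene's regexp dialect is not identical to java.util.regex (shorthand classes such as \s, for example, may not be available in every Lucene version), so treat this as a check of the matching logic, not of Lucene's parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only; it is not part of Lucene.
final class WholeTermMatchDemo {

    // Approximates Lucene's rule that a regexp must match the entire term:
    // Pattern.matches() succeeds only if the whole input is consumed.
    static boolean matchesWholeTerm(String regex, String term) {
        return Pattern.matches(regex, term);
    }

    public static void main(String[] args) {
        // The test string from the question, assumed to be a single untokenized term.
        String term = "       program-id.  acinstal.";

        // The question's pattern with the ^ removed but without a trailing .*
        String partial = "[a-z0-9 ]{6}[^*]\\s*(program-id)\\.";
        // The adjusted pattern from the answer, which also covers the rest of the field.
        String whole = "[a-z0-9 ]{6}[^*]\\s*(program-id)\\..*";

        System.out.println(matchesWholeTerm(partial, term)); // false
        System.out.println(matchesWholeTerm(whole, term));   // true
    }
}
```

The first pattern stops after "program-id." and leaves "  acinstal." unmatched, which is why it fails under whole-term semantics; the trailing .* is what lets the second pattern cover the full term.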
