Searching for Social security number using Lucene 4 regexp

Published: 2020/5/4 7:56:36   Tags: regex, lucene

Problem description

I'm trying to use a Lucene 4 RegexpQuery to find Social Security numbers. If the field is analyzed with StandardAnalyzer or EnglishAnalyzer, is there still some way to match strings like 222-33-4444 or 222 33 4444?

As far as I can see, these analyzers split the SSN into separate tokens, one per component, so there's no way to catch consecutive matches for the three components. Ideally, I'd like 222 33 4444 to match something like "/[0-9]{3}/ /[0-9]{2}/ /[0-9]{4}/", but that doesn't seem to work, perhaps because phrase queries don't support regexps (is that right?). Any suggestions?
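A rough plain-Java illustration of the problem (this is not Lucene code; splitting on anything that is not a letter or digit, with no lowercasing, is an assumption that only approximates how StandardAnalyzer treats these digit groups). Because a regexp is tested against one indexed term at a time, a pattern describing the whole SSN finds no single token to match once the text has been split:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SsnTokenDemo {
    // Assumed, simplified analyzer: split on runs of non-letter/non-digit characters.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) { // a leading delimiter yields an empty first piece
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if any single token matches the whole-SSN pattern,
    // mimicking a regexp that is applied term by term.
    public static boolean anyTokenMatchesWholeSsn(String text) {
        Pattern ssn = Pattern.compile("[0-9]{3}(-| )[0-9]{2}(-| )[0-9]{4}");
        for (String token : tokenize(text)) {
            if (ssn.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "My number is 222-33-4444.";
        System.out.println(tokenize(text));                // prints [My, number, is, 222, 33, 4444]
        System.out.println(anyTokenMatchesWholeSsn(text)); // prints false
    }
}
```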

Recommended answer

If you simply have a field of identifiers or something similar, index it as a StringField (or another untokenized field); a plain RegexpQuery is then easy to define, because the whole value is a single term.
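A RegexpQuery must match an entire term, much like `Matcher.matches()` in `java.util.regex`. The plain-Java sketch below (not Lucene code) shows the kind of whole-term pattern one might use when the untokenized value is exactly the SSN; the pattern itself is an illustrative assumption, and it also accepts mixed separators such as 222-33 4444:

```java
import java.util.regex.Pattern;

public class WholeTermSsnMatch {
    // Hypothetical pattern for an untokenized (StringField-style) value.
    // Like Lucene's RegexpQuery, Matcher.matches() requires the whole
    // value to match, so no ^...$ anchors are needed.
    static final Pattern SSN = Pattern.compile("[0-9]{3}(-| )[0-9]{2}(-| )[0-9]{4}");

    public static boolean matchesWholeValue(String indexedValue) {
        return SSN.matcher(indexedValue).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWholeValue("222-33-4444"));     // true
        System.out.println(matchesWholeValue("222 33 4444"));     // true
        System.out.println(matchesWholeValue("SSN 222-33-4444")); // false: extra text in the term
    }
}
```

In Lucene itself the corresponding query might look like `new RegexpQuery(new Term("ssn", "[0-9]{3}(-| )[0-9]{2}(-| )[0-9]{4}"))`, where `ssn` is a hypothetical field name. Lucene's `RegExp` syntax is close to, but not identical to, `java.util.regex`, so the pattern is worth checking against the Lucene documentation.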

If you are trying to pull them out of a full-text field, which must be tokenized (and I assume this is the case), you can use the SpanQuery API to construct the appropriate query:

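import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.search.spans.*;

// Each clause matches one all-digit term of the given length
// (a RegexpQuery pattern must match the whole term).
SpanQuery span1 = new SpanMultiTermQueryWrapper<RegexpQuery>(new RegexpQuery(new Term("text", "[0-9]{3}")));
SpanQuery span2 = new SpanMultiTermQueryWrapper<RegexpQuery>(new RegexpQuery(new Term("text", "[0-9]{2}")));
SpanQuery span3 = new SpanMultiTermQueryWrapper<RegexpQuery>(new RegexpQuery(new Term("text", "[0-9]{4}")));

// slop = 0, inOrder = true: the three terms must be adjacent and in this order.
Query query = new SpanNearQuery(new SpanQuery[] {span1, span2, span3}, 0, true);

TopDocs hits = searcher.search(query, maxResults);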
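To illustrate what the span query checks, here is a plain-Java simulation (not Lucene code) of a SpanNearQuery with slop 0 and inOrder = true over three regexp clauses. It reuses the same simplified tokenizer assumption (split on non-letters/non-digits) and assumes every token occupies the next position; real Lucene analyzers can leave position gaps, for example where stop words were removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class AdjacentSpanSketch {
    // One pattern per SSN component, each matched against a whole token,
    // mirroring the three RegexpQuery clauses.
    private static final Pattern[] PARTS = {
        Pattern.compile("[0-9]{3}"),
        Pattern.compile("[0-9]{2}"),
        Pattern.compile("[0-9]{4}")
    };

    // Assumed, simplified tokenizer: split on runs of non-letter/non-digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns every token position where the parts match consecutive tokens, in order.
    public static List<Integer> findMatches(String text) {
        List<String> tokens = tokenize(text);
        List<Integer> starts = new ArrayList<Integer>();
        for (int start = 0; start + PARTS.length <= tokens.size(); start++) {
            boolean all = true;
            for (int i = 0; i < PARTS.length && all; i++) {
                all = PARTS[i].matcher(tokens.get(start + i)).matches();
            }
            if (all) {
                starts.add(start);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Tokens: call(0) 222(1) 33(2) 4444(3) or(4) 555(5) 66(6) 7777(7) today(8)
        System.out.println(findMatches("call 222-33-4444 or 555 66 7777 today")); // prints [1, 5]
    }
}
```

Both the hyphenated and the space-separated forms match, since either way the analyzer leaves three adjacent digit tokens. The same property means any 3-2-4 run of digit tokens will match, so some false positives are possible.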