Lucene wildcard queries


Question

I have a question about Lucene.

I have a form from which I read some text, and I want to run a full-text search over several fields. Suppose the input text is "textToLook".

I have a Lucene Analyzer with several filters. One of them is a LowerCaseFilter, so words are lowercased when the index is built.

Say I want to search two fields, field1 and field2, so the Lucene query should look something like this (note that "textToLook" is now "texttolook"):

field1:texttolook* field2:texttolook*

In my class I build the query with something like the code below. It works when there is no wildcard.

String text = "textToLook";
String[] fields = {"field1", "field2"};
// the analyzer is the same one used for indexing
Analyzer analyzer = fullTextEntityManager.getSearchFactory().getAnalyzer("customAnalyzer");
MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, analyzer);
org.apache.lucene.search.Query queryTextoLibre = parser.parse(text);

With this code the query would be:

field1:texttolook field2:texttolook

But if I set text to "textToLook*", I get:

field1:textToLook* field2:textToLook*

which matches nothing, because the indexed terms are lowercase.

I have read this on the Lucene website:

"Wildcard, Prefix, and Fuzzy queries are not passed through the Analyzer, which is the component that performs operations such as stemming and lowercasing"

My problem cannot be solved just by making the behaviour case-insensitive, because my analyzer has other filters that, for example, remove some suffixes from words.

I think I can solve the problem by finding out what the text looks like after it has gone through my analyzer's filters; then I could append the "*" and build the Query with MultiFieldQueryParser. So in this example I would take "textToLook", pass it through the filters to get "texttolook", and then build "texttolook*".

But is there any way to get the value of my text variable after it has gone through all of my analyzer's filters? How can I get all the filters of my analyzer? Is this possible?

Thanks

Recommended answer

Can you use QueryParser.setLowercaseExpandedTerms(true)?

http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F
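To make the case-sensitivity point concrete, here is a minimal sketch in plain Java that does not use Lucene at all. The `INDEXED` list and the `prefixMatch` helper are hypothetical stand-ins for a lowercased index and a trailing-wildcard query; they only illustrate why an un-lowercased wildcard term matches nothing and why lowercasing it first helps.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class LowercaseWildcardSketch {
    // Hypothetical index contents: terms were lowercased at indexing time.
    static final List<String> INDEXED = Arrays.asList("texttolook", "texttolookup");

    // Stand-in for a trailing-wildcard query: drop the final '*' (if any)
    // and keep the indexed terms that start with what remains.
    static List<String> prefixMatch(List<String> indexedTerms, String wildcardTerm) {
        String prefix = wildcardTerm.endsWith("*")
                ? wildcardTerm.substring(0, wildcardTerm.length() - 1)
                : wildcardTerm;
        return indexedTerms.stream()
                .filter(t -> t.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Mixed-case term against a lowercase index: no match.
        System.out.println(prefixMatch(INDEXED, "textToLook*"));
        // Same term lowercased first: both indexed terms match.
        System.out.println(prefixMatch(INDEXED, "textToLook*".toLowerCase(Locale.ROOT)));
    }
}
```

Note that this only covers lowercasing. It does nothing for other filters in the analyzer, such as suffix removal.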

**Edit**

Okay, I understand your issue now. You actually want the wildcarded term to be stemmed before it is run through the wildcard query.

You can subclass QueryParser and override

protected Query getWildcardQuery(String field, String termStr) throws ParseException

to run termStr through the analyzer before the WildcardQuery is constructed.
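The logic such an override would need can be sketched in plain Java, without any Lucene classes. Everything here is an assumption for illustration: `ANALYZE` is a toy stand-in for your analyzer chain (lowercase, then strip a trailing "ing"), and `normalizeWildcardTerm` is a hypothetical helper that handles only a single trailing `*`. In a real subclass, the call to `ANALYZE` would be replaced by running the literal part of termStr through your actual analyzer.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

final class AnalyzedWildcardSketch {
    // Toy stand-in for the analyzer chain: lowercase, then strip a trailing "ing".
    static final UnaryOperator<String> ANALYZE = term -> {
        String lower = term.toLowerCase(Locale.ROOT);
        return lower.endsWith("ing") ? lower.substring(0, lower.length() - 3) : lower;
    };

    // Analyze only the literal text before a trailing '*', then re-append the wildcard.
    // Terms without a trailing '*' are simply analyzed as a whole.
    static String normalizeWildcardTerm(String termStr, UnaryOperator<String> analyze) {
        if (!termStr.endsWith("*")) {
            return analyze.apply(termStr);
        }
        String literal = termStr.substring(0, termStr.length() - 1);
        return analyze.apply(literal) + "*";
    }

    public static void main(String[] args) {
        System.out.println(normalizeWildcardTerm("textToLook*", ANALYZE)); // texttolook*
        System.out.println(normalizeWildcardTerm("Running*", ANALYZE));    // runn*
    }
}
```

Wildcards in the middle of a term, or `?` wildcards, would need extra handling that this sketch leaves out.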

This might not be what the user expects, though. There is a reason they decided not to run wildcarded terms through the analyzer, per the FAQ:

The reason for skipping the Analyzer is that if you were searching for "dogs*" you would not want "dogs" first stemmed to "dog", since that would then match "dog*", which is not the intended query.
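The FAQ's "dogs*" case can be reproduced with a small plain-Java sketch. The term list, the one-rule `stem` function, and `prefixMatch` are all hypothetical stand-ins, not Lucene APIs; they only show how stemming the prefix widens the set of matching terms.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class StemmedWildcardSketch {
    // Hypothetical index contents.
    static final List<String> INDEXED = Arrays.asList("dog", "dogma", "dogsled");

    // Toy stemmer: strips a single trailing "s".
    static String stem(String term) {
        return term.endsWith("s") ? term.substring(0, term.length() - 1) : term;
    }

    // Stand-in for a prefix query: indexed terms that start with the prefix.
    static List<String> prefixMatch(List<String> indexedTerms, String prefix) {
        return indexedTerms.stream()
                .filter(t -> t.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // What the user typed: "dogs*"
        System.out.println(prefixMatch(INDEXED, "dogs"));       // [dogsled]
        // After stemming the prefix: effectively "dog*"
        System.out.println(prefixMatch(INDEXED, stem("dogs"))); // [dog, dogma, dogsled]
    }
}
```

So if you do analyze wildcard terms, expect broader results than the literal query suggests.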
