How to perform a Lucene query containing special characters using QueryParser?

Problem description

Here is the situation. I have a term stored in the index that contains a special character such as '-'. The simplest code looks like this:

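Document doc = new Document();
// Lucene 3.6 has no TextField; the generic Field class accepts Field.Index.
// NOT_ANALYZED indexes the whole value as a single term.
doc.add(new Field("message", "1111-2222-3333", Field.Store.YES, Field.Index.NOT_ANALYZED));
writer.addDocument(doc);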

Then I create a query using QueryParser, like this:

String queryStr = "1111-2222-3333";
QueryParser parser = new QueryParser(Version.LUCENE_36, "message", new StandardAnalyzer(Version.LUCENE_36));
Query q = parser.parse(queryStr);

Then I run the query with a searcher and get no results. I have also tried this:

Query q = parser.parse(QueryParser.escape(queryStr));

Still no results.

Using TermQuery directly instead of QueryParser does what I want, but that approach is not flexible enough for user-entered text.

I think the StandardAnalyzer may have done something to drop the special character from the query string. I debugged it and found that the string is split and the actual query looks like this: "message:1111 message:2222 message:3333". I don't know exactly what Lucene has done...

So if I want to perform a query containing a special character, what should I do? Should I write my own analyzer, or subclass the default QueryParser? And how?

Update:

1. @The New Idiot @femtoRgon, I have tried QueryParser.escape(queryStr), as stated in the question, but it still does not work.

2. I have tried another way to solve the problem. I derived a QueryTokenizer from Tokenizer that splits words only on spaces, wrapped it in a QueryAnalyzer derived from Analyzer, and finally passed the QueryAnalyzer to QueryParser.

Now it works. Originally it did not work because the default StandardAnalyzer splits queryStr according to its default rules, which treat some special characters as separators, so by the time QueryParser builds the query the special characters have already been removed by StandardAnalyzer. Now I split queryStr my own way, treating only spaces as separators, so the special characters remain in the query for further processing, and this works.
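The difference between the two splitting strategies described above can be illustrated without Lucene at all. The following is only a rough sketch using plain Java: `splitOnPunctuation` is a hypothetical stand-in that approximates how StandardAnalyzer breaks this particular input (it is not the real tokenizer), and `splitOnWhitespace` mimics the custom space-only tokenizer from the update. The class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TokenSplitDemo {
    // Approximation only: break on any run of characters that are
    // neither letters nor digits. The real StandardAnalyzer follows
    // more elaborate word-break rules.
    static List<String> splitOnPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Mimics a tokenizer that treats only whitespace as a separator,
    // so hyphens stay inside the token.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String queryStr = "1111-2222-3333";
        System.out.println(splitOnPunctuation(queryStr)); // [1111, 2222, 3333]
        System.out.println(splitOnWhitespace(queryStr));  // [1111-2222-3333]
    }
}
```

Only the second result matches the single un-analyzed term stored in the index. Lucene also ships WhitespaceAnalyzer and KeywordAnalyzer, which may make a hand-written tokenizer unnecessary; whether they suit a given index is something to verify against the Lucene version in use.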

3. @The New Idiot @femtoRgon, thank you for answering my question.

Recommended answer

I am not sure about this, but I guess you need to escape - with \. As per the Lucene docs:


The "-" or prohibit operator excludes documents that contain the term after the "-" symbol.

And again:


Lucene supports escaping special characters that are part of the query syntax. The current list of special characters is:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /

To escape these characters, put a \ before the character.

Also remember that some characters need to be escaped twice if they also have a special meaning in Java. For example, a single backslash in a Java string literal is written as \\.
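As a minimal sketch of the escaping rules quoted above, the plain-Java helper below prefixes each listed special character with a backslash. `escapeQuery` is a hypothetical helper written for illustration, not Lucene's own implementation; in real code `QueryParser.escape`, already used in the question, is the method to call. The example also shows the "escape twice" point: the Java literal `"\\-"` contains just one backslash followed by a hyphen.

```java
class EscapeDemo {
    // Characters from the list quoted in the answer. "&&" and "||" are
    // handled by escaping each '&' and '|' individually (an assumption
    // made for this sketch).
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    // Hypothetical helper: put a backslash before every special character.
    static String escapeQuery(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-escaped: each "\\" in Java source is one backslash at runtime.
        String manual = "1111\\-2222\\-3333";
        System.out.println(manual);                        // 1111\-2222\-3333
        System.out.println(escapeQuery("1111-2222-3333")); // 1111\-2222\-3333
    }
}
```

Note that, as the question's update reports, escaping by itself did not fix the original problem: the escaped hyphen is no longer read as the prohibit operator, but the analyzer passed to QueryParser can still split the term on it.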
