MultiFieldQueryParser is removing dots from the acronym

Problem description

I am posting this question again because my earlier query was not answered.

I am working on a book search API using Lucene. A user can search for a book whose title or description field contains C.F.A... I am using StandardAnalyzer along with a list of stop words.

I am using MultiFieldQueryParser to parse the above string, but after parsing, it removes the dots from the string. What am I missing here?

Thanks.

Recommended answer

As you mentioned, this is a dupe of this question. I suggest you at least add a link to it in your question. Also, I would urge you to create a user account, since right now it's not possible to look at your old question to get context.

The StandardAnalyzer specifically handles acronyms, and converts C.F.A. (for example) to cfa. This means you should be able to do the search, as long as you make sure you use the same analyzer for the indexing and for the query parsing.
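The point about using the same analyzer for indexing and for query parsing can be illustrated with a small stdlib-only toy model. This is not Lucene code: the class `ToyAnalysis`, its methods `analyze`, `naiveAnalyze` and `matches`, and the single acronym rule (letter-dot pairs ending in a dot lose their dots and are lowercased) are hypothetical simplifications chosen only to mirror the C.F.A. to cfa conversion described above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ToyAnalysis {
    // Hypothetical acronym rule: one or more "letter." pairs, e.g. "C.F.A."
    private static final String ACRONYM = "(\\p{L}\\.)+";

    // Toy stand-in for StandardAnalyzer (NOT its real grammar):
    // acronyms lose their dots, every term is lowercased.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) continue;
            String term = raw.matches(ACRONYM) ? raw.replace(".", "") : raw;
            terms.add(term.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // A different, hypothetical analyzer that only lowercases and keeps the dots.
    static List<String> naiveAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) terms.add(raw.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // A document matches if any query term was also produced at index time.
    static boolean matches(List<String> indexed, List<String> query) {
        Set<String> index = new HashSet<>(indexed);
        for (String q : query) {
            if (index.contains(q)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("Passing C.F.A. exams");
        System.out.println(indexed);                                  // [passing, cfa, exams]
        System.out.println(matches(indexed, analyze("C.F.A.")));      // true
        System.out.println(matches(indexed, naiveAnalyze("C.F.A."))); // false
    }
}
```

In this toy, only the query analyzed the same way as the indexed text produces the term cfa; the query analyzed differently keeps its dots and finds nothing.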

I would suggest you run some more basic test cases to eliminate other factors. Try to use an ordinary QueryParser instead of a multi-field one.

Here's some code I wrote to play with the StandardAnalyzer:

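import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Old Lucene 2.x API: next() returns the next Token (or null) and may throw IOException
StringReader testReader = new StringReader("C.F.A. C.F.A word");
StandardAnalyzer analyzer = new StandardAnalyzer();
TokenStream tokenStream = analyzer.tokenStream("title", testReader);
System.out.println(tokenStream.next()); // first token: "C.F.A."
System.out.println(tokenStream.next()); // second token: "C.F.A"
System.out.println(tokenStream.next()); // third token: "word"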

The output for this, by the way, was:

(cfa,0,6,type=<ACRONYM>)
(c.f.a,7,12,type=<HOST>)
(word,13,17,type=<ALPHANUM>)

Note, for example, that if the acronym doesn't end with a dot, the analyzer assumes it's an internet host name, so searching for "C.F.A" (tokenized as c.f.a) will not match "C.F.A." in the text (indexed as cfa).
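The acronym versus host mismatch can be reproduced with another stdlib-only toy. Again, this assumes simplified rules rather than Lucene's real tokenizer grammar: the class `AcronymVsHost` and its methods `term` and `type` are hypothetical, tagging letter-dot pairs that end in a dot as ACRONYM (dots stripped) and other dot-separated letter runs as HOST (dots kept), to mirror the token types printed above.

```java
import java.util.Locale;

public class AcronymVsHost {
    // Hypothetical rules mirroring the tokenizer output above (not Lucene's actual grammar).
    private static final String ACRONYM = "(\\p{L}\\.)+";      // e.g. "C.F.A."
    private static final String HOST = "\\p{L}+(\\.\\p{L}+)+"; // e.g. "C.F.A"

    // Term produced for a single whitespace-free token: acronyms lose their dots.
    static String term(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return token.matches(ACRONYM) ? lower.replace(".", "") : lower;
    }

    // Token type; anything that is neither acronym nor host is lumped into ALPHANUM here.
    static String type(String token) {
        if (token.matches(ACRONYM)) return "ACRONYM";
        if (token.matches(HOST)) return "HOST";
        return "ALPHANUM";
    }

    public static void main(String[] args) {
        for (String token : new String[] {"C.F.A.", "C.F.A", "word"}) {
            System.out.println(term(token) + " " + type(token));
        }
        // Indexed "C.F.A." yields "cfa"; query "C.F.A" yields "c.f.a", so the terms differ.
        System.out.println(term("C.F.A.").equals(term("C.F.A"))); // false
    }
}
```

In this toy, as in the output above, the trailing dot alone decides which rule applies, and the two rules produce different terms.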
