MultiFieldQueryParser is removing dots from the acronym


Question

I am posting this question again because my earlier one was not answered.

I am working on a book search API using Lucene. Users can search for a book whose title or description field contains C.F.A... I am using StandardAnalyzer along with a list of stop words.

I am using MultiFieldQueryParser to parse the above string, but after parsing, it removes the dots from the string. What am I missing here?

Thanks.

Answer

As you mentioned, this is a dupe of this question. I suggest you at least add a link to it in your question. Also, I would urge you to create a user account, since right now it's not possible to look at your old question to get context.

The StandardAnalyzer specifically handles acronyms, and converts C.F.A. (for example) to cfa. This means you should be able to do the search, as long as you make sure you use the same analyzer for the indexing and for the query parsing.

I would suggest you run some more basic test cases to eliminate other factors. Try using an ordinary QueryParser instead of a multi-field one.

Here's some code I wrote to play with the StandardAnalyzer:

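import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Uses the older Lucene API in which TokenStream.next() returns the next Token.
StringReader testReader = new StringReader("C.F.A. C.F.A word");
StandardAnalyzer analyzer = new StandardAnalyzer();
TokenStream tokenStream = analyzer.tokenStream("title", testReader);
// Print the first three tokens produced by the analyzer.
System.out.println(tokenStream.next());
System.out.println(tokenStream.next());
System.out.println(tokenStream.next());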

The output for this, by the way, was:

(cfa,0,6,type=<ACRONYM>)
(c.f.a,7,12,type=<HOST>)
(word,13,17,type=<ALPHANUM>)

Note, for example, that if the acronym doesn't end with a dot, the analyzer treats it as an internet host name and keeps the dots, so searching for "C.F.A" will not match "C.F.A." in the text.
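
To make the mismatch concrete, here is a toy model in plain Java of the behavior shown in the output above. It is not Lucene's implementation: the class name AcronymModel, the normalize method, and the regular expression are assumptions made for illustration only. It treats a token as an acronym only when every letter is followed by a dot, strips the dots in that case, and otherwise just lowercases the token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the analyzer's acronym handling; NOT Lucene code.
class AcronymModel {
    // Single letters each followed by a dot, e.g. "C.F.A." (trailing dot required).
    private static final Pattern ACRONYM = Pattern.compile("(?:[A-Za-z]\\.)+");

    // Lowercase every whitespace-separated token; strip dots only from acronym-shaped tokens.
    static List<String> normalize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String lower = raw.toLowerCase();
            boolean isAcronym = ACRONYM.matcher(raw).matches();
            tokens.add(isAcronym ? lower.replace(".", "") : lower);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same input as the test above.
        System.out.println(normalize("C.F.A. C.F.A word")); // [cfa, c.f.a, word]

        // Indexed text "C.F.A." versus two possible user queries.
        List<String> indexed = normalize("C.F.A.");
        System.out.println(indexed.equals(normalize("C.F.A."))); // true
        System.out.println(indexed.equals(normalize("C.F.A")));  // false
    }
}
```

In this model, as in the answer, a query matches only when it normalizes to the same token as the indexed text, which is why the same analyzer should be applied on both sides and why the trailing dot matters.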
