Lucene search on a Hibernate List field


Question

I have a Hibernate annotated class TestClass that contains a List<String> field that I am indexing with Lucene. Consider the following example:

"Foo Bar" and "Bar Snafu" are two entries in the List for a particular record. Now, if a user searches on TestClass for "Foo Snafu", the record will be found. I am guessing this is because the token Foo and the token Snafu are both tokens in the List<String> for this record, even though they come from different entries.

Is there a way I can prevent this from happening?

The real world example is a Court case that has a List of Plaintiffs and Defendants. Say there are two people being prosecuted on the case, Joe Lewis Bob and Robert Clay Smith. These users are stored in the Court case record in a List of Defendants. This List of defendants is indexed with Lucene. Now if a user searches for either of the two defendants mentioned earlier, the case will be found. But the case will also be found if a user searches for Lewis Smith, or Joe Clay.

Update: It was mentioned in the Lucene IRC channel that I could possibly use a multi-valued field.

Update 2: It was mentioned in the Solr IRC channel that I could use the positionIncrementGap setting in schema.xml to accomplish this with Solr. Apparently if I use a phrase query (with or without slop) then "the increment gap ensures that different values in the same field won't cause an unintended match".

Solution

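When the same field is added to a document more than once, Lucene appends each new value to the end of what it already has in that field. The field is therefore searched as one continuous run of tokens, not as separate list items.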
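To make the cause concrete, here is a minimal plain-Java model of that behaviour. It is not Lucene code: the class and method names are hypothetical, and it assumes simple whitespace tokenisation and a query that requires every term to be present.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of a multi-valued field; this is NOT Lucene's API.
class MultiValuedFieldModel {

    // Concatenate the tokens of every value into one stream, mimicking how
    // successive additions to the same field end up in a single field.
    static List<String> tokenStream(List<String> values) {
        List<String> tokens = new ArrayList<>();
        for (String value : values) {
            tokens.addAll(Arrays.asList(value.trim().split("\\s+")));
        }
        return tokens;
    }

    // A term-based query only asks whether each query term occurs somewhere
    // in the field, regardless of which list entry the term came from.
    static boolean matchesAllTerms(List<String> values, String query) {
        List<String> tokens = tokenStream(values);
        return tokens.containsAll(Arrays.asList(query.trim().split("\\s+")));
    }
}
```

In this model, the values "Foo Bar" and "Bar Snafu" produce the stream Foo, Bar, Bar, Snafu, so matchesAllTerms(values, "Foo Snafu") is true even though no single entry contains both terms.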

If you want to treat each member of the List as an entirely separate entity, you should index them in different fields. You could just append each item's position in the List to the field name you are already using. While I don't have complete information on your needs, of course, doing something like this is probably the better solution.
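The per-item field idea can be sketched in plain Java as follows. This is only a model, not Lucene code: the class, the method names, and the "defendant_0"-style naming scheme are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of "one field per list item"; this is NOT Lucene's API.
class PerItemFieldModel {

    // Give each list item its own field by appending its list position
    // to a base field name (hypothetical scheme: defendant_0, defendant_1, ...).
    static Map<String, List<String>> indexItems(String baseName, List<String> items) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (int i = 0; i < items.size(); i++) {
            fields.put(baseName + "_" + i, Arrays.asList(items.get(i).trim().split("\\s+")));
        }
        return fields;
    }

    // The record matches only if a single field contains every query term.
    static boolean matchesWithinOneItem(Map<String, List<String>> fields, String query) {
        List<String> terms = Arrays.asList(query.trim().split("\\s+"));
        for (List<String> tokens : fields.values()) {
            if (tokens.containsAll(terms)) {
                return true;
            }
        }
        return false;
    }
}
```

With the defendants "Joe Lewis Bob" and "Robert Clay Smith" indexed this way, a search for "Clay Smith" still matches, while "Lewis Smith" does not. The trade-off is that the search side has to query every numbered field, so it needs to know (or bound) how many there are.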

If you just want to search for the precise text "Foo Snafu", you can use a PhraseQuery. If you want to be sure your PhraseQuery doesn't cross from one list item to the next (i.e., if you had "Bar Foo" and "Snafu Bar" in the index, where Foo and Snafu end up adjacent), you could insert some form of delimiting term between each member when writing to the index.
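The effect of separating list items can be modelled in plain Java by tracking token positions. The sketch below is not Lucene code and its names are hypothetical: it assumes whitespace tokenisation and an exact phrase match with no slop, and it uses a position gap between values, which plays the same role as the delimiting term above or the positionIncrementGap setting mentioned in the question.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of phrase matching over token positions; NOT Lucene's API.
class PhraseGapModel {

    // Assign a position to every token. After each value, skip `gap`
    // extra positions so tokens of different values are not adjacent.
    static Map<Integer, String> positions(List<String> values, int gap) {
        Map<Integer, String> byPosition = new HashMap<>();
        int position = 0;
        for (String value : values) {
            for (String token : value.trim().split("\\s+")) {
                byPosition.put(position++, token);
            }
            position += gap;
        }
        return byPosition;
    }

    // Exact phrase match: the phrase terms must sit at consecutive positions.
    static boolean matchesPhrase(Map<Integer, String> byPosition, String phrase) {
        String[] terms = phrase.trim().split("\\s+");
        for (int start : byPosition.keySet()) {
            boolean allMatch = true;
            for (int i = 0; i < terms.length; i++) {
                if (!terms[i].equals(byPosition.get(start + i))) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }
}
```

For the values "Bar Foo" and "Snafu Bar", a gap of 0 puts Foo at position 1 and Snafu at position 2, so the phrase "Foo Snafu" matches. With a gap of 100, Snafu moves to position 102 and the phrase no longer matches, while "Snafu Bar" still does. If sloppy phrase queries are used, the gap would need to be larger than the slop for the same guarantee to hold.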
