Lucene search on a Hibernate List field
Problem description
I have a Hibernate-annotated class TestClass that contains a List<String> field that I am indexing with Lucene. Consider the following example:
"Foo Bar" and "Bar Snafu" are two entries in the List for a particular record. Now, if a user searches TestClass for "Foo Snafu", the record will be found. I am guessing this is because the token Foo and the token Snafu are both tokens in the List<String> for this record.
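To see why the record matches, here is a minimal toy model in plain Java. It is not the Lucene or Hibernate Search API, and the class and method names are hypothetical. Every List entry is tokenized on whitespace into one set of terms for the field, and the query is assumed to require every term (an AND of term queries; with an OR default operator the match would be even looser).

```java
import java.util.*;

/**
 * Toy model (hypothetical, not Lucene's API): all entries of a List<String>
 * are tokenized and poured into one set of terms for the field, so the
 * boundaries between list entries are lost.
 */
final class BagOfTermsField {

    // Collect the lower-cased whitespace tokens of all entries into one set.
    static Set<String> indexTerms(List<String> entries) {
        Set<String> terms = new HashSet<>();
        for (String entry : entries) {
            for (String token : entry.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    terms.add(token.toLowerCase(Locale.ROOT));
                }
            }
        }
        return terms;
    }

    // Matches when every query term occurs somewhere in the field,
    // regardless of which list entry it came from.
    static boolean matchesAllTerms(List<String> entries, String query) {
        Set<String> terms = indexTerms(entries);
        for (String token : query.trim().split("\\s+")) {
            if (!terms.contains(token.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }
}
```

In this model, matchesAllTerms(List.of("Foo Bar", "Bar Snafu"), "Foo Snafu") returns true, and "Lewis Smith" likewise matches the defendants "Joe Lewis Bob" and "Robert Clay Smith": once the terms share one bag, nothing records which entry each came from.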
Is there a way I can prevent this from happening?
The real-world example is a court case that has a List of plaintiffs and defendants. Say there are two people being prosecuted in the case, Joe Lewis Bob and Robert Clay Smith. These people are stored in the court case record in a List of defendants, and this List is indexed with Lucene. Now, if a user searches for either of the two defendants, the case will be found. But the case will also be found if a user searches for Lewis Smith or Joe Clay, even though neither is the name of a defendant.
Update: It was mentioned in the Lucene IRC channel that I could possibly use a multi-valued field.
Update 2: It was mentioned in the Solr IRC channel that I could use the positionIncrementGap setting in schema.xml to accomplish this with Solr. Apparently, if I use a phrase query (with or without slop), then "the increment gap ensures that different values in the same field won't cause an unintended match".
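Both the positionIncrementGap suggestion and Lucene's default appending behavior (equivalent to a gap of 0) can be illustrated with a toy positional index. This is a simplified sketch, not Lucene code. Tokens within one value get consecutive positions, the gap is added between values, and the phrase check is an in-order approximation of slop (Lucene's real sloppy phrase matching also tolerates reordered terms). In Lucene itself, the corresponding hook is Analyzer.getPositionIncrementGap(String fieldName), which returns 0 by default.

```java
import java.util.*;

/** Toy positional index for one multi-valued field (hypothetical model). */
final class PositionalField {

    // term -> ascending list of positions at which it occurs in the field
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Tokens inside a value get consecutive positions; between two values the
    // position jumps by an extra positionIncrementGap (0 = plain appending).
    PositionalField(List<String> values, int positionIncrementGap) {
        int position = -1;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                position += positionIncrementGap;
            }
            for (String token : values.get(v).trim().split("\\s+")) {
                if (token.isEmpty()) continue;
                position++;
                postings.computeIfAbsent(token.toLowerCase(Locale.ROOT), k -> new ArrayList<>())
                        .add(position);
            }
        }
    }

    // In-order phrase match: the terms must appear in order and the total
    // number of extra positions between them must not exceed slop.
    // slop = 0 means an exact phrase.
    boolean phraseMatches(String phrase, int slop) {
        String[] terms = phrase.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int start : postings.getOrDefault(terms[0], List.of())) {
            if (extend(terms, 1, start, slop)) return true;
        }
        return false;
    }

    // Try to place terms[i..] after position prev within the remaining budget.
    private boolean extend(String[] terms, int i, int prev, int budget) {
        if (i == terms.length) return true;
        for (int pos : postings.getOrDefault(terms[i], List.of())) {
            int extra = pos - prev - 1;
            if (extra < 0) continue;   // must come after the previous term
            if (extra > budget) break; // positions are sorted; later ones are farther
            if (extend(terms, i + 1, pos, budget - extra)) return true;
        }
        return false;
    }
}
```

With a gap of 0, "Foo Snafu" matches the values "Bar Foo" and "Snafu Bar" as an exact phrase. With a gap of 100 it no longer matches even at slop 10, while "Bar Foo" still does. The gap only helps phrase queries: a plain AND of terms ignores positions entirely.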
Answer
Lucene appends each successive value added to the same field of the same document to the end of what the field already contains, so by default the boundaries between your List entries are not preserved.
If you want to treat each member of the List as an entirely separate entity, you should index each one in its own field. You could simply append the list index to the field name you are already using. I don't have complete information about your requirements, of course, but something like this is probably the better solution.
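A toy sketch of the one-field-per-entry approach follows, again in plain Java rather than the Lucene API. The base name defendant and the resulting field names defendant0, defendant1, and so on are hypothetical. The point is that a query must be satisfied inside a single entry's field. In a real index, that would mean building the query as an OR over the per-entry fields, each an AND over the terms.

```java
import java.util.*;

/**
 * Toy model (hypothetical): each list entry is indexed in its own field,
 * named by appending the list index to a base field name.
 */
final class FieldPerEntryDocument {

    // field name (e.g. "defendant0") -> lower-cased terms of that entry
    private final Map<String, Set<String>> fields = new LinkedHashMap<>();

    FieldPerEntryDocument(String baseField, List<String> entries) {
        for (int i = 0; i < entries.size(); i++) {
            Set<String> terms = new HashSet<>();
            for (String token : entries.get(i).trim().split("\\s+")) {
                if (!token.isEmpty()) terms.add(token.toLowerCase(Locale.ROOT));
            }
            fields.put(baseField + i, terms);
        }
    }

    Set<String> fieldNames() {
        return fields.keySet();
    }

    // OR over entry fields of an AND over query terms: terms taken from
    // different entries can no longer combine into one match.
    boolean matchesWithinOneEntry(String query) {
        List<String> terms = Arrays.asList(query.trim().toLowerCase(Locale.ROOT).split("\\s+"));
        for (Set<String> fieldTerms : fields.values()) {
            if (fieldTerms.containsAll(terms)) return true;
        }
        return false;
    }
}
```

For the court case, "Lewis Smith" and "Joe Clay" no longer match, while "Robert Smith" does. The trade-off is that the query has to know, or enumerate, the per-entry field names.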
If you just want to search for the precise text "Foo Snafu", you can use a PhraseQuery. If you want to be sure your PhraseQuery doesn't cross from one list item to the next (i.e., if you had "Bar Foo" and "Snafu Bar" in the index, the phrase "Foo Snafu" would otherwise match across the boundary), you could insert some form of delimiting term between members when writing to the index.
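The delimiting-term idea can be sketched the same way. The sentinel token __sep__ is a hypothetical choice; anything the analyzer never emits for real data would do. The phrase check here is exact (slop 0), modeled with Collections.indexOfSubList over a flat token list rather than with Lucene's PhraseQuery.

```java
import java.util.*;

/**
 * Toy model (hypothetical): all entries share one field, but a sentinel
 * token is written between consecutive entries so an exact phrase cannot
 * run from the end of one entry into the start of the next.
 */
final class DelimitedField {

    // Hypothetical sentinel; must never be produced from real data.
    static final String DELIMITER = "__sep__";

    // Flatten the entries into one token stream with DELIMITER between entries.
    static List<String> tokens(List<String> entries) {
        List<String> stream = new ArrayList<>();
        for (int i = 0; i < entries.size(); i++) {
            if (i > 0) stream.add(DELIMITER);
            for (String token : entries.get(i).trim().split("\\s+")) {
                if (!token.isEmpty()) stream.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return stream;
    }

    // Exact phrase (slop 0): the phrase tokens must appear contiguously.
    static boolean exactPhrase(List<String> stream, String phrase) {
        List<String> wanted = Arrays.asList(phrase.trim().toLowerCase(Locale.ROOT).split("\\s+"));
        return Collections.indexOfSubList(stream, wanted) >= 0;
    }
}
```

The stream for "Bar Foo" and "Snafu Bar" becomes [bar, foo, __sep__, snafu, bar], so "Foo Snafu" no longer matches as an exact phrase. A single delimiter only blocks slop 0, though: with slop 1 a sloppy phrase could step over it, which is why a large positionIncrementGap is the more robust form of the same idea.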