Lucene query fails with mixed MUST/MUST_NOT
Problem description
Given a document with this text, indexed in a field named Content:
The dish ran away with the spoon.
The following query fails to match that document:
+Content:dish +(-Content:xyz) <-- no results!
I want the query to be treated as must include "dish", must not include "xyz". It's the "must not" part that is failing.
I know the +- combination looks funny but syntactically it should be correct, especially considering that the following variations all work:
+Content:dish +(-Content:xyz +Content:spoon) <-- this works
+Content:dish -Content:xyz <-- this works
So why doesn't +(-Content:xyz) work? Is that by design, or a bug, or am I just missing something? I'm using Lucene.Net, but I assume regular Lucene behaves the same.
Recommended answer
Lucene doesn't start with a full view of everything, like a SQL database. Lucene starts with no documents matched, and finds things based on the clauses searched on. This is why:
-Content:xyz
on its own doesn't really work. Lucene knows not to bring in documents matching Content:xyz, but it hasn't been given any documents to match in the first place. The same is true of your query, because the negation sits alone inside a subquery.
-Content:xyz
is evaluated first, and it gets no documents on its own. So you effectively have:
+Content:dish +(no documents)
It's useful to think of - as AND NOT rather than simply NOT (though don't take that to imply that the +/- and AND/OR/NOT syntaxes necessarily map to each other directly).
If you want to be able to execute a lonely negative query like that, you need to bring in all documents first. The MatchAllDocsQuery is the best way to accomplish that, something like:
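BooleanQuery query = new BooleanQuery();
// Positive clause: start from every document in the index.
query.add(new BooleanClause(new MatchAllDocsQuery(), BooleanClause.Occur.SHOULD));
// Negative clause: then drop any document whose Content field contains "xyz".
query.add(new BooleanClause(new TermQuery(new Term("Content","xyz")), BooleanClause.Occur.MUST_NOT));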
That would be the equivalent of a SQL-style query whose WHERE clause contains only a negation.
Of course, this isn't really necessary in the case you've listed since:
+Content:dish -Content:xyz
is entirely sufficient.
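To make the reasoning above concrete, here is a toy model in plain Java of the matching rule described in this answer. It is only a sketch and not Lucene's implementation: the class ToyBooleanQuery and its methods term, matchAll, addMust, addShould and addMustNot are hypothetical names invented for the illustration, and a document is reduced to a set of terms. The point it shows is that a boolean query with no positive clause selects nothing, so a nested query holding only a prohibited clause makes the outer required clause fail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Toy model only: NOT Lucene code. All names here are hypothetical.
// A "document" is just the set of terms it contains.
final class ToyBooleanQuery implements Predicate<Set<String>> {
    private final List<Predicate<Set<String>>> must = new ArrayList<>();
    private final List<Predicate<Set<String>>> should = new ArrayList<>();
    private final List<Predicate<Set<String>>> mustNot = new ArrayList<>();

    // Matches documents containing the given term.
    static Predicate<Set<String>> term(String t) { return doc -> doc.contains(t); }

    // Matches every document (stand-in for the idea behind MatchAllDocsQuery).
    static Predicate<Set<String>> matchAll() { return doc -> true; }

    ToyBooleanQuery addMust(Predicate<Set<String>> c) { must.add(c); return this; }
    ToyBooleanQuery addShould(Predicate<Set<String>> c) { should.add(c); return this; }
    ToyBooleanQuery addMustNot(Predicate<Set<String>> c) { mustNot.add(c); return this; }

    @Override
    public boolean test(Set<String> doc) {
        // Prohibited clauses only ever remove documents.
        for (Predicate<Set<String>> c : mustNot) {
            if (c.test(doc)) return false;
        }
        // Every required clause has to match.
        for (Predicate<Set<String>> c : must) {
            if (!c.test(doc)) return false;
        }
        if (!must.isEmpty()) return true;
        // No required clauses: some optional clause has to select the document.
        // With no positive clause at all, nothing is selected.
        for (Predicate<Set<String>> c : should) {
            if (c.test(doc)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(
                Arrays.asList("the", "dish", "ran", "away", "with", "spoon"));

        // +Content:dish +(-Content:xyz)
        ToyBooleanQuery failing = new ToyBooleanQuery()
                .addMust(term("dish"))
                .addMust(new ToyBooleanQuery().addMustNot(term("xyz")));

        // +Content:dish +(-Content:xyz +Content:spoon)
        ToyBooleanQuery nestedWithPositive = new ToyBooleanQuery()
                .addMust(term("dish"))
                .addMust(new ToyBooleanQuery().addMustNot(term("xyz")).addMust(term("spoon")));

        // +Content:dish -Content:xyz
        ToyBooleanQuery flat = new ToyBooleanQuery()
                .addMust(term("dish"))
                .addMustNot(term("xyz"));

        // +Content:dish +(<match all> -Content:xyz)
        ToyBooleanQuery withMatchAll = new ToyBooleanQuery()
                .addMust(term("dish"))
                .addMust(new ToyBooleanQuery().addShould(matchAll()).addMustNot(term("xyz")));

        System.out.println("lone negative subquery: " + failing.test(doc));            // false
        System.out.println("subquery with +spoon:   " + nestedWithPositive.test(doc)); // true
        System.out.println("flat +dish -xyz:        " + flat.test(doc));               // true
        System.out.println("subquery with matchAll: " + withMatchAll.test(doc));       // true
    }
}
```

In this model the first query fails for the same reason given above: the inner query has nothing positive to select documents with, so the outer required clause can never be satisfied. Adding either a positive term or a match-all clause to the subquery gives the negation something to subtract from.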