Lucene query fails with mixed MUST/MUST_NOT


Question

Given a document with this text, indexed in a field named Content:

The dish ran away with the spoon.

The following query fails to match that document:

+Content:dish +(-Content:xyz)   <-- no results!

I want the query to be treated as must include "dish", must not include "xyz". It's the "must not" part that is failing.

I know the +- combination looks funny but syntactically it should be correct, especially considering that the following variations all work:

+Content:dish +(-Content:xyz +Content:spoon)   <-- this works
+Content:dish -Content:xyz                     <-- this works

So why doesn't +(-Content:xyz) work? Is that by design, or a bug, or am I just missing something? I'm using Lucene.Net but I assume regular Lucene behaves the same.

Recommended answer

Lucene doesn't start with a full view of everything, like a SQL database. Lucene starts with no documents matched, and finds things based on the clauses searched on. This is why:

-Content:xyz

doesn't really work on its own. It knows to exclude documents matching Content:xyz, but it hasn't been given any documents to exclude them from. The same is true of your query, because the negation is placed in a subquery by itself.

The subquery -Content:xyz is evaluated first and matches no documents on its own. So you effectively have:

+Content:dish +(no documents)

It's useful to think of - as an AND NOT rather than simply a NOT (though don't take that to imply the +/- and AND/OR/NOT syntax necessarily map to each other directly).
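
The "AND NOT" reading can be illustrated with a toy model in plain Java. This is only a sketch of the set semantics described above, not Lucene's actual implementation: the class BooleanModel, the method eval, and the integer document IDs are all hypothetical names introduced for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BooleanModel {
    // Toy model (assumption, not Lucene code): MUST sets are intersected,
    // then MUST_NOT sets are subtracted from that intersection.
    static Set<Integer> eval(List<Set<Integer>> must, List<Set<Integer>> mustNot) {
        if (must.isEmpty()) {
            // Pure negation: there are no candidate documents to subtract from.
            return new HashSet<>();
        }
        Set<Integer> result = new HashSet<>(must.get(0));
        for (Set<Integer> m : must) {
            result.retainAll(m);
        }
        for (Set<Integer> n : mustNot) {
            result.removeAll(n);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> dish = Set.of(0);   // document 0 contains "dish"
        Set<Integer> spoon = Set.of(0);  // document 0 contains "spoon"
        Set<Integer> xyz = Set.of();     // no document contains "xyz"

        // +Content:dish +(-Content:xyz): the subquery matches nothing
        Set<Integer> onlyNot = eval(List.of(), List.of(xyz));
        System.out.println(eval(List.of(dish, onlyNot), List.of()));   // []

        // +Content:dish +(-Content:xyz +Content:spoon): the subquery has a positive clause
        Set<Integer> notWithSpoon = eval(List.of(spoon), List.of(xyz));
        System.out.println(eval(List.of(dish, notWithSpoon), List.of())); // [0]

        // +Content:dish -Content:xyz
        System.out.println(eval(List.of(dish), List.of(xyz)));         // [0]
    }
}
```

In this model the negation only ever removes documents from a set that some positive clause produced, which is why the subquery containing nothing but a negation contributes an empty set to the outer MUST.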

If you want to be able to execute a lonely negative query like that, you need to bring in all documents first. The MatchAllDocsQuery is the best way to accomplish that, something like:

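BooleanQuery query = new BooleanQuery();
// Bring in every document first, so the negation has something to subtract from
query.add(new BooleanClause(new MatchAllDocsQuery(), BooleanClause.Occur.SHOULD));
// Then exclude the documents that contain "xyz" in the Content field
query.add(new BooleanClause(new TermQuery(new Term("Content","xyz")), BooleanClause.Occur.MUST_NOT));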

This is the equivalent of a SQL-style query whose WHERE clause contains only a negation.
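
As a rough illustration of what the match-all clause contributes, here is a small stand-alone Java sketch. It does not use Lucene; MatchAllModel, allExcept, and the document IDs are hypothetical names, and the sets simply stand in for "every document in the index" and "documents containing xyz".

```java
import java.util.Set;
import java.util.TreeSet;

public class MatchAllModel {
    // Toy stand-in for "match all documents, then apply MUST_NOT":
    // start from every document ID and subtract the excluded ones.
    static Set<Integer> allExcept(Set<Integer> allDocs, Set<Integer> excluded) {
        Set<Integer> result = new TreeSet<>(allDocs);
        result.removeAll(excluded);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> allDocs = Set.of(0, 1, 2); // assumed index of three documents
        Set<Integer> xyz = Set.of(2);           // assume only document 2 contains "xyz"
        System.out.println(allExcept(allDocs, xyz)); // [0, 1]
    }
}
```

In query-string form, the classic query parser is generally understood to treat *:* as a match-all query, so +Content:dish +(*:* -Content:xyz) would be the analogous rewrite of the original query. That is worth verifying against the Lucene.Net version in use.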

Of course, this isn't really necessary in the case you've listed since:

+Content:dish -Content:xyz

is entirely sufficient.
