Prevent "Too Many Clauses" on Lucene query

Question

In my tests I suddenly ran into a Too Many Clauses exception when trying to get the hits from a boolean query that consisted of a term query and a wildcard query.

I searched around the net, and the resources I found suggest raising the limit with BooleanQuery.SetMaxClauseCount().
This sounds fishy to me. What should I raise it to? How can I be sure that this new magic number will be sufficient for my query? How far can I increase it before all hell breaks loose?

In general I feel this is not a solution. There must be a deeper problem.

The query was +{+companyName:mercedes +paintCode:a*} and the index has ~2.5M documents.

Recommended answer

The paintCode:a* part of the query is a prefix query for any paintCode beginning with an "a". Is that what you're aiming for?

Lucene expands prefix queries into a boolean query containing all the possible terms that match the prefix. In your case, apparently there are more than 1024 possible paintCodes that begin with an "a".
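
To make the expansion concrete, here is a rough sketch in plain Java. It is a simplified model, not Lucene code: the class name, the expand method, and the use of IllegalStateException in place of Lucene's TooManyClauses are all assumptions made for illustration. A sorted set stands in for the term dictionary of the paintCode field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Simplified model of prefix expansion; this is NOT Lucene code.
class PrefixExpansionSketch {
    // Stand-in for the 1024-clause limit mentioned above.
    static final int MAX_CLAUSES = 1024;

    // Collects every indexed term starting with the prefix, one "clause" per term.
    static List<String> expand(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet jumps to the first term >= prefix in sorted order.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match the prefix
            }
            if (clauses.size() >= maxClauses) {
                // Hypothetical stand-in for the Too Many Clauses exception.
                throw new IllegalStateException("too many clauses: limit is " + maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> paintCodes = new TreeSet<>();
        for (int i = 0; i < 2000; i++) {
            paintCodes.add("a" + i); // 2000 distinct codes starting with "a"
        }
        paintCodes.add("b1");

        // Only one term matches "b", so the expansion stays small.
        System.out.println(expand(paintCodes, "b", MAX_CLAUSES));

        // 2000 terms match "a", which exceeds the limit.
        try {
            expand(paintCodes, "a", MAX_CLAUSES);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that the failure depends on how many distinct terms share the prefix, not on how many documents the index holds, so any fixed limit can be exceeded as the data grows.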

If it sounds to you like prefix queries are useless, you're not far from the truth.

I would suggest you change your indexing scheme to avoid using a Prefix Query. I'm not sure what you're trying to accomplish with your example, but if you want to search for paint codes by first letter, make a paintCodeFirstLetter field and search by that field.
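
As a rough illustration of that indexing change, the sketch below derives the extra field value at index time. A plain Map stands in for a Lucene document, and the class and method names are hypothetical. Lowercasing the letter is an assumption, made so that the value matches a lowercase query term such as "a".

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: a Map stands in for a Lucene document; names are hypothetical.
class FirstLetterFieldSketch {
    // Builds the fields for one document, adding the derived first-letter field.
    static Map<String, String> buildFields(String companyName, String paintCode) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("companyName", companyName);
        fields.put("paintCode", paintCode);
        if (!paintCode.isEmpty()) {
            // Assumption: codes are lowercased so the field matches a lowercase query term.
            String firstLetter = paintCode.substring(0, 1).toLowerCase(Locale.ROOT);
            fields.put("paintCodeFirstLetter", firstLetter);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(buildFields("mercedes", "A123"));
    }
}
```

With such a field in place, the query could become +companyName:mercedes +paintCodeFirstLetter:a, which is two exact term clauses no matter how many distinct paint codes start with "a".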

If you're desperate, and are willing to accept partial results, you can build your own Lucene version from source. You need to make changes to the files PrefixQuery.java and MultiTermQuery.java, both under org/apache/lucene/search. In the rewrite method of both classes, change the line

query.add(tq, BooleanClause.Occur.SHOULD);          // add to query

to

try {
    query.add(tq, BooleanClause.Occur.SHOULD);          // add to query
} catch (BooleanQuery.TooManyClauses e) {
    break;  // stop expanding at the clause limit; remaining terms are dropped
}

I did this for my own project and it works.

If you really don't like the idea of changing Lucene, you could write your own PrefixQuery variant and your own QueryParser, but I don't think it's much better.
