TermQuery not returning on a known search term, but WildcardQuery does


Question

I'm hoping someone with enough insight into the inner workings of Lucene might be able to point me in the right direction =)

I'll skip most of the surrounding irrelevant code and cut right to the chase. I have a Lucene index, to which I am adding the following field (variables replaced by their literal values):

document.Add( new Field("Typenummer", "E5CEB501A244410EB1FFC4761F79E7B7", 
                        Field.Store.YES , Field.Index.UN_TOKENIZED));

Later, when I search my index (using other types of queries), I am able to verify that this field does indeed appear in my index, for example when looping through all the fields returned by Document.GetFields():

Field: Typenummer, Value: E5CEB501A244410EB1FFC4761F79E7B7

So far so good :-)

Now the real problem is: why can I not use a TermQuery to search for this value and actually get a result?

This code produces 0 hits:

// Returns 0 hits
bq.Add( new TermQuery( new Term( "Typenummer", 
        "E5CEB501A244410EB1FFC4761F79E7B7" ) ), BooleanClause.Occur.MUST );

But if I switch this to a WildcardQuery (with no wildcards), I get the 1 hit I expect.

// returns the 1 hit I expect
bq.Add( new WildcardQuery( new Term( "Typenummer", 
        "E5CEB501A244410EB1FFC4761F79E7B7" ) ), BooleanClause.Occur.MUST );

I've checked field lengths, I've checked that I am using the same Analyzer, and so on, and I am still at square one as to why this is.

Can anyone point me in a direction I should be looking?

Recommended answer

I finally figured out what was going on. I'm expanding the tags for this question because, much to my surprise, it actually turned out to be an issue with the CMS in which this particular problem exists. In summary, the problem came down to this:

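  1. The field was indexed as UN_TOKENIZED, meaning Lucene stores it exactly "as-is"
  2. The BooleanQuery I pasted snippets from gets sent to the Sitecore SearchManager inside a PreparedQuery wrapper
  3. I expected my query (already prepared) to go to the Lucene API untouched
  4. It turns out I was wrong. It passes through a RewriteQuery method that copies my entire set of nested queries as-is, with one exception - all Term arguments are passed through a LowercaseStrategy()
  5. Since I indexed an UPPERCASE term (UN_TOKENIZED) and Sitecore changes my PreparedQuery to lowercase, 0 results are returned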
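
To illustrate the mismatch described in the list above, here is a minimal sketch that does not use Lucene or Sitecore at all. It assumes that a term lookup against an UN_TOKENIZED field behaves like a case-sensitive exact match, and it models that with a plain HashSet. The names TermCaseDemo, IndexedTerms, LowercaseTerm and Matches are hypothetical and exist only for this illustration; LowercaseTerm is merely a stand-in for whatever the lowercasing step does during the rewrite.

```csharp
using System;
using System.Collections.Generic;

public static class TermCaseDemo
{
    // Stand-in for the terms of an UN_TOKENIZED field: the value is kept
    // as one term, exactly as given, so lookups are case-sensitive.
    public static readonly HashSet<string> IndexedTerms =
        new HashSet<string>(StringComparer.Ordinal) { "E5CEB501A244410EB1FFC4761F79E7B7" };

    // Hypothetical stand-in for the lowercasing applied to every Term
    // while the prepared query is being rewritten.
    public static string LowercaseTerm(string termText)
    {
        return termText.ToLowerInvariant();
    }

    // Exact, case-sensitive term match (an assumption about how a
    // term lookup behaves, not a call into Lucene).
    public static bool Matches(string termText)
    {
        return IndexedTerms.Contains(termText);
    }

    public static void Main()
    {
        string original = "E5CEB501A244410EB1FFC4761F79E7B7";
        string rewritten = LowercaseTerm(original);

        Console.WriteLine(Matches(original));   // True: the term as it was indexed
        Console.WriteLine(Matches(rewritten));  // False: the lowercased term is absent
    }
}
```

The sketch only models the lookup; it says nothing about how the rewrite treats other query types.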

I'm not going to start an argument about whether this is a "by design" or "by design flaw" implementation of the Lucene wrapper API - I'll just note that rewriting my query when using the PreparedQuery overload is... to me... unexpected ;-)

A further lesson from this: indexing the field as TOKENIZED will eliminate this problem too, as the StandardAnalyzer lowercases all tokens by default.
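
A rough sketch of why lowercasing at index time helps, again without touching Lucene itself. It assumes the value stays a single token and models only the lowercasing part of analysis; the class LowercasedIndexDemo and its methods are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LowercasedIndexDemo
{
    // Stand-in for the indexed terms of a field whose values are
    // lowercased before being added (case-sensitive exact lookup).
    private static readonly HashSet<string> Terms =
        new HashSet<string>(StringComparer.Ordinal);

    // Simplified index-time analysis: only the lowercasing step is
    // modelled, and the value is assumed to remain one token.
    public static void IndexValue(string value)
    {
        Terms.Add(value.ToLowerInvariant());
    }

    public static bool Matches(string termText)
    {
        return Terms.Contains(termText);
    }

    public static void Main()
    {
        IndexValue("E5CEB501A244410EB1FFC4761F79E7B7");

        // A query term that has been lowercased now agrees with the index.
        Console.WriteLine(Matches("e5ceb501a244410eb1ffc4761f79e7b7")); // True

        // The original uppercase form no longer matches.
        Console.WriteLine(Matches("E5CEB501A244410EB1FFC4761F79E7B7")); // False
    }
}
```

The point is simply that the index and the query must agree on case; which side does the normalizing matters less than that both end up the same.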
