TermQuery not returning on a known search term, but WildcardQuery does

Question

I'm hoping someone with enough insight into the inner workings of Lucene might be able to point me in the right direction =)

I'll skip most of the surrounding irrelevant code and cut right to the chase. I have a Lucene index to which I add the following field (variables replaced by their literal values):

document.Add( new Field("Typenummer", "E5CEB501A244410EB1FFC4761F79E7B7", 
                        Field.Store.YES , Field.Index.UN_TOKENIZED));

Later, when I search my index (using other types of queries), I can verify that this field does indeed appear in the index, for example when looping through all the fields returned by Document.GetFields():

Field: Typenummer, Value: E5CEB501A244410EB1FFC4761F79E7B7

So far so good :-)

Now the real problem: why can I not use a TermQuery to search against this value and actually get a result?

This code yields 0 hits:

// Returns 0 hits
bq.Add( new TermQuery( new Term( "Typenummer", 
        "E5CEB501A244410EB1FFC4761F79E7B7" ) ), BooleanClause.Occur.MUST );

But if I switch this to a WildcardQuery (with no wildcards), I get the 1 hit I expect.

// returns the 1 hit I expect
bq.Add( new WildcardQuery( new Term( "Typenummer", 
        "E5CEB501A244410EB1FFC4761F79E7B7" ) ), BooleanClause.Occur.MUST );

I've checked field lengths, I've checked that I am using the same Analyzer, and so on, and I am still at square one as to why this is.

Can anyone point me in a direction I should be looking?

Recommended answer

I finally figured out what was going on. I'm expanding the tags for this question because, much to my surprise, it turned out to be an issue with the CMS this particular problem exists in. In summary, the problem came down to this:

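  1. The field was indexed as UN_TOKENIZED, meaning Lucene stores it exactly as is.
  2. The BooleanQuery I pasted in the snippet gets sent to the Sitecore SearchManager inside a PreparedQuery wrapper.
  3. The behaviour I expected was that my query (having already been prepared) would go unaltered to the Lucene API.
  4. It turns out I was wrong. The query passes through a RewriteQuery method that copies my entire set of nested queries as is, with one exception: all the Term arguments are passed through a LowercaseStrategy().
  5. Because I indexed an uppercase term (UN_TOKENIZED) and Sitecore changed my PreparedQuery to lowercase, 0 results were returned.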
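
The case mismatch described in the list above can be reproduced without Lucene at all. The following is only a toy sketch: `CaseMismatchDemo`, `IndexUntokenized`, `ExactHits` and `LowercaseRewrite` are hypothetical names invented for illustration, and the `HashSet` merely stands in for an index of untokenized terms. It is not the Lucene.Net or Sitecore API.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for an index of untokenized terms.
// NOT the Lucene.Net or Sitecore API; all names here are hypothetical.
public static class CaseMismatchDemo
{
    // Ordinal comparison mimics exact, case-sensitive term matching.
    private static readonly HashSet<string> IndexedTerms =
        new HashSet<string>(StringComparer.Ordinal);

    // An untokenized value is kept exactly as given (no lowercasing).
    public static void IndexUntokenized(string value)
    {
        IndexedTerms.Add(value);
    }

    // Exact lookup, the way an exact term match behaves.
    public static int ExactHits(string term)
    {
        return IndexedTerms.Contains(term) ? 1 : 0;
    }

    // Assumed stand-in for the lowercasing step applied to query terms.
    public static string LowercaseRewrite(string term)
    {
        return term.ToLowerInvariant();
    }

    public static void Main()
    {
        const string value = "E5CEB501A244410EB1FFC4761F79E7B7";
        IndexUntokenized(value);

        // Query term exactly as written by the caller.
        Console.WriteLine(ExactHits(value));
        // Same query term after the lowercasing rewrite.
        Console.WriteLine(ExactHits(LowercaseRewrite(value)));
    }
}
```

The first lookup finds the term and the second does not, because the only difference between what was indexed and what was searched is letter case.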

I'm not going to start an argument about whether this is a "by design" or "by design flaw" implementation of the Lucene wrapper API. I'll just note that rewriting my query when using the PreparedQuery overload is... to me... unexpected ;-)

A further lesson from this: storing the field as TOKENIZED will eliminate this problem too, as the StandardAnalyzer lowercases all tokens by default.
