Solr case insensitive


Problem description


    Hello,

    I am implementing an autocompletion feature in Solr and have run into a problem.

    For autocompletion I am using

    <fieldType name="text_auto" class="solr.TextField" sortMissingLast="true" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>  
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType> 
    

    I thought that the LowerCaseFilter would make the token case-insensitive, but that is wrong. In fact, it just lowercases the token, which means that a query like "comput" leads to "computer" while "Comput" doesn't. What I actually want is for both "comput" and "Comput" to lead to "Computer".

    I already tried this:

    <fieldType name="text_auto_low" class="solr.TextField" sortMissingLast="true" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>  
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType> 
    
    <fieldType name="text_auto_up" class="solr.TextField" sortMissingLast="true" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>  
        </analyzer>
    </fieldType>
    

    For some reason it doesn't work either. My question is: why, and how can I fix this?

    Solution

    Lucene has the Analyzer class, which you can use (implement) in three ways:

    • SimpleAnalyzer : This converts all of the input to lower case.
    • StopAnalyzer : This removes stop words, i.e. common words that only add noise to your search.
    • StandardAnalyzer : This applies both of the above filter processes and thus can 'clean up' your query.
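
    In a Solr schema, a chain roughly comparable to what StandardAnalyzer does can be sketched with tokenizer and filter factories. This is only a minimal sketch, not the exact StandardAnalyzer configuration: the field type name text_std_like is hypothetical, and a stopwords.txt file is assumed to exist in the core's conf directory.

    ```xml
    <!-- Hypothetical field type: tokenize, lowercase, then drop stop words -->
    <fieldType name="text_std_like" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <!-- split the input into word tokens -->
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <!-- lowercase every token (the SimpleAnalyzer part) -->
            <filter class="solr.LowerCaseFilterFactory"/>
            <!-- remove stop words listed in the assumed stopwords.txt (the StopAnalyzer part) -->
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        </analyzer>
    </fieldType>
    ```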

    Now, coming to your question, I would recommend a technique called ngram, which splits up your query and then searches for those pieces instead. That way, you can still get excellent results even if there are typos.

    To learn how to do this, I suggest you read this to get you started. It also has other great info regarding queries. This will not only solve your problem, but will also enhance your app.
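
    For autocompletion, the n-gram idea is often applied as edge n-grams: every prefix of the lowercased value is indexed, and the query is only lowercased. The following is a minimal sketch under assumptions: the field type name text_auto_ngram and the gram sizes are illustrative and do not come from the original question.

    ```xml
    <!-- Hypothetical field type for case-insensitive prefix matching -->
    <fieldType name="text_auto_ngram" class="solr.TextField" sortMissingLast="true" omitNorms="true">
        <analyzer type="index">
            <!-- keep the whole field value as a single token -->
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <!-- index every prefix of the value, e.g. c, co, com, ... (sizes are assumptions) -->
            <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
        </analyzer>
        <analyzer type="query">
            <!-- the query is lowercased but not split into grams -->
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>
    ```

    With a setup like this, a plain term query such as comput or Comput (without a trailing wildcard) is lowercased by the query analyzer and can match the indexed prefix gram comput of Computer. Depending on the Solr version, wildcard and prefix queries may skip query-time analysis, which is one possible reason the lowercase filter alone did not help. Existing documents need to be reindexed after the field type changes.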

    Have fun :D
