How to configure stemming in Solr?


Problem description

I added "American" to the Solr index. When I search for "America", there are no results.

How should schema.xml be configured to get results?

Current configuration:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PorterStemFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PorterStemFilterFactory" />
  </analyzer>
</fieldType>

Solution

Why do you have two stemmers in the chain?
Try removing EnglishPorterFilterFactory (deprecated) from both the index and query analyzers, rebuild the index, and then check whether a search for America returns the document containing American.
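
A minimal sketch of that first suggestion, assuming the rest of the filter chain from the question stays exactly as it was. The only change is that the EnglishPorterFilterFactory line is dropped from both analyzers, leaving PorterStemFilterFactory as the single stemmer. Note that the protected="protwords.txt" attribute goes away together with the removed filter; whether protected words still need separate handling is left open here.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- EnglishPorterFilterFactory removed: only one stemmer remains in the chain -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PorterStemFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- Same removal on the query side, so both sides stem identically -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PorterStemFilterFactory" />
  </analyzer>
</fieldType>
```

Documents indexed with the old chain keep their old tokens, so the index has to be rebuilt before the change can be judged.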

If that doesn't work, the other thing you can try is to remove both stemmer filters and add SnowballPorterFilterFactory with language="English" instead.
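
The second suggestion could look roughly like the sketch below. Two things in it are assumptions rather than part of the answer: the Snowball filter is placed where EnglishPorterFilterFactory used to sit (after lowercasing, before duplicate removal), and protected="protwords.txt" is carried over from the question's configuration.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- Replaces both EnglishPorterFilterFactory and PorterStemFilterFactory.
         protected="protwords.txt" is an assumption carried over from the question; drop it if unused. -->
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- Same stemmer and settings as the index analyzer -->
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>
```

As with the first variant, rebuild the index after changing the schema, and use the same stemmer in both analyzers so that indexed and queried terms reduce to the same form.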
