How to create a case-insensitive copy of a string field in Solr?


Question

How can I create a copy of a string field in case-insensitive form? I want to use both the typical "string" type and a case-insensitive type. The types are defined like so:

    <fieldType name="string" class="solr.StrField"
        sortMissingLast="true" omitNorms="true" />

    <!-- A Case insensitive version of string type  -->
    <fieldType name="string_ci" class="solr.StrField"
        sortMissingLast="true" omitNorms="true">
        <analyzer type="index">
            <tokenizer class="solr.KeywordTokenizerFactory"/>           
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType> 

An example of the fields:

<field name="destANYStr" type="string" indexed="true" stored="true"
    multiValued="true" />
<!-- Case insensitive version -->
<field name="destANYStrCI" type="string_ci" indexed="true" stored="false" 
    multiValued="true" />

I tried using copyField like so:

<copyField source="destANYStr" dest="destANYStrCI" />

But apparently copyField copies the raw source value to the destination before any analyzers are invoked, so even though I specified through the analyzers that the destination is case-insensitive, the case of the values copied from the source field is preserved.

I would like to avoid ...

Recommended answer

With no answers on SO, I followed up on the Solr users list. It turned out that my string_ci field was not working as expected even before considering the effects of copyField. Ahmet Arslan explained why the "string_ci" field should use solr.TextField rather than solr.StrField:


From apache-solr-1.4.0\example\solr\conf\schema.xml:

"The StrField type is not analyzed, but indexed/stored verbatim."

"solr.TextField allows the specification of custom text analyzers specified as a tokenizer and a list of token filters."

With an example he provided and a slight tweak of my own, the following field type definition seems to do the trick, and copyField now works as expected as well.

    <fieldType name="string_ci" class="solr.TextField"
        sortMissingLast="true" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>           
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType> 

The destANYStrCI field still receives the case-preserved value, but it is indexed in lowercase and so provides a case-insensitive field to search on. CAVEAT: case-insensitive wildcard searching cannot be done, because wildcard phrases bypass the query analyzer and are not lowercased before being matched against the index. This means that the characters in a wildcard phrase must be lowercase in order to match: for example, destANYStrCI:new* can match an indexed value of "New York", whereas destANYStrCI:New* cannot.
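Putting the pieces together, the relevant parts of schema.xml might look roughly like the sketch below. It only combines the field and copyField declarations from the question with the corrected string_ci type from this answer. The comments about which section each element belongs in assume the layout of the Solr 1.4 example schema and may differ in other versions.

```xml
<!-- Inside <types>: exact-match type, indexed and stored verbatim -->
<fieldType name="string" class="solr.StrField"
    sortMissingLast="true" omitNorms="true" />

<!-- Inside <types>: case-insensitive type. It must be a TextField,
     because StrField ignores any analyzer declared on it. -->
<fieldType name="string_ci" class="solr.TextField"
    sortMissingLast="true" omitNorms="true">
    <!-- A single analyzer with no "type" attribute is used at both
         index time and query time -->
    <analyzer>
        <!-- Keep the whole value as one token, then lowercase it -->
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>

<!-- Inside <fields>: the original field and its search-only copy -->
<field name="destANYStr" type="string" indexed="true" stored="true"
    multiValued="true" />
<field name="destANYStrCI" type="string_ci" indexed="true" stored="false"
    multiValued="true" />

<!-- Directly under <schema> (assumed placement): copy the raw value;
     the destination field's analyzer lowercases it when indexing -->
<copyField source="destANYStr" dest="destANYStrCI" />
```

With this arrangement, exact case-sensitive matches go against destANYStr and case-insensitive matches go against destANYStrCI. Documents indexed before the schema change would need to be reindexed for the new analysis to take effect.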
