Is there a way to boost the original term more while using Solr synonyms?


Question

For example, I have the synonyms laptop,netbook,notebook in index_synonyms.txt.

When a user searches for netbook, I want to boost the original term more than the terms added by synonym expansion. Is there a way to specify this in SynonymFilterFactory? For example, by emitting the original term twice so that its TF is higher.

Recommended answer

As far as I know, there is no way to do this with the existing SynonymFilterFactory. The following trick, however, gives you the same behavior.

Let's say your field is called title. Create another field that is a copy of it, say title_synonyms. Make sure SynonymFilterFactory appears only in the analyzer chain of title_synonyms; you can do this by giving the two fields different field types, say text and text_synonyms. Search both fields, but give title a higher boost than title_synonyms. A document that contains the term the user actually typed then matches in title and scores higher than a document that matches only through a synonym in title_synonyms.

Here are sample field type definitions:

    <fieldType name="text" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
    </fieldType>

    <fieldType name="text_synonyms" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms_query.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
    </fieldType>
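
The definitions above use the filter names from the original answer. On recent Solr releases SynonymFilterFactory is deprecated in favor of SynonymGraphFilterFactory, and EnglishPorterFilterFactory may no longer be available. A rough sketch of how text_synonyms might be adapted is shown below; it is an assumption about your Solr version, not part of the original answer, so check the filter names against the reference guide for the release you run. The file names are the ones used above.

```xml
<!-- Sketch only: text_synonyms adapted for newer Solr releases (assumed, not from the original answer) -->
<fieldType name="text_synonyms" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_index.txt" ignoreCase="true" expand="true"/>
        <!-- A graph filter used at index time must be followed by a flatten step -->
        <filter class="solr.FlattenGraphFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_query.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>
```

The plain text type would need the same stemmer substitution so that both fields stem terms identically.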

Here are sample field definitions:

    <field name="title" type="text" stored="false"
           required="true" multiValued="true"/>
    <field name="title_synonyms" type="text_synonyms" stored="false"
           required="true" multiValued="true"/>

Copy the title field to title_synonyms:

    <copyField source="title" dest="title_synonyms"/>

If you are using dismax, you can give these fields different boosts like so:

    <str name="qf">title^10 title_synonyms^1</str>
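
For context, the qf line normally sits in the defaults of a request handler in solrconfig.xml. The fragment below is a minimal sketch of that placement; the handler name /search and the explicit tie value are illustrative assumptions, not taken from the original answer.

```xml
<!-- Sketch only: the handler name "/search" is hypothetical -->
<requestHandler name="/search" class="solr.SearchHandler">
    <lst name="defaults">
        <!-- Use the dismax query parser for the user's query -->
        <str name="defType">dismax</str>
        <!-- Matches on the original term in title outweigh synonym matches -->
        <str name="qf">title^10 title_synonyms^1</str>
        <!-- With tie at 0.0, only the best-scoring field contributes to the score -->
        <str name="tie">0.0</str>
    </lst>
</requestHandler>
```

With this in place, a query for netbook scores a document whose title contains netbook through the boosted title field, while a document that contains only laptop can match through title_synonyms alone and so ranks lower.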
