Solr facet search truncates words

Problem description

I have a Solr instance configured for French content. Search works fine, but when I activate faceting, words are truncated in a peculiar way.

Every trailing e disappears, for example automobil instead of automobile, montagn instead of montagne, styl instead of style, homm instead of homme, etc.

<lst name="keywords">
    <int name="automobil">1</int>
    <int name="citroen">1</int>
    <int name="minist">0</int>
    <int name="polit">0</int>
    <int name="pric">0</int>
    <int name="shinawatr">0</int>
    <int name="thailand">0</int>
</lst>

Here is the query: q=fulltextfield:champpions&facet=true&facet.field=keywords

Content of the keywords field:

<arr name="keywords">
    <str>Ski</str>
    <str>sport</str>
    <str>Free style</str>
    <str>automobile</str>
    <str>Rallye</str>
    <str>Citroen</str>
    <str>montagne</str>
</arr>

Here is the schema used:

<fieldtype name="text_fr" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_fr.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" />
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_fr.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
</fieldtype>

Field definition:

If anyone has an idea about this issue...

Thanks for your answers. Regards, Jerome Longet

Recommended answer

Generally, if you want to use a field as a facet, it should be a string field (class solr.StrField), which indexes each value verbatim as a single term, without tokenizing or analyzing it.

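You're faceting on a tokenized and filtered field, so the individual facet values are the terms that the text_fr analyzer produced from your keywords field: the values are split into words, lowercased, and then stemmed by SnowballPorterFilterFactory (language="French"), which is what strips the word endings (for example, "Free style" becomes the terms free and styl).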
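A minimal schema sketch of that advice follows. It assumes the keywords field currently uses the text_fr type shown in the question (the original field definition is missing), and the field name keywords_facet is hypothetical. The string type is included in case your schema does not already define it. copyField copies the raw input value before any analysis, so the analyzed field keeps working for full-text search while the string copy produces whole, unstemmed facet values. Each element goes in its usual section of schema.xml.

```xml
<!-- Untokenized type: each value is indexed as one exact term (skip if already defined) -->
<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- Assumed existing field: analyzed with text_fr for full-text search -->
<field name="keywords" type="text_fr" indexed="true" stored="true" multiValued="true"/>

<!-- Hypothetical facet-only copy of keywords; indexed so it can be faceted on -->
<field name="keywords_facet" type="string" indexed="true" stored="false" multiValued="true"/>

<!-- Copy the raw keywords values (before analysis) into the facet field -->
<copyField source="keywords" dest="keywords_facet"/>
```

You would then facet with facet.field=keywords_facet, after reindexing so that existing documents populate the new field. Because string values are matched exactly, values that differ only in case (such as Ski and ski) are counted as separate facet values.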
