Solr Indexing Splitting Field On Delimiter

Views: 53 Published: 2021/10/1 19:37:30 Tags: xml solr

Problem description

I am trying to set up a Solr index with some data. However, I would like to send one of my fields down pipe-delimited and have it split on the Solr end, e.g.

<add>
 <doc>
  <field name="cat">a|b|c</field>
 </doc>
</add>

For a multi-valued field declared as

<field name="cat" type="str_split_on_pipe" indexed="true" stored="true" multiValued="true" omitNorms="true" />

And the split-on-pipe field type is

<fieldType name="str_split_on_pipe" class="solr.TextField" positionIncrementGap="100" >
  <analyzer type="index">
      <tokenizer class="solr.PatternTokenizerFactory" pattern="\|\s*" />
      <filter class="solr.LowerCaseFilterFactory"/>
      <!--<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>-->
      <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
     possible with WordDelimiterFilter in conjunction with stemming. -->
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
      <tokenizer class="solr.PatternTokenizerFactory" pattern="\|\s*" />
      <filter class="solr.LowerCaseFilterFactory"/>
      <!--<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>-->
      <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
     possible with WordDelimiterFilter in conjunction with stemming. -->
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I would expect this to behave the same as if I had sent the document with three separate cat fields. However, it doesn't seem to do much and just keeps returning my pipe-separated list.

Is what I am trying to do possible, and if so where have I gone wrong?

Thanks, Amar

Solution

Using a PatternTokenizer changes only the internal (indexed) representation, not the stored value, so the field is still returned as the original pipe-separated string. If you want Solr to treat it as a multi-valued field with multiple displayable values, then you need to send in three separate cat fields.
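
As a rough sketch of what "three separate cat fields" could look like in an XML update message, the client would split a|b|c before posting and repeat the field once per value. The id field and its value are assumptions added for illustration; they are not part of the original question.

```xml
<add>
  <doc>
    <!-- "id" is a hypothetical unique key, not taken from the question -->
    <field name="id">1</field>
    <!-- one <field> element per value instead of a single a|b|c string -->
    <field name="cat">a</field>
    <field name="cat">b</field>
    <field name="cat">c</field>
  </doc>
</add>
```

Sent this way, each value is stored separately, so a multiValued stored field should come back as a list of three values rather than one delimited string.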

If you are using DataImportHandler, then you can use the RegexTransformer to split the data.
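
A minimal sketch of the RegexTransformer approach in a DataImportHandler data-config.xml might look like the following. The data source settings, entity name, and SQL query are hypothetical placeholders; only transformer="RegexTransformer" and the splitBy attribute are the parts relevant to the answer.

```xml
<dataConfig>
  <!-- hypothetical JDBC source; replace with your own driver and URL -->
  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:./example/ex" />
  <document>
    <!-- the entity name and query are assumptions for illustration -->
    <entity name="item" transformer="RegexTransformer"
            query="SELECT id, cat FROM item">
      <field column="id" name="id" />
      <!-- splitBy takes a regex, so the pipe must be escaped -->
      <field column="cat" splitBy="\|" />
    </entity>
  </document>
</dataConfig>
```

With splitBy, the transformer splits the column value at import time, so the cat field receives multiple values before analysis and storage take place.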
