Is it possible to get Solr's DataImportHandler to ignore fields with empty strings?

Question

I am using Solr's DataImportHandler to import data from a database. Some of the records have empty strings if there is no value for that column.

Currently the configuration I have produces Solr documents like this:

{
    "x": "value",
    "y": "",
    "z": 2
}

However, I would like to ignore all fields that have no value, so that documents like this are created:

{
    "x": "value",
    "z": 2
}

Is there something I can define in the configuration file for the DataImportHandler that will give me my desired results?

Answer

One little-known aspect of Solr is that you can plug in UpdateRequestProcessors (URPs) to run after the DIH. And there are specialized URPs for exactly this problem.

So you can do this:

<updateRequestProcessorChain name="skip-empty">
    <!-- The next two processors affect all fields (default configuration) -->
    <!-- Get rid of leading/trailing spaces. This also empties all-space values for the next processor -->
    <processor class="solr.TrimFieldUpdateProcessorFactory" />
    <!-- Delete fields with no content. More efficient, and lets you query for the presence/absence of a field -->
    <processor class="solr.RemoveBlankFieldUpdateProcessorFactory" />

    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
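The two cleanup processors in the skip-empty chain run on every field because no field selector is configured. If only some fields should be cleaned, these processors also accept the field-selector parameters shared by Solr's field-mutating update processors (fieldName, fieldRegex, typeName, typeClass). A minimal sketch, assuming you only want to drop blank values from the y field of the example document; the chain name skip-empty-y is hypothetical:

<updateRequestProcessorChain name="skip-empty-y">
    <!-- Hypothetical variant: trim and drop blank values only in field "y" (taken from the example document) -->
    <processor class="solr.TrimFieldUpdateProcessorFactory">
        <str name="fieldName">y</str>
    </processor>
    <processor class="solr.RemoveBlankFieldUpdateProcessorFactory">
        <str name="fieldName">y</str>
    </processor>

    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

More fields can be covered by adding further fieldName entries, or by a single fieldRegex pattern. As with skip-empty, such a chain only takes effect once a handler references it via update.chain.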

Remember to also reference the skip-empty chain in the DIH request handler's definition:

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <!-- .... -->
    <str name="update.chain">skip-empty</str>
  </lst>
</requestHandler>
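Alternatively, assuming your Solr version supports the default attribute on updateRequestProcessorChain, you could replace the skip-empty declaration with one marked as the core's default chain instead of naming it in the DIH defaults. Handlers that do not set update.chain, the DIH included, should then pick it up. Note that this applies the cleanup to every update handler in the core, not only /dataimport. A sketch:

<!-- Sketch: the skip-empty chain declared as the core-wide default (replaces the earlier declaration) -->
<updateRequestProcessorChain name="skip-empty" default="true">
    <processor class="solr.TrimFieldUpdateProcessorFactory" />
    <processor class="solr.RemoveBlankFieldUpdateProcessorFactory" />
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Referencing the chain explicitly through update.chain keeps the change scoped to the import, which is usually the safer choice.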

You can see the full list of UpdateRequestProcessors at http://solr-start.com
