Solr DIH RegexTransformer - processes only one CSV line


Question

Hi, I have the following CSV file:

132 1536130302256087040
133 1536130302256087041
134 1536130302256087042

The fields are separated by a tab. I have set up the DataImportHandler (DIH) for Solr and am trying to import the CSV file, but only the first line ends up in Solr. This is the result; the other lines from the CSV are missing:

  "response": {
    "numFound": 1,
    "start": 0,
    "maxScore": 1,
    "docs": [ {
        "string": "1536130302256087040",
        "id": "132",
        "_version_": 1536202153221161000
      } ] }

Here is my data-config.xml:

<dataConfig>
<dataSource type="FileDataSource" encoding="UTF-8" name="fds"/>
    <document>

     <entity name="f" 
     processor="FileListEntityProcessor" 
     fileName="myfile.csv" 
     baseDir="/var/www/solr-5.4.0/server/csv/files" 
     recursive="false" 
     rootEntity="true" 
     dataSource="null" >

     <entity 
     onError="continue" 
     name="jc"   
     processor="LineEntityProcessor" 
     url="${f.fileAbsolutePath}" 
     dataSource="fds"  
     rootEntity="true" 
     header="false"
     separator="\t"
     transformer="RegexTransformer" >

     <field column="id" name="id" sourceColName="rawLine" regex="^(.*)\t"/>
     <field column="string" name="string" sourceColName="rawLine" regex="\t(.*)$"/>

             </entity>            
        </entity>
    </document>
</dataConfig>

Here is my schema.xml:

<field name="id" type="text_general" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="string" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="_version_" type="long" indexed="true" stored="true"/>

 <uniqueKey>id</uniqueKey>

What am I doing wrong?

Recommended answer

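You have rootEntity="true" on both entity levels. DIH creates one Solr document per row of the root entity, so with the outer entity marked as root you get only one document for each file that the FileListEntityProcessor lists. Try setting rootEntity to "false" on the outer entity, so that the rows produced by the inner LineEntityProcessor become the documents.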
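To make the suggested change concrete, here is a sketch of the data-config.xml from the question with only the outer entity's rootEntity switched to "false". Everything else is copied unchanged from the question, the comments are added for illustration, and the configuration has not been tested against a running Solr instance:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" name="fds"/>
  <document>
    <!-- Outer entity only lists the file(s). rootEntity="false" means
         its rows do not define the Solr documents. -->
    <entity name="f"
            processor="FileListEntityProcessor"
            fileName="myfile.csv"
            baseDir="/var/www/solr-5.4.0/server/csv/files"
            recursive="false"
            rootEntity="false"
            dataSource="null">

      <!-- Inner entity yields one row per line of the file; as the root
           entity, each of its rows becomes one Solr document. -->
      <entity name="jc"
              onError="continue"
              processor="LineEntityProcessor"
              url="${f.fileAbsolutePath}"
              dataSource="fds"
              rootEntity="true"
              header="false"
              separator="\t"
              transformer="RegexTransformer">

        <!-- Text before the tab becomes "id", text after it becomes "string" -->
        <field column="id" name="id" sourceColName="rawLine" regex="^(.*)\t"/>
        <field column="string" name="string" sourceColName="rawLine" regex="\t(.*)$"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If the change behaves as the answer expects, a full import of the three-line sample file should report three documents instead of one.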

Also, you can simply send tab-separated files to Solr's CSV update handler; no DIH is required.
