Solr DataImportHandler doesn't work with XML Files

Views: 170 Published: 2018/8/2 15:30:59 Tags: xml solr indexing dataimporthandler


Problem description

I'm very new to Solr. I succeeded in indexing data from my SQL database via DIH. Now I want to import XML files and index them via DIH as well, but it just won't work! My data-config.xml looks like this:

<dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8" />
    <document>
    <entity name="dir" 
            processor="FileListEntityProcessor" 
            baseDir="/bla/test2" 
            fileName=".*xml"
            stream="true"
            recursive="false"       
            rootEntity="false">
            <entity name="PubmedArticle"
                    processor="XPathEntityProcessor"
                    transformer="RegexTransformer"
                    stream="true"
                    forEach="/PubmedArticle"
                    url="${dir.fileAbsolutePath}">


                <field column="journal" xpath="//Name[.='journal']/following-sibling::Value/text()" />
                <field column="authors" xpath="//Name[.='authors']/following-sibling::Value/text()" />

             ..etc

And I have the following fields in schema.xml:

<field name="journal" type="text" indexed="true" stored="true" required="true" />
<field name="authors" type="text" indexed="true" stored="true" required="true" />

When I run Solr I get no errors, yet no documents are indexed:

<str name="Total Rows Fetched">2000</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2012-02-01 14:59:17</str>
<str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>

Can anyone tell me what I did wrong? I have even double-checked the path syntax...

Recommended answer

I recently ran into the same problem when trying the same thing, i.e., using FileListEntityProcessor (to read multiple local .xml files) together with XPathEntityProcessor (to grab certain XML elements).

Root cause: this line:

<field column="journal" xpath="//Name[.='journal']/following-sibling::Value/text()" />

Explanation: the argument to the xpath attribute ("//Name..."), while valid XPath syntax, is NOT supported by Solr. The "Apache Solr 4.4 Reference Guide" simply says: "The XPath expression which will extract the content from the record for this field. Only a subset of Xpath syntax is supported."
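For orientation, the DataImportHandler documentation gives examples of path forms that XPathEntityProcessor's streaming parser accepts, roughly along the lines below. The element and attribute names (a, b, c, subject, qualifier) and the column names are placeholders, not taken from the asker's files, and the exact subset may differ between Solr versions, so treat this as a sketch to check against the guide for your release.

```xml
<!-- Placeholder names only; these illustrate the general shape of supported paths. -->

<!-- Plain absolute path from the document root -->
<field column="c_text" xpath="/a/b/c" />

<!-- Absolute path with an attribute-equality predicate -->
<field column="full_title" xpath="/a/b/subject[@qualifier='fullTitle']" />

<!-- Absolute path selecting an attribute value -->
<field column="qualifier" xpath="/a/b/subject/@qualifier" />
```

Axes such as following-sibling:: and text-value predicates like [.='journal'] do not appear among those documented forms.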

Solution: change the argument for xpath to be the full path from the document root:

<field column="journal" xpath="/full/path/from/root/of/document/Name[.='journal']/following-sibling::Value/text()" />
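As a rough illustration of what "full path from the document root" can look like inside the inner entity, here is a minimal sketch. It assumes a hypothetical file layout in which PubmedArticle is the root element and the journal name sits at PubmedArticle/Journal/Title and author names at PubmedArticle/AuthorList/Author; those child element names are invented for the example and are not taken from the question, so substitute the real paths from your own files.

```xml
<!-- Sketch only: Journal/Title and AuthorList/Author are hypothetical element names. -->
<entity name="PubmedArticle"
        processor="XPathEntityProcessor"
        stream="true"
        forEach="/PubmedArticle"
        url="${dir.fileAbsolutePath}">
    <!-- Each xpath starts at the document root and names every element on the way down -->
    <field column="journal" xpath="/PubmedArticle/Journal/Title" />
    <field column="authors" xpath="/PubmedArticle/AuthorList/Author" />
</entity>
```

Note that each field path begins with the same root element that forEach names, so every extracted value falls inside the record being emitted.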
