Solr's TikaEntityProcessor not working

Published: 2020/9/4 23:07:09. Tags: solr, apache-tika, solr-cell

Problem description

I'm trying to get Solr to index a database in which one column is a filename of a PDF document I'd like to index. My configuration looks like this:

<dataConfig>
  <dataSource name="ds-db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/document_db"
              user="user" password="password" readOnly="true"/>
  <dataSource name="ds-file" type="BinFileDataSource"/>
  <document name="documents">
    <entity name="document" dataSource="ds-db" query="select * from documents">
      <entity processor="TikaEntityProcessor" url="/some/path/${document.filename}"
              dataSource="ds-file" format="text">
        <field column="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

I'm using Solr from trunk (as of last week). The import process completes without errors, and it picks up the columns from the database, but not the content from the PDF file. It is definitely trying to access the PDF file, for if I give it an incorrect path name, it complains. It doesn't seem to be attempting to index the PDF, though, as it completes in about 40ms, whereas if I import the PDF via the ExtractingRequestHandler, it takes about 11 seconds to index it.

I've also tried the tika example in example-DIH and that doesn't seem to index anything, either. Am I doing something wrong, or is this just not working yet?

I'm running Java 1.6.0_20 on Mac OS X 10.6.3.

(I should note that I already posted this on the solr-user mailing list and didn't get an answer.)
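
One thing that may be worth ruling out, offered as an untested sketch rather than a confirmed fix: the nested entity in the configuration above has no name, and its text column is not mapped to a schema field explicitly. The variant below keeps the data sources, query, and path from the question unchanged and only adds an entity name (`pdf`, a hypothetical label) and an explicit `name="text"` mapping, which assumes the schema defines a field called `text`. Whether this changes the behaviour on a trunk build cannot be told from the question.

```xml
<dataConfig>
  <dataSource name="ds-db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/document_db"
              user="user" password="password" readOnly="true"/>
  <dataSource name="ds-file" type="BinFileDataSource"/>
  <document name="documents">
    <entity name="document" dataSource="ds-db" query="select * from documents">
      <!-- "pdf" is a hypothetical entity name added for illustration -->
      <entity name="pdf" processor="TikaEntityProcessor"
              url="/some/path/${document.filename}"
              dataSource="ds-file" format="text">
        <!-- Assumption: schema.xml defines a field named "text" that can
             receive the extracted body of the PDF -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If the import still finishes in tens of milliseconds with this variant, the mapping is probably not the cause, and the problem more likely lies in the trunk build itself.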
