Elasticsearch Parse Exception error when attempting to index PDF

Views: 1813 | Published: 2016/8/1 21:17:48 | Tags: pdf base64 elasticsearch apache-tika osx-server

Problem description

I'm just getting started with elasticsearch. Our requirement has us needing to index thousands of PDF files and I'm having a hard time getting just ONE of them to index successfully.

Installed the Attachment Type plugin and got response: Installed mapper-attachments.

Followed the Attachment Type in Action tutorial but the process hangs and I don't know how to interpret the error message. Also tried the gist which hangs in the same place.

$ curl -X POST "localhost:9200/test/attachment/" -d json.file 
{"error":"ElasticSearchParseException[Failed to derive xcontent from (offset=0, length=9): [106, 115, 111, 110, 46, 102, 105, 108, 101]]","status":400}
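One way to read this error is to decode the byte values it quotes. The sketch below is only a diagnostic aid and assumes the numbers are decimal ASCII codes; the variable names are illustrative.

```shell
# Decimal byte values copied from the error message above.
bytes="106 115 111 110 46 102 105 108 101"

decoded=""
for b in $bytes; do
  # Convert each decimal value to a three-digit octal escape and let printf emit the character.
  decoded="${decoded}$(printf "\\$(printf '%03o' "$b")")"
done

echo "$decoded"
```

The nine bytes spell out the literal text `json.file`, which suggests the request body was the file name itself rather than the file's contents. curl reads a request body from a file only when the `-d` argument is prefixed with `@`, for example `-d @json.file`, so that is likely worth trying first.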

More details:

The json.file contains an embedded Base64 PDF file (as per instructions). The first line of the file appears correct (to me anyway): {"file":"JVBERi0xLjQNJeLjz9MNCjE1OCAwIG9iaiA8...

I'm not sure if maybe the json.file is invalid or if maybe elasticsearch just isn't set up to parse PDFs properly?!?

Encoding - Here's how we're encoding the PDF into json.file (as per tutorial):

coded=`cat fn6742.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)'`
json="{\"file\":\"${coded}\"}"
echo "$json" > json.file

Also tried:

coded=`openssl base64 -in fn6742.pdf`
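A separate point to check in the encoding step: `perl -ne` processes the input one line at a time, and `encode_base64` appends a newline to each result, so the encoded value may contain line breaks that are not valid inside a JSON string. The following is a minimal sketch of encoding the whole file in one pass and stripping line breaks. It assumes a `base64` command-line tool is available, and it uses a small temporary stand-in file in place of the real PDF so it can run on its own.

```shell
# Hypothetical stand-in for the real PDF so the sketch is self-contained.
pdf="$(mktemp)"
printf '%%PDF-1.4\nline two\n' > "$pdf"

# Encode the whole file at once and remove line breaks from the output,
# so the JSON string value contains no raw newlines.
coded="$(base64 < "$pdf" | tr -d '\n')"

# Write the JSON document; Base64 characters need no JSON escaping.
printf '{"file":"%s"}' "$coded" > json.file
```

With the real file, `$pdf` would be replaced by the path to `fn6742.pdf`.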

Log:

[2012-06-07 12:32:16,742][DEBUG][action.index             ] [Bailey, Paul] [test][0], node[AHLHFKBWSsuPnTIRVhNcuw], [P], s[STARTED]: Failed to execute [index {[test][attachment][DauMB-vtTIaYGyKD4P8Y_w], source[json.file]}]
org.elasticsearch.ElasticSearchParseException: Failed to derive xcontent from (offset=0, length=9): [106, 115, 111, 110, 46, 102, 105, 108, 101]
    at org.elasticsearch.common.xcontent.XContentFactory.xContent(XContentFactory.java:147)
    at org.elasticsearch.common.xcontent.XContentHelper.createParser(XContentHelper.java:50)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:451)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:437)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:290)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:210)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)

Hoping someone can help me see what I'm missing or did wrong?
