Elasticsearch indexing fails after successful Nutch crawl

Question

I'm not sure why, but Nutch 1.13 is failing to index data into Elasticsearch (v2.3.3). The crawl itself works fine, but when it comes time to index into ES, I get this error message:

Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)

Just before that, this line is printed:

elastic.bulk.close.timeout : elastic timeout for the last bulk in seconds. (default 600)

I'm not sure whether the timeout has anything to do with the job failing.

I've run Nutch v1.10 many times with no problems, but decided to upgrade. I never saw this error until after the upgrade.

After closer inspection of the error message:

    Error running:
  /home/david/tutorials/nutch/nutch-1.13/runtime/local/bin/nutch index -Delastic.server.url=http://localhost:9300/search-index/ searchcrawl//crawldb -linkdb searchcrawl//linkdb searchcrawl//segments/20170519125546

It seems to be failing there, on that particular segment. What does that mean? I only know the basics of using Nutch and am by no means an expert. Is it failing on a link?

Recommended answer

Until Nutch 1.14 is released, you need to apply the patch from https://github.com/apache/nutch/pull/156 and rebuild:

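cd apache-nutch-1.13
# Fetch the patched CleaningJob.java at the pinned commit
wget https://raw.githubusercontent.com/apache/nutch/e040ace189aa0379b998c8852a09c1a1a2308d82/src/java/org/apache/nutch/indexer/CleaningJob.java
mv CleaningJob.java src/java/org/apache/nutch/indexer/
# Rebuild the local runtime (Nutch 1.x is built with Ant)
ant runtime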
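
If the index job still fails after rebuilding, note that the "Job failed!" trace does not show the underlying cause. In local mode, Nutch normally writes the full exception to logs/hadoop.log under runtime/local. Below is a minimal sketch for pulling the most recent error lines out of that log. The function name show_nutch_errors is hypothetical (not part of Nutch), and the default log path assumes an unmodified log4j configuration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print recent ERROR/Exception
# lines from a Nutch log so the real cause behind "Job failed!" is visible.
# Assumption: the local runtime logs to logs/hadoop.log (default log4j setup).
show_nutch_errors() {
    log_file="${1:-logs/hadoop.log}"
    if [ ! -f "$log_file" ]; then
        echo "log file not found: $log_file" >&2
        return 1
    fi
    # -n prefixes each match with its line number, which helps locate
    # the surrounding stack trace in the full log.
    grep -n -E 'ERROR|Exception' "$log_file" | tail -n 20
}
```

Run from runtime/local, `show_nutch_errors` with no argument reads logs/hadoop.log. Pass a path to inspect a different file.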
