Unknown issue in Nutch Elasticsearch indexer with the Nutch REST API

Question

I was trying to expose Nutch through its REST endpoints and ran into an issue in the indexer phase. I'm using the Elasticsearch index writer to index documents into ES, and I started the server with the $NUTCH_HOME/runtime/deploy/bin/nutch startserver command. During indexing, the following exception is thrown:

Error: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;
16/10/07 16:01:47 INFO mapreduce.Job: map 100% reduce 0%
16/10/07 16:01:49 INFO mapreduce.Job: Task Id : attempt_1475748314769_0107_r_000000_1, Status : FAILED
Error: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;
16/10/07 16:01:53 INFO mapreduce.Job: Task Id : attempt_1475748314769_0107_r_000000_2, Status : FAILED
Error: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;
16/10/07 16:01:58 INFO mapreduce.Job: map 100% reduce 100%
16/10/07 16:01:59 INFO mapreduce.Job: Job job_1475748314769_0107 failed with state FAILED due to: Task failed task_1475748314769_0107_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1

ERROR indexer.IndexingJob: Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)

Failed with exit code 255.

Any help would be appreciated.

PS: After debugging with the stack trace, I think the issue is caused by a Guava version mismatch. I tried changing the build.xml of the parse-tika and parsefilter-naivebayes plugins, but that didn't work.

Recommended answer

I found a solution to this issue. It is caused by a version incompatibility in the Guava dependency. Hadoop ships guava-11.0.2.jar as a dependency, but the Elasticsearch indexer plugin in Nutch requires Guava 18.0. MoreExecutors.directExecutor() was only added in Guava 18.0, so the reduce task fails on that method when the older jar is loaded first. That is why the exception appears only when running on distributed Hadoop. The fix is to replace the Guava jar in the Hadoop libs with version 18.0 (it can be found under $HADOOP_HOME/share/hadoop/common/lib/).
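To confirm which Guava jar a JVM actually picks up before and after swapping the file, a small diagnostic along these lines may help. This is only a sketch: the class name GuavaCheck and the helper describe are hypothetical and not part of Nutch or Hadoop, and it assumes you run it with the same classpath as the failing task (for example, the output of `hadoop classpath`). It uses reflection, so it compiles without Guava present.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Nutch or Hadoop).
class GuavaCheck {

    // Reports where a class was loaded from and whether it exposes a public
    // method with the given name. Returns a note if the class is missing.
    static String describe(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself usually have no code source.
            String location = (src == null)
                    ? "(bootstrap/unknown)"
                    : String.valueOf(src.getLocation());
            boolean hasMethod = false;
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    hasMethod = true;
                    break;
                }
            }
            return className + " loaded from " + location
                    + "; " + methodName + "() present: " + hasMethod;
        } catch (ClassNotFoundException e) {
            return className + " not found on classpath";
        }
    }

    public static void main(String[] args) {
        // directExecutor() is the method named in the error message above.
        System.out.println(describe(
                "com.google.common.util.concurrent.MoreExecutors",
                "directExecutor"));
    }
}
```

If the reported location still points at guava-11.0.2.jar, or the method is reported as absent, the older jar is still ahead on the classpath and the indexing job would be expected to fail in the same way.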
