Nutch in Hadoop 2.x


Question

I have a three-node cluster running Hadoop 2.2.0 and HBase 0.98.1, and I need to run the Nutch 2.2.1 crawler on top of it. However, Nutch 2.2.1 only supports Hadoop versions from the 1.x branch. I can currently submit a Nutch job to the cluster, but it fails with java.lang.NumberFormatException.

So my question is simple: how do I make Nutch work in this environment?

Solution

At the moment it is not possible to integrate Nutch 2.2.1 (Gora 0.3) with HBase 0.98.x. See: https://issues.apache.org/jira/browse/GORA-304

The official Nutch tutorial recommends only the HBase 0.90.x branch: http://wiki.apache.org/nutch/Nutch2Tutorial

Alternatively, you can download HBase 0.94.24-hadoop-2.5.0, a version I created and tested today: https://github.com/dobromyslov/hbase/releases/tag/0.94.24-hadoop-2.5.0

Note that Nutch 2.2.1 does not support HBase 0.94.x, so you have to get the latest Nutch 2.x from the Git branch: https://github.com/apache/nutch/tree/2.x
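To keep the combinations straight, here is a small Python sketch that records only the compatibility claims made in this answer. It is not an official support matrix: the `COMPATIBILITY` table, the `"2.x-git"` label for the Git branch, and the helper names `hbase_line` and `is_supported` are all made up for illustration.

```python
# Compatibility claims copied from the answer above; NOT an official matrix.
# Keys are (Nutch version, HBase release line). "2.x-git" is a hypothetical
# label for a build from the Nutch 2.x Git branch.
COMPATIBILITY = {
    ("2.2.1", "0.90"): True,    # the only branch the Nutch2Tutorial recommends
    ("2.2.1", "0.94"): False,   # answer: 2.2.1 does not support HBase 0.94.x
    ("2.2.1", "0.98"): False,   # answer: not possible, see GORA-304
    ("2.x-git", "0.94"): True,  # answer: use the latest 2.x branch with 0.94.x
}


def hbase_line(version: str) -> str:
    """Reduce a full HBase version such as '0.98.1' to its release line '0.98'."""
    return ".".join(version.split(".")[:2])


def is_supported(nutch: str, hbase: str):
    """Return True or False if the answer covers the pair, otherwise None."""
    return COMPATIBILITY.get((nutch, hbase_line(hbase)))


if __name__ == "__main__":
    # The setup from the question: Nutch 2.2.1 on HBase 0.98.1.
    print(is_supported("2.2.1", "0.98.1"))
    # The suggested alternative: Nutch 2.x branch on the 0.94.24 build.
    print(is_supported("2.x-git", "0.94.24-hadoop-2.5.0"))
```

A result of `None` means this answer says nothing about that pair, so it should be checked against the Nutch and Gora documentation rather than assumed to work.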
