Nutch + MySQL integration


Question

When Nutch reaches the index phase of its cycle (crawl, fetch, parse, index), I do not want it to build a Lucene index. Instead, I want my own code to write all the crawled data into MySQL. I believe Nutch holds that data as NutchDocument objects.

Is there any way to do this?

Thanks.

Recommended answer

Create your own Java class to manage the Nutch cycle. It should be similar to org.apache.nutch.crawl.Crawl, but with the call to the indexer replaced by a call to your MySQL connector. Alternatively, call your MySQL connector during each cycle. Which one you choose depends on whether you want to update MySQL at the end of the crawl or while it is happening.
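The shape of such a driver might look like the sketch below. It is only an outline under stated assumptions: Page, CrawlRound, PageSink, JdbcPageSink and the pages table with its url, title and content columns are all hypothetical names invented for illustration, not part of Nutch. The actual Nutch calls for each round are deliberately left out, because their signatures differ between Nutch versions; they would go inside a CrawlRound implementation. The storeEachRound flag selects between the two options described above: writing to MySQL during each cycle, or once at the end of the crawl.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CrawlToMysql {

    /** Hypothetical stand-in for one parsed page; this is not a Nutch class. */
    static final class Page {
        final String url;
        final String title;
        final String content;

        Page(String url, String title, String content) {
            this.url = url;
            this.title = title;
            this.content = content;
        }
    }

    /** Hypothetical hook for one crawl round; real code would drive Nutch here. */
    interface CrawlRound {
        List<Page> run(int round);
    }

    /** Takes the place of the indexer call in the driver. */
    interface PageSink {
        void store(List<Page> pages);
    }

    /** Writes pages to MySQL over plain JDBC. Table and column names are assumptions. */
    static final class JdbcPageSink implements PageSink {
        private final Connection connection;

        JdbcPageSink(Connection connection) {
            this.connection = connection;
        }

        @Override
        public void store(List<Page> pages) {
            if (pages.isEmpty()) {
                return;
            }
            String sql = "INSERT INTO pages (url, title, content) VALUES (?, ?, ?)";
            try (PreparedStatement ps = connection.prepareStatement(sql)) {
                for (Page p : pages) {
                    ps.setString(1, p.url);
                    ps.setString(2, p.title);
                    ps.setString(3, p.content);
                    ps.addBatch();
                }
                ps.executeBatch();
            } catch (SQLException e) {
                throw new IllegalStateException("Could not store crawled pages", e);
            }
        }
    }

    /** Runs the rounds and hands pages to the sink; returns how many pages were handed over. */
    static int crawl(CrawlRound round, PageSink sink, int rounds, boolean storeEachRound) {
        List<Page> pending = new ArrayList<>();
        int stored = 0;
        for (int i = 0; i < rounds; i++) {
            List<Page> pages = round.run(i);
            if (storeEachRound) {
                sink.store(pages);          // update MySQL while the crawl is happening
                stored += pages.size();
            } else {
                pending.addAll(pages);      // hold pages until the crawl is finished
            }
        }
        if (!storeEachRound) {
            sink.store(pending);            // update MySQL once, at the end of the crawl
            stored = pending.size();
        }
        return stored;
    }

    public static void main(String[] args) {
        // Fake round and a printing sink, so the sketch runs without Nutch or a database.
        CrawlRound fakeRound = i -> Collections.singletonList(
                new Page("http://example.com/" + i, "Page " + i, "body"));
        PageSink printingSink = pages ->
                pages.forEach(p -> System.out.println("would insert " + p.url));
        crawl(fakeRound, printingSink, 2, true);
    }
}
```

To write to a real database, pass a JdbcPageSink built from a java.sql.Connection (for example one obtained through DriverManager with the MySQL Connector/J driver on the classpath) in place of the printing sink.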
