Connecting MySQL to Apache Nutch
Question
I am using Apache Nutch for the first time. How can I store the data in a MySQL database after crawling? I want to be able to use the data easily in other web applications.
I found a related question, but I don't clearly understand which part of the code is going to be replaced by the MySQL connector. Please help with a short code example.
Recommended answer
Download the source from http://mirror.nyi.net/apache//nutch/apache-nutch-1.2-src.zip
Open the org.apache.nutch.crawl.Crawl class in your editor.
Look for the variable Path crawlDb = new Path(dir + "/crawldb");
That variable gives a hint about where to replace the code in order to get your own CustomMySQLCrawl class.
The persistence happens during this call: crawlDbTool.update(crawlDb, segs, true, true); // update crawldb
So that is where you should save the data to the database. You might want to consider integrating Hibernate at this point.
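As a rough illustration of the "save it to the database" step, here is a minimal sketch that uses plain JDBC rather than Hibernate. Everything named in it is an assumption made for illustration: the CrawlRecordWriter class, the crawl_record table and its columns are hypothetical and are not part of Nutch. How the URL, status and fetch time are read out of the crawldb inside your modified Crawl class is not shown.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

/** Hypothetical helper (not part of Nutch) that writes one crawl record to MySQL. */
class CrawlRecordWriter {

    // Assumed schema, for illustration only:
    //   CREATE TABLE crawl_record (
    //     url        VARCHAR(255) PRIMARY KEY,
    //     status     INT,
    //     fetch_time TIMESTAMP
    //   );
    // ON DUPLICATE KEY UPDATE makes a re-crawl of the same URL update the row.
    static final String UPSERT_SQL =
        "INSERT INTO crawl_record (url, status, fetch_time) VALUES (?, ?, ?) "
      + "ON DUPLICATE KEY UPDATE status = VALUES(status), fetch_time = VALUES(fetch_time)";

    /** Inserts or updates a single record over an already open connection. */
    static void save(Connection conn, String url, int status, long fetchTimeMillis)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPSERT_SQL)) {
            ps.setString(1, url);
            ps.setInt(2, status);
            ps.setTimestamp(3, new Timestamp(fetchTimeMillis));
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: CrawlRecordWriter <jdbc-url> <user> <password>");
            return;
        }
        // Needs the MySQL Connector/J driver on the classpath.
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            // Sample values only; real values would come from the crawl.
            save(conn, "http://example.com/", 2, System.currentTimeMillis());
        }
    }
}
```

The JDBC URL would look something like jdbc:mysql://localhost:3306/nutch, where the host, port and database name are placeholders. A call to save(...) would then sit near the crawlDbTool.update(...) call in your customised Crawl class, once for each record you want to persist.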