Empty Nutch crawl list
Problem description
I'm trying to run a crawl using Nutch in Eclipse.
I'm using a file called urls, and it contains
However, when I run the project, the Generator class tells me:
"0 records selected for fetching, exiting"
How can I solve this issue?
I have followed these documents:
http://wiki.apache.org/nutch/RunNutchInEclipse1.0
http://wiki.apache.org/nutch/NutchTutorial
Any help would be appreciated.
Recommended answer
I recently ran into this issue and found that most responses concerned (regex|crawl)-urlfilter.txt. Another thing to check is your '-topN' setting. It needs to be large enough that the generator still has URLs to select after all filters have been applied.
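To make the filter part concrete, here is a minimal Java sketch of how a rule list in the style of regex-urlfilter.txt can end up rejecting every seed URL. It is not Nutch's own code. The class name UrlFilterSketch, the rules, and the example.org / example.com URLs are all made up for illustration, and the "first matching rule decides, no match means rejected" behaviour is my assumption about how these files are evaluated.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only: this is NOT Nutch's actual filter implementation.
class UrlFilterSketch {

    /**
     * Returns true if the URL is accepted by the rule list.
     * Each rule is '+' (accept) or '-' (reject) followed by a regex.
     * Assumption: the first rule whose regex is found in the URL decides,
     * and a URL that matches no rule at all is rejected.
     */
    static boolean accepts(List<String> rules, String url) {
        for (String rule : rules) {
            boolean accept = rule.charAt(0) == '+';
            Pattern pattern = Pattern.compile(rule.substring(1));
            if (pattern.matcher(url).find()) {
                return accept;
            }
        }
        return false; // no rule matched
    }

    public static void main(String[] args) {
        // Hypothetical rules: skip some file suffixes, then admit a single domain.
        List<String> rules = Arrays.asList(
            "-\\.(gif|jpg|png|css|js)$",
            "+^http://([a-z0-9]*\\.)*example\\.org/"
        );
        // Hypothetical seeds; the real contents of the urls file are not shown in the question.
        List<String> seeds = Arrays.asList(
            "http://www.example.org/",
            "http://www.example.com/"
        );
        for (String seed : seeds) {
            String verdict = accepts(rules, seed) ? "kept" : "filtered out";
            System.out.println(seed + " -> " + verdict);
        }
    }
}
```

If every line in your urls file comes out as "filtered out" under your own rules, the generator has nothing left to select, which would match the "0 records selected for fetching" message.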
I hope this helps.