Does Solr do web crawling?
Question
I am interested in doing web crawling, and I have been looking at Solr.
Does Solr do web crawling, or what are the steps to do web crawling?
Recommended answer
Solr 5+ does in fact now include basic web crawling: http://lucene.apache.org/solr/
Older Solr versions do not do web crawling on their own, because historically Solr is a search server that provides full-text search capabilities. It is built on top of Lucene.
If you need to crawl web pages with a separate project and feed the results to Solr, you have a number of options, including:
- Nutch - http://lucene.apache.org/nutch/
- Websphinx - http://www.cs.cmu.edu/~rcm/websphinx/
- JSpider - http://j-spider.sourceforge.net/
- Heritrix - http://crawler.archive.org/
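As for the steps involved in crawling itself, the basic loop is: fetch a page, extract its links, queue the ones not yet seen, and repeat. The following is only a minimal sketch of that loop in Python using the standard library. It is not how any of the crawlers listed above work internally, and the names `extract_links`, `fetch`, and `crawl` are hypothetical helpers made up for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links found in html, with fragments removed."""
    parser = LinkExtractor()
    parser.feed(html)
    absolute = (urldefrag(urljoin(base_url, href)).url for href in parser.links)
    return [url for url in absolute if urlparse(url).scheme in ("http", "https")]


def fetch(url):
    """Download a page and decode it as text (hypothetical default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed, max_pages=10, fetch_page=fetch):
    """Breadth-first crawl starting at seed, staying on the seed's host."""
    host = urlparse(seed).netloc
    queue, seen, pages = deque([seed]), {seed}, {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to download
        pages[url] = html
        for link in extract_links(url, html):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages  # maps each visited URL to its HTML
```

This sketch ignores robots.txt, request delays, and content types, which is a large part of why a dedicated crawler is usually the better choice for anything beyond a small site.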
If you want to make use of the search facilities provided by Lucene or Solr, you'll need to build indexes from the web crawl results.
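To illustrate what building an index from crawl results can look like, here is a rough Python sketch that turns crawled pages (a mapping of URL to HTML) into JSON documents and posts them to Solr's update handler. The core name `pages`, the field names `id`, `title`, and `content`, and the helper names are all assumptions for the example; the fields must match whatever your own Solr schema defines.

```python
import json
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collects the page title and visible text, skipping script and style."""

    def __init__(self):
        super().__init__()
        self.title_parts, self.text_parts = [], []
        self._stack = []  # currently open title/script/style tags

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "script", "style"):
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        current = self._stack[-1] if self._stack else None
        if current == "title":
            self.title_parts.append(data)
        elif current is None and data.strip():
            self.text_parts.append(data.strip())


def to_solr_doc(url, html):
    """Build one document; the field names are assumptions about the schema."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "id": url,
        "title": "".join(parser.title_parts).strip(),
        "content": " ".join(parser.text_parts),
    }


def build_update_request(solr_update_url, docs):
    """Prepare a JSON POST to the update handler that commits immediately."""
    return Request(
        solr_update_url + "?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_pages(solr_update_url, pages):
    """Send every crawled page to Solr and return the HTTP status code."""
    docs = [to_solr_doc(url, html) for url, html in pages.items()]
    with urlopen(build_update_request(solr_update_url, docs), timeout=30) as response:
        return response.status
```

Assuming a local Solr with a core named `pages`, a call would look like `index_pages("http://localhost:8983/solr/pages/update", {"http://example.com/": "<html>...</html>"})`.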