How does Scrapy filter crawled URLs?

Question

I want to know how Scrapy filters crawled URLs. Does it store all crawled URLs in something like a crawled_urls_list, and when it gets a new URL, does it look up that list to check whether the URL already exists?

Where is the code for this filtering part of CrawlSpider (/path/to/scrapy/contrib/spiders/crawl.py)?

Thanks a lot!

Answer

By default, Scrapy keeps a fingerprint of every request it has seen. These fingerprints are held in memory in a Python set and, when the JOBDIR setting is defined, are also appended to a file called requests.seen in that directory. If you restart Scrapy, the file is reloaded into the Python set. The class that controls this is RFPDupeFilter in the scrapy.dupefilter module; you can subclass it if you need different behaviour.
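
A quick way to see the fingerprinting described above is the sketch below. It assumes an older Scrapy release where scrapy.utils.request.request_fingerprint is still available (newer releases replaced it with a pluggable request fingerprinter):

    from scrapy.http import Request
    from scrapy.utils.request import request_fingerprint

    # Two Request objects for the same URL produce the same fingerprint,
    # so the scheduler treats the second one as a duplicate and drops it.
    r1 = Request("http://example.com/page?id=1")
    r2 = Request("http://example.com/page?id=1")
    print(request_fingerprint(r1) == request_fingerprint(r2))  # True

If you need different behaviour, Scrapy lets you point the DUPEFILTER_CLASS setting at your own filter class. The following is a minimal sketch, not the library's own implementation: URLDupeFilter is a hypothetical name, it deduplicates on the exact URL string instead of the request fingerprint, and the import path assumes a newer Scrapy where the module is scrapy.dupefilters (older versions used scrapy.dupefilter):

    from scrapy.dupefilters import RFPDupeFilter

    class URLDupeFilter(RFPDupeFilter):
        """Hypothetical filter that deduplicates on the exact URL string."""

        def __init__(self, path=None, debug=False):
            super().__init__(path, debug)
            self.seen_urls = set()  # URLs already scheduled in this run

        def request_seen(self, request):
            # Returning True tells the scheduler to drop the request.
            if request.url in self.seen_urls:
                return True
            self.seen_urls.add(request.url)
            return False

To enable it, set DUPEFILTER_CLASS = 'myproject.dupefilters.URLDupeFilter' in settings.py, adjusting the dotted path to wherever the class lives in your project.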
