Scrapy: crawl HTTP header data only

Published: 2018/7/10 15:00:46. Tags: python, http-headers, scrapy


(How) can I make Scrapy download only the header data of a website (for checking purposes, etc.)?

I've tried disabling some of the downloader middlewares, but that doesn't seem to work.

Like @alexce said, you can issue HEAD requests instead of the default GET:

from scrapy import Request
Request(url, method="HEAD")

UPDATE: If you want to use HEAD requests for your start_urls, you will need to override the make_requests_from_url method:

def make_requests_from_url(self, url):
    return Request(url, method='HEAD', dont_filter=True)
