How to disable the cache in Scrapy?


Question

I am trying to crawl a webpage on a particular website. The page varies slightly depending on the set of cookies that I send through scrapy.Request().

If I request the page one set of cookies at a time, I get the correct result, but when I send the cookies in a for loop, I get the same result for every request. I think Scrapy is caching the response and serving the second request from that cache. Here is my code:

def start_requests(self):
    meta = {'REDIRECT_ENABLED': True}
    productUrl = "http://xyz"
    cookies = [{'name': '', 'value': '=='}, {'name': '', 'value': '=='}]
    for cook in cookies:
        header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36"}
        productResponse = scrapy.Request(productUrl, callback=self.parseResponse, method='GET', meta=meta, body=str(), cookies=[cook], encoding='utf-8', priority=0, dont_filter=True)
        yield productResponse


def parseResponse(self, response):
    selector = Selector(response)
    print(selector.xpath("xpaths here").extract())
    yield None

I expect the print statement to give a different result for each of the two requests.

If anything isn't clear, please mention it in the comments.

Recommended answer

The cache can be disabled in two ways:


  1. Change the cache-related setting in the project's settings.py file: set HTTPCACHE_ENABLED = False

  2. Or do it at run time: scrapy crawl crawl-name --set HTTPCACHE_ENABLED=False
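
As a minimal sketch of the first option, the relevant line in settings.py might look like the fragment below. The file location assumes a standard Scrapy project layout; HTTPCACHE_ENABLED is the setting named in the answer, and nothing else here comes from the original post.

```python
# settings.py (sketch; assumes a standard Scrapy project layout)

# Turn off Scrapy's HTTP cache so that responses are fetched from the
# network rather than served from a previously stored copy.
HTTPCACHE_ENABLED = False
```

The same key can also be set for a single spider through its custom_settings class attribute, which keeps the change out of the project-wide settings.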
