How to disable cache in scrapy?
Question
I am trying to crawl a webpage on a particular website. The webpage varies a little depending on the set of cookies that I send through scrapy.Request().
If I make the requests to the webpage one by one, I get the correct results, but when I send these cookies in a for loop, I get the same result every time. I think Scrapy is caching the first response and serving it from that cache for the second request. Here is my code:
def start_requests(self):
    meta = {'REDIRECT_ENABLED': True}
    productUrl = "http://xyz"
    cookies = [{'name': '', 'value': '=='}, {'name': '', 'value': '=='}]
    for cook in cookies:
        header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36"}
        productResponse = scrapy.Request(productUrl, callback=self.parseResponse, method='GET', meta=meta, body=str(), cookies=[cook], encoding='utf-8', priority=0, dont_filter=True)
        yield productResponse

def parseResponse(self, response):
    selector = Selector(response)
    print selector.xpath("xpaths here").extract()
    yield None
I expect the print statement to give different results for the two requests.
If anything isn't clear, please mention it in the comments.
Answer
The cache can be disabled in two ways:

- Change the cache-related setting in the settings.py file: set HTTPCACHE_ENABLED = False.
- Or do it at run time: scrapy crawl crawl-name --set HTTPCACHE_ENABLED=False
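As a minimal sketch of the first option (assuming a default Scrapy project layout), the relevant lines in settings.py would look like this; the two extra settings shown are standard cache options that simply become irrelevant once the cache is off:

```python
# settings.py (sketch) -- disable Scrapy's HTTP cache project-wide.
HTTPCACHE_ENABLED = False

# These cache settings only matter when HTTPCACHE_ENABLED is True;
# they are listed here just to show where the cache is configured.
HTTPCACHE_EXPIRATION_SECS = 0
HTTPCACHE_DIR = 'httpcache'
```

If only one spider should bypass the cache, the same key can instead be set in that spider's custom_settings class attribute, which overrides the project-wide settings.py value for that spider alone.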