What is the correct way to work with cookies in Scrapy
Problem description
I'm very much a newbie, and I am working with Scrapy on a site that uses cookies. This is a problem for me, because I can obtain data from a site without cookies, but obtaining the data from a site with cookies is difficult for me. I have this code structure:
class mySpider(BaseSpider):
    name = 'data'
    allowed_domains = []
    start_urls = ["http://...."]

    def parse(self, response):
        sel = HtmlXPathSelector(response)
        items = sel.xpath('//*[@id=..............')
        vlrs = []
        for item in items:
            myItem['img'] = item.xpath('....').extract()
            yield myItem
This works fine; I can obtain the data without cookies using this code structure. I found, at this URL, how one can work with cookies, but I do not understand where I should put that code so that I can then get the data using XPath.
I am testing this code:
request_with_cookies = Request(url="http://...", cookies={'country': 'UY'})
but I don't know how to use it, or where to put this code. I put it into the parse function to obtain the data:
def parse(self, response):
    request_with_cookies = Request(url="http://.....", cookies={'country': 'UY'})
    sel = HtmlXPathSelector(request_with_cookies)
    print request_with_cookies
I tried to use XPath with this new cookie-carrying request, in order to later print the newly scraped data. I thought it would be like working with a URL without cookies, but when I run this I get an error: 'Request' object has no attribute 'body_as_unicode'. What would be the proper way to work with these cookies? I'm a little lost. Thank you very much.
Recommended answer
You are very close!
The contract for the parse() method is that it yields (or returns an iterable of) Items, Requests, or a mix of both. In your case, all you should have to do is
yield request_with_cookies
and your parse() method will be run again with a Response object produced from requesting that URL with those cookies.
http://doc.scrapy.org/en/latest/topics/spiders.html?highlight=parse#scrapy.spider.Spider.parse
http://doc.scrapy.org/en/latest/topics/request-response.html