What is the correct way to work with cookies in Scrapy?

Question

I'm very new to this. I am working with Scrapy on a website that uses cookies, and this is a problem for me: I can obtain data from a website without cookies, but obtaining data from a website with cookies is difficult for me. I have this code structure:

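# Legacy Scrapy 0.x import paths, matching the API used below
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class mySpider(BaseSpider):
    name = 'data'
    allowed_domains = []
    start_urls = ["http://...."]

    # parse() must be a method of the spider class, so it is indented under it
    def parse(self, response):
        sel = HtmlXPathSelector(response)
        items = sel.xpath('//*[@id=..............')

        vlrs = []

        for item in items:
            myItem['img'] = item.xpath('....').extract()
            yield myItem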

This works fine: with this code structure I can obtain the data when no cookies are involved. I found out how to work with cookies at this URL, but I do not understand where I should put that code so that I can then get the data using XPath.

I am testing this code:

request_with_cookies = Request(url="http://...",cookies={'country': 'UY'})

but I don't know how to work with it or where to put this code. I put it into the parse function to obtain the data:

def parse(self, response):
    request_with_cookies = Request(url="http://.....",cookies={'country':'UY'})

    sel = HtmlXPathSelector(request_with_cookies)
    print request_with_cookies

I tried to use XPath with this new URL with cookies, so that I could print the newly scraped data later. I thought it would work like a URL without cookies, but when I run it I get an error: 'Request' object has no attribute 'body_as_unicode'. What is the proper way to work with these cookies? I'm a little lost. Thank you very much.

Answer

You are very close! The contract for the parse() method is that it yields Items, Requests, or a mix of both (or returns an iterable of them). In your case, all you should have to do is

yield request_with_cookies

and your parse() method will be run again with a Response object produced from requesting that URL with those cookies.

http://doc.scrapy.org/en/latest/topics/spiders.html?highlight=parse#scrapy.spider.Spider.parse
http://doc.scrapy.org/en/latest/topics/request-response.html
