Access session cookies in Scrapy spiders


Question


I am trying to access the session cookies within a spider. I first log in to a social network from within the spider:

    # needs: from scrapy.http import FormRequest
    def parse(self, response):
        return [FormRequest.from_response(response,
                formname='login_form',
                formdata={'email': '...', 'pass': '...'},
                callback=self.after_login)]

In after_login, I would like to access the session cookies in order to pass them to another module (selenium here) to further process the page with an authenticated session.

I would like something like this:

    def after_login(self, response):

        # process response
        .....

        # access the cookies of that session to access another URL in the
        # same domain with the authenticated session.
        # Something like:
        session_cookies = XXX.get_session_cookies()
        data = another_function(url, session_cookies)

Unfortunately, response.cookies does not return the session cookies.

How can I get the session cookies ? I was looking at the cookies middleware: scrapy.contrib.downloadermiddleware.cookies and scrapy.http.cookies but there doesn't seem to be any straightforward way to access the session cookies.
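One possible workaround (a sketch, not something from the original question): the built-in CookiesMiddleware keeps its CookieJar objects in a jars dict keyed by request.meta.get('cookiejar'), so a small subclass can copy that jar into the request's meta, where the spider callback can reach it. The class name ExposeCookiesMiddleware, the module path myproject.middlewares, and the session_cookiejar meta key are illustrative choices, not Scrapy API.

# Sketch of a downloader middleware that exposes the session cookie jar
# to spider callbacks (assumes the contrib-era Scrapy layout used above).
from scrapy.contrib.downloadermiddleware.cookies import CookiesMiddleware

class ExposeCookiesMiddleware(CookiesMiddleware):

    def process_response(self, request, response, spider):
        # let the stock middleware extract the cookies first
        response = super(ExposeCookiesMiddleware, self).process_response(
            request, response, spider)
        # then stash its jar where the callback can find it
        request.meta['session_cookiejar'] = self.jars[request.meta.get('cookiejar')]
        return response

# settings.py: swap the stock middleware for the subclass
# DOWNLOADER_MIDDLEWARES = {
#     'scrapy.contrib.downloadermiddleware.cookies.CookiesMiddleware': None,
#     'myproject.middlewares.ExposeCookiesMiddleware': 700,
# }

With that in place, after_login can read response.meta['session_cookiejar'] instead of building a fresh (and empty) jar.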

Some more details here about my original question:

Unfortunately, I used your idea but I didn't see the cookies, although I know for sure that they exist, since the scrapy.contrib.downloadermiddleware.cookies middleware does print out the cookies! These are exactly the cookies that I want to grab.

So here is what I am doing:

The after_login(self, response) method receives the response after proper authentication, and then I access a URL with the session data:

    # needs: from scrapy.http import Request
    #        from scrapy.http.cookies import CookieJar
    def after_login(self, response):

        # testing to see if I can get the session cookies
        cookieJar = response.meta.setdefault('cookie_jar', CookieJar())
        cookieJar.extract_cookies(response, response.request)
        cookies_test = cookieJar._cookies
        print "cookies - test:", cookies_test

        # URL access with authenticated session
        url = "http://site.org/?id=XXXX"
        request = Request(url=url, callback=self.get_pict)
        return [request]

As the output below shows, there are indeed cookies, but I fail to capture them with cookieJar:

cookies - test: {}
2012-01-02 22:44:39-0800 [myspider] DEBUG: Sending cookies to: <GET http://www.facebook.com/profile.php?id=529907453>
    Cookie: xxx=3..........; yyy=34.............; zzz=.................; uuu=44..........

So I would like to get a dictionary containing the keys xxx, yyy, etc. with the corresponding values.
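(The hand-built jar most likely stays empty because this particular response carries no Set-Cookie headers of its own: the cookies were set on the earlier login responses and now live only in the middleware's internal jar, which is exactly what the DEBUG line above is printing.) Given a jar that does hold them, for example the one exposed by the middleware sketch above, flattening it into such a dictionary could look like the sketch below; the helper name is illustrative:

# Sketch: flatten a scrapy.http.cookies.CookieJar into a {name: value} dict.
# Iterating the jar yields stdlib cookielib.Cookie objects.
def cookiejar_to_dict(cookie_jar):
    return dict((c.name, c.value) for c in cookie_jar)

# e.g. session_cookies = cookiejar_to_dict(response.meta['session_cookiejar'])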

Thanks :)

Solution

A classic example is a login server that provides a new session ID after a successful login. This new session ID should then be used with subsequent requests.

Here is the code, picked up from the source, which seems to work for me.

print 'cookie from login', response.headers.getlist('Set-Cookie')[0].split(";")[0].split("=")[1]
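That one-liner only looks at the first Set-Cookie header and assumes the session id is its first name=value pair. A slightly more general variant (a sketch, not part of the original answer) collects every cookie the login response sets into a dict, using only response.headers.getlist as above:

# Sketch: collect all cookies set on this response into a {name: value} dict.
def setcookie_headers_to_dict(response):
    cookies = {}
    for header in response.headers.getlist('Set-Cookie'):
        first_pair = header.split(";")[0]          # e.g. "sessionid=abc123"
        name, _, value = first_pair.partition("=")
        cookies[name.strip()] = value
    return cookies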

Code:

def check_logged(self, response):
    tmpCookie = response.headers.getlist('Set-Cookie')[0].split(";")[0].split("=")[1]
    print 'cookie from login', response.headers.getlist('Set-Cookie')[0].split(";")[0].split("=")[1]
    cookieHolder = dict(SESSION_ID=tmpCookie)

    #print response.body
    if "my name" in response.body:
        yield Request(url="<<new url for another server>>",
            cookies=cookieHolder,
            callback=self."<<another function here>>")
    else:
        print "login failed"
        return
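Since the original goal was to hand the authenticated session over to selenium, the last step might look like the sketch below (not part of the original answer): given a plain dict of cookie names and values, such as cookieHolder above or the dict built from the jar earlier, Selenium needs the browser to be on the cookie's domain before add_cookie() is called. The function name and the choice of Firefox are illustrative.

# Sketch: re-use the captured session cookies in a Selenium browser session.
from selenium import webdriver

def open_with_session(url, session_cookies):
    driver = webdriver.Firefox()
    driver.get(url)                                   # visit the domain first
    for name, value in session_cookies.items():
        driver.add_cookie({'name': name, 'value': value})
    driver.get(url)                                   # reload, now sent with the session cookies
    return driver

# e.g. open_with_session("http://site.org/?id=XXXX", cookieHolder)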
