Access session cookie in scrapy spiders


Question

I am trying to access the session cookie within a spider. I first log in to a social network from a spider:

    def parse(self, response):

        return [FormRequest.from_response(response,
                formname='login_form',
                formdata={'email': '...', 'pass':'...'},
                callback=self.after_login)]

In after_login, I would like to access the session cookies in order to pass them to another module (selenium here), which will further process the page with an authenticated session.

I would like something like this:

    def after_login(self, response):

        # process response
        .....

        # access the cookies of that session to access another URL in the
        # same domain with the authenticated session.
        # Something like:
        session_cookies = XXX.get_session_cookies()
        data = another_function(url, session_cookies)

Unfortunately, response.cookies does not return the session cookies.

How can I get the session cookies? I was looking at the cookies middleware, scrapy.contrib.downloadermiddleware.cookies, and at scrapy.http.cookies, but there doesn't seem to be any straightforward way to access the session cookies.

Here are some more details about my original question:

Unfortunately, I used your idea but I didn't see the cookies, although I know for sure that they exist, since the scrapy.contrib.downloadermiddleware.cookies middleware does print them out! These are exactly the cookies that I want to grab.

So here is what I am doing:

The after_login(self, response) method receives the response after proper authentication, and then I access a URL with the session data:

    def after_login(self, response):

        # testing to see if I can get the session cookies
        cookieJar = response.meta.setdefault('cookie_jar', CookieJar())
        cookieJar.extract_cookies(response, response.request)
        cookies_test = cookieJar._cookies
        print("cookies - test: {}".format(cookies_test))

        # URL access with authenticated session
        url = "http://site.org/?id=XXXX"
        request = Request(url=url, callback=self.get_pict)
        return [request]

As the output below shows, there are indeed cookies, but I fail to capture them with cookieJar:

cookies - test: {}
2012-01-02 22:44:39-0800 [myspider] DEBUG: Sending cookies to: <GET http://www.facebook.com/profile.php?id=529907453>
    Cookie: xxx=3..........; yyy=34.............; zzz=.................; uuu=44..........

So I would like to get a dictionary containing the keys xxx, yyy, etc. with the corresponding values.

Thanks :)

Recommended answer

A classic example is a login server that provides a new session id after a successful login. This new session id should then be used with another request.

Here is the code, picked up from the source, which seems to work for me.

    print('cookie from login: ' + response.headers.getlist('Set-Cookie')[0].decode('utf-8').split(";")[0].split("=")[1])
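The one-liner above keeps only the value of the first cookie, while the question asks for a dictionary of all cookie names and values. A minimal sketch of that idea follows, assuming the raw header values come from response.headers.getlist('Set-Cookie'). The function name cookies_from_set_cookie is hypothetical and not part of Scrapy.

```python
def cookies_from_set_cookie(header_values):
    """Build a {name: value} dict from raw Set-Cookie header values.

    Hypothetical helper, not a Scrapy API. Each header value is expected
    to look like "name=value; Path=/; HttpOnly" (bytes or str).
    """
    cookies = {}
    for raw in header_values:
        # Scrapy returns header values as bytes on Python 3.
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        # The cookie itself is the first ";"-separated segment;
        # the remaining segments are attributes such as Path or Expires.
        name_value = raw.split(";", 1)[0]
        # partition() splits on the first "=" only, so values that
        # themselves contain "=" are kept whole.
        name, sep, value = name_value.partition("=")
        if sep:
            cookies[name.strip()] = value.strip()
    return cookies


if __name__ == "__main__":
    example = [b"xxx=3abc; Path=/; HttpOnly", "yyy=a=b; Secure"]
    print(cookies_from_set_cookie(example))
```

Inside a callback this could be called as cookies_from_set_cookie(response.headers.getlist('Set-Cookie')). It only sees cookies set by that particular response, so cookies set earlier in a redirect chain would not appear.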

Code:

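    # Assumes `from scrapy.http import Request` at module level.
    def check_logged(self, response):
        # Take the value of the first cookie in the first Set-Cookie header.
        # Header values are bytes on Python 3, so decode before splitting.
        first_header = response.headers.getlist('Set-Cookie')[0].decode('utf-8')
        tmpCookie = first_header.split(";")[0].split("=")[1]
        print('cookie from login: ' + tmpCookie)
        cookieHolder = dict(SESSION_ID=tmpCookie)

        # print(response.body)
        if b"my name" in response.body:
            # Reuse the session id in a request to another server.
            yield Request(url="<<new url for another server>>",
                          cookies=cookieHolder,
                          callback=self.next_callback)  # placeholder name: use your own callback
        else:
            print("login failed")
            return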
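The question mentions handing the cookies over to selenium. A rough sketch of that step follows, assuming a plain {name: value} dict such as cookieHolder. The helper name to_selenium_cookies and the optional domain parameter are hypothetical. The only selenium call referenced is driver.add_cookie, which takes a dict with at least "name" and "value" keys, and it appears only in comments so the block runs without selenium installed.

```python
def to_selenium_cookies(cookie_dict, domain=None):
    """Convert a {name: value} dict into the list of dicts that
    selenium's driver.add_cookie() accepts, one dict per cookie.

    Hypothetical helper for illustration only.
    """
    selenium_cookies = []
    for name, value in cookie_dict.items():
        cookie = {"name": name, "value": value}
        if domain is not None:
            cookie["domain"] = domain
        selenium_cookies.append(cookie)
    return selenium_cookies


if __name__ == "__main__":
    print(to_selenium_cookies({"SESSION_ID": "abc123"}))

# Usage sketch (needs selenium and a browser driver, so it is not run here):
#
#     driver.get("http://site.org/")   # load the target domain first
#     for cookie in to_selenium_cookies(cookieHolder):
#         driver.add_cookie(cookie)
#     driver.get("http://site.org/?id=XXXX")
```

Selenium generally rejects cookies for a domain other than the one currently loaded, which is why the usage sketch visits the site before calling add_cookie.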
