302s and losing cookies with urllib2


Question

I am using urllib2 with CookieJar / HTTPCookieProcessor in an attempt to simulate a login to a page to automate an upload.

I've seen some questions and answers on this, but nothing which solves my problem. I am losing my cookie when I simulate the login, which ends up at a 302 redirect. The 302 response is where the cookie gets set by the server, but urllib2's HTTPCookieProcessor does not seem to save the cookie during a redirect. I tried creating an HTTPRedirectHandler class to ignore the redirect, but that didn't seem to do the trick. I tried referencing the CookieJar globally to handle the cookies from the HTTPRedirectHandler, but (1) this didn't work (because I was handling the header from the redirector, and the CookieJar function I was using, extract_cookies, needed a full request) and (2) it's an ugly way to handle it.

I probably need some guidance on this as I'm fairly green with Python. I think I'm mostly barking up the right tree here, but maybe focusing on the wrong branch.

import urllib2, cookielib

cj = cookielib.CookieJar()
cookieprocessor = urllib2.HTTPCookieProcessor(cj)


class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
  def http_error_302(self, req, fp, code, msg, headers):
    global cj
    cookie = headers.get("set-cookie")
    if cookie:
      # Doesn't work (extract_cookies() expects a response object with an
      # info() method, not a bare headers instance), but you get the idea
      cj.extract_cookies(headers, req)

    return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

  http_error_301 = http_error_303 = http_error_307 = http_error_302

# Oh yeah.  I'm using a proxy too, to follow traffic.
proxy = urllib2.ProxyHandler({'http': '127.0.0.1:8888'})
opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor, proxy)
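
For reference, here is a rough sketch of how the idea above might be wired so that extract_cookies() gets the response-like object it expects: wrapping fp and headers in a urllib.addinfourl supplies the info() method that CookieJar.extract_cookies(response, request) calls. The class name is illustrative, and this is not a verified fix for the underlying cookie loss.

import urllib, urllib2, cookielib

cj = cookielib.CookieJar()

class CookieExtractingRedirectHandler(urllib2.HTTPRedirectHandler):
  def http_error_302(self, req, fp, code, msg, headers):
    # Wrap the raw pieces into a response-like object so that
    # extract_cookies() can read the Set-Cookie headers via info().
    response = urllib.addinfourl(fp, headers, req.get_full_url())
    cj.extract_cookies(response, req)
    return urllib2.HTTPRedirectHandler.http_error_302(
        self, req, fp, code, msg, headers)

  http_error_301 = http_error_303 = http_error_307 = http_error_302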

Addition: I had tried using mechanize as well, without success. This is probably a separate question, but I'll pose it here since the ultimate goal is the same:

This simple code using mechanize, when used with a 302-emitting URL (http://fxfeeds.mozilla.com/firefox/headlines.xml), fails with the traceback below. Note that the same behavior occurs when not using set_handle_robots(False); I just wanted to ensure that wasn't the cause:

import urllib2, mechanize

browser = mechanize.Browser()
browser.set_handle_robots(False)
opener = mechanize.build_opener(*(browser.handlers))
r = opener.open("http://fxfeeds.mozilla.com/firefox/headlines.xml")

Output:

Traceback (most recent call last):
  File "redirecttester.py", line 6, in <module>
    r = opener.open("http://fxfeeds.mozilla.com/firefox/headlines.xml")
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 204, in open
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 457, in http_response
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 221, in error
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 332, in _call_chain
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 571, in http_error_302
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 188, in open
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_mechanize.py", line 71, in http_request
AttributeError: OpenerDirector instance has no attribute '_add_referer_header'

Any ideas?

Answer

I have been having the exact same problem recently but in the interest of time scrapped it and decided to go with mechanize. It can be used as a total replacement for urllib2 that behaves exactly as you would expect a browser to behave with regards to Referer headers, redirects, and cookies.

import mechanize
cj = mechanize.CookieJar()
browser = mechanize.Browser()
browser.set_cookiejar(cj)
browser.set_proxies({'http': '127.0.0.1:8888'})

# Use browser's handlers to create a new opener
opener = mechanize.build_opener(*browser.handlers)

The Browser object can be used as an opener itself (using the .open() method). It maintains state internally but also returns a response object on every call. So you get a lot of flexibility.
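
A minimal usage sketch continuing from the snippet above (the URL is just a placeholder):

r = browser.open("http://example.com/login")
print r.read()   # response body
print cj         # cookies collected across any redirects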

Also, if you don't have a need to inspect the cookiejar manually or pass it along to something else, you can omit the explicit creation and assignment of that object as well.
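
Under that assumption, the shortened form might look like this (mechanize wires up its own cookie handling by default; the URL is again a placeholder):

import mechanize

browser = mechanize.Browser()
browser.set_proxies({'http': '127.0.0.1:8888'})
r = browser.open("http://example.com/login")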

I am fully aware this doesn't address what is really going on and why urllib2 can't provide this solution out of the box or at least without a lot of tweaking, but if you're short on time and just want it to work, just use mechanize.
