Using cookies.txt file with Python Requests


Question

I'm trying to access an authenticated site using a cookies.txt file (generated with a Chrome extension) with Python Requests:

import requests, cookielib

cj = cookielib.MozillaCookieJar('cookies.txt')
cj.load()
r = requests.get(url, cookies=cj)

It doesn't throw any error or exception, but yields the login screen, incorrectly. However, I know that my cookie file is valid, because I can successfully retrieve my content using it with wget. Any idea what I'm doing wrong?

Edit:

I'm tracing cookielib.MozillaCookieJar._really_load and can verify that the cookies are correctly parsed (i.e. they have the correct values for the domain, path, secure, etc. tokens). But as the transaction is still resulting in the login form, it seems that wget must be doing something additional (as the exact same cookies.txt file works for it).

Solution

MozillaCookieJar inherits from FileCookieJar, which has the following docstring in its constructor:

Cookies are NOT loaded from the named file until either the .load() or
.revert() method is called.

So you need to call the .load() method.

Also, as Jermaine Xu noted, the first line of the file needs to contain either the string # Netscape HTTP Cookie File or # HTTP Cookie File. Files generated by the plugin you use do not contain such a string, so you have to insert it yourself. I raised a corresponding bug at http://code.google.com/p/cookie-txt-export/issues/detail?id=5
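Inserting the header can be scripted. The following is a minimal sketch, not part of the original answer: it is written for Python 3, where cookielib is named http.cookiejar, and the helper name ensure_netscape_header, the MAGIC pattern and the demo cookie line are all hypothetical.

```python
import os
import re
import tempfile
from http.cookiejar import MozillaCookieJar  # Python 3 name for cookielib

# Illustrative pattern covering the two header forms mentioned above.
MAGIC = re.compile(r"#( Netscape)? HTTP Cookie File")


def ensure_netscape_header(path):
    """Prepend the header line if the file does not already start with one."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    if not MAGIC.match(text):
        with open(path, "w", encoding="utf-8") as f:
            f.write("# Netscape HTTP Cookie File\n" + text)


# Demo on a throwaway file that lacks the header.
# The expiry value is a made-up far-future timestamp.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write(".domain.com\tTRUE\t/\tFALSE\t4102444800\tname\tvalue\n")

ensure_netscape_header(demo_path)
cj = MozillaCookieJar(demo_path)
cj.load()  # would raise LoadError without the header line
header_demo_count = len(cj)
os.remove(demo_path)
```

Calling the helper on a file that already has the header leaves it unchanged, so it should be safe to run before every load.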

EDIT

Session cookies are saved with 0 in the 5th column. If you don't pass ignore_expires=True to the load() method, all such cookies are discarded when loading from a file.

File session_cookie.txt (fields are tab-separated):

# Netscape HTTP Cookie File
.domain.com TRUE    /   FALSE   0   name    value

Python script:

import cookielib

cj = cookielib.MozillaCookieJar('session_cookie.txt')
cj.load()
print len(cj)

Output: 0
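For comparison, here is a hedged Python 3 sketch of the same experiment (http.cookiejar is the Python 3 name for cookielib). It writes the sample file to a temporary path with explicit tab separators and loads it twice, once with the defaults and once with ignore_expires=True. The variable names are illustrative only.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar  # Python 3 name for cookielib

# Same content as session_cookie.txt, with explicit tab separators.
content = (
    "# Netscape HTTP Cookie File\n"
    ".domain.com\tTRUE\t/\tFALSE\t0\tname\tvalue\n"
)
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write(content)

default_jar = MozillaCookieJar(path)
default_jar.load()  # expiry 0 counts as already expired, so the cookie is dropped

lenient_jar = MozillaCookieJar(path)
lenient_jar.load(ignore_expires=True)  # keep cookies regardless of expiry

print(len(default_jar), len(lenient_jar))
os.remove(path)
```

The first jar should come out empty, while the second should hold the single session cookie.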

EDIT 2

Although we managed to get the cookies into the jar above, they are subsequently discarded by cookielib because they still have a 0 value in the expires attribute. To prevent this, we have to set the expiry time to some future time, like so:

import time

for cookie in cj:
    # set cookie expiry date to 14 days from now
    cookie.expires = time.time() + 14 * 24 * 3600
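Putting the two steps together, a Python 3 sketch might look like the following. It assumes http.cookiejar in place of cookielib and a temporary copy of the sample file. The call to clear_expired_cookies() is only there to show that the cookie now survives the expiry check; it is not something the original answer calls.

```python
import os
import tempfile
import time
from http.cookiejar import MozillaCookieJar  # Python 3 name for cookielib

# Temporary stand-in for session_cookie.txt (tab-separated fields).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("# Netscape HTTP Cookie File\n")
    f.write(".domain.com\tTRUE\t/\tFALSE\t0\tname\tvalue\n")

cj = MozillaCookieJar(path)
cj.load(ignore_expires=True)  # step 1: get the session cookies into the jar
os.remove(path)

# Step 2: push the expiry of session cookies 14 days into the future.
for cookie in cj:
    if cookie.expires == 0:
        cookie.expires = int(time.time()) + 14 * 24 * 3600

# Expired cookies would be removed here; the adjusted one should remain.
cj.clear_expired_cookies()
surviving = [c.name for c in cj]
```

The resulting jar can then be passed to Requests as in the question, i.e. requests.get(url, cookies=cj).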

EDIT 3

I checked both wget and curl, and both use an expiry time of 0 to denote session cookies, which means it's the de facto standard. However, Python's implementation uses an empty string for the same purpose, hence the problem raised in the question. I think Python's behavior in this regard should be in line with what wget and curl do, and that's why I raised the bug at http://bugs.python.org/issue17164

I'll note that replacing the 0s with empty strings in the 5th column of the input file and passing ignore_discard=True to load() is an alternative way of solving the problem (no need to change the expiry time in this case).
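The alternative can be sketched in Python 3 as well. The helper blank_zero_expiry below is hypothetical and assumes a well-formed file with seven tab-separated fields per cookie line; it rewrites the text before it is handed to MozillaCookieJar.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar  # Python 3 name for cookielib


def blank_zero_expiry(text):
    """Replace a 0 in the 5th tab-separated column with an empty string."""
    out = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) == 7 and fields[4] == "0":
            fields[4] = ""
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"


original = (
    "# Netscape HTTP Cookie File\n"
    ".domain.com\tTRUE\t/\tFALSE\t0\tname\tvalue\n"
)
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write(blank_zero_expiry(original))

cj = MozillaCookieJar(path)
cj.load(ignore_discard=True)  # session cookies are marked discard, so keep them
os.remove(path)

cj.clear_expired_cookies()  # nothing to clear: the cookie has no expiry time
cookie = next(iter(cj))
```

With this approach the cookie is loaded with no expiry time and with its discard flag set, so no later adjustment of expires is needed.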
