Using cookies.txt file with Python Requests

Question

I'm trying to access an authenticated site using a cookies.txt file (generated with a Chrome extension) with Python Requests:

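import cookielib  # Python 2; renamed http.cookiejar in Python 3
import requests

# url is the authenticated page to fetch (not shown in the question)
cj = cookielib.MozillaCookieJar('cookies.txt')
cj.load()
r = requests.get(url, cookies=cj)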

It doesn't throw any error or exception, but it incorrectly returns the login screen. However, I know that my cookie file is valid, because I can successfully retrieve my content using it with wget. Any idea what I'm doing wrong?

I'm tracing cookielib.MozillaCookieJar._really_load and can verify that the cookies are correctly parsed (i.e. they have the correct values for the domain, path, secure, etc. tokens). But as the transaction is still resulting in the login form, it seems that wget must be doing something additional (as the exact same cookies.txt file works for it).

Recommended answer

MozillaCookieJar inherits from FileCookieJar which has the following docstring in its constructor:

Cookies are NOT loaded from the named file until either the .load() or
.revert() method is called.

So you need to call the .load() method.

Also, as Jermaine Xu noted, the first line of the file needs to contain either the string # Netscape HTTP Cookie File or the string # HTTP Cookie File. Files generated by the plugin you use do not contain such a string, so you have to insert it yourself. I raised a bug for this at http://code.google.com/p/cookie-txt-export/issues/detail?id=5
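
Inserting the header can be automated before loading. The following is only a sketch, written for Python 3 (where cookielib is named http.cookiejar). The helper name ensure_netscape_header is made up for illustration, and the regular expression is an assumption based on the two header strings mentioned above, not a guaranteed copy of the library's own check.

```python
import re

# Assumption: these are the two header strings named in the answer above.
HEADER_RE = re.compile(r"#( Netscape)? HTTP Cookie File")
DEFAULT_HEADER = "# Netscape HTTP Cookie File\n"


def ensure_netscape_header(path):
    """Prepend the Netscape header to the file at `path` if it is missing.

    Returns True if the file was rewritten, False if it already had a header.
    """
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()
    first_line = content.split("\n", 1)[0]
    if HEADER_RE.match(first_line):
        return False
    with open(path, "w", encoding="utf-8") as f:
        f.write(DEFAULT_HEADER + content)
    return True
```

Calling ensure_netscape_header('cookies.txt') once before cj.load() should be enough; a second call leaves the file untouched.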

Edit

Session cookies are saved with 0 in the 5th column. If you don't pass ignore_expires=True to the load() method, all such cookies are discarded when loading from a file.

File session_cookie.txt:

# Netscape HTTP Cookie File
.domain.com TRUE    /   FALSE   0   name    value

Python script:

import cookielib

cj = cookielib.MozillaCookieJar('session_cookie.txt')
cj.load()
print len(cj)

Output: 0
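
For comparison, here is a minimal sketch of the same load with the flags turned on, written for Python 3 (where cookielib is http.cookiejar). The helper count_cookies and the temporary file are illustrative only. Passing both ignore_discard and ignore_expires is a cautious assumption: it covers the cookie being treated either as already expired or as a session cookie.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar

# Same content as session_cookie.txt above; the fields are tab-separated.
SESSION_COOKIE_FILE = (
    "# Netscape HTTP Cookie File\n"
    ".domain.com\tTRUE\t/\tFALSE\t0\tname\tvalue\n"
)


def count_cookies(text, **load_flags):
    """Write `text` to a temporary cookies file, load it, return the jar size."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        jar = MozillaCookieJar(path)
        jar.load(**load_flags)
        return len(jar)
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Default load, as in the script above.
    print(count_cookies(SESSION_COOKIE_FILE))
    # Load while keeping expired and session cookies.
    print(count_cookies(SESSION_COOKIE_FILE,
                        ignore_discard=True, ignore_expires=True))
```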

Edit 2

Although passing ignore_expires=True gets the cookies into the jar, they are subsequently discarded by cookielib because they still have a 0 value in the expires attribute. To prevent this, we have to set the expiry time to some point in the future, like so:

import time

for cookie in cj:
    # set cookie expiry date to 14 days from now
    cookie.expires = time.time() + 14 * 24 * 3600
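
Putting the pieces together, a hedged end-to-end sketch for Python 3 might look like the following. The function names load_with_future_expiry and cookie_header_for are hypothetical, and the check uses urllib.request from the standard library instead of Requests, purely so the resulting Cookie header can be inspected without making a network call.

```python
import time
import urllib.request
from http.cookiejar import MozillaCookieJar

FOURTEEN_DAYS = 14 * 24 * 3600


def load_with_future_expiry(path, lifetime=FOURTEEN_DAYS):
    """Load every cookie from `path` and push its expiry into the future."""
    jar = MozillaCookieJar(path)
    # Keep cookies that would otherwise be dropped while loading.
    jar.load(ignore_discard=True, ignore_expires=True)
    for cookie in jar:
        cookie.expires = int(time.time()) + lifetime
    return jar


def cookie_header_for(jar, url):
    """Return the Cookie header the jar would attach to a request for `url`."""
    request = urllib.request.Request(url)
    jar.add_cookie_header(request)  # builds the header, sends nothing
    return request.get_header("Cookie")
```

With Requests, the jar returned by load_with_future_expiry('cookies.txt') could then be passed as cookies=jar, as in the question.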

Edit 3

I checked both wget and curl, and both use an expiry time of 0 to denote session cookies, which makes it the de facto standard. Python's implementation, however, uses an empty string for the same purpose, hence the problem raised in the question. I think Python's behavior here should be in line with what wget and curl do, which is why I raised a bug at http://bugs.python.org/issue17164
Note that replacing the 0s with empty strings in the 5th column of the input file and passing ignore_discard=True to load() is an alternative way of solving the problem (no need to change the expiry time in this case).
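
That alternative can be sketched as follows, again for Python 3 (http.cookiejar). The helper names blank_zero_expiry and load_session_cookies are illustrative, and the rewrite assumes a well-formed file with seven tab-separated fields per cookie row.

```python
from http.cookiejar import MozillaCookieJar


def blank_zero_expiry(src_path, dst_path):
    """Copy a cookies.txt file, replacing a 0 in the 5th column with ''."""
    with open(src_path, "r", encoding="utf-8") as src:
        lines = src.readlines()
    out = []
    for line in lines:
        body = line.rstrip("\n")
        fields = body.split("\t")
        # Only touch cookie rows (7 tab-separated fields); comments,
        # blank lines and anything else are copied unchanged.
        if not body.startswith("#") and len(fields) == 7 and fields[4] == "0":
            fields[4] = ""
            line = "\t".join(fields) + "\n"
        out.append(line)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(out)


def load_session_cookies(path):
    """Load a cookies.txt file, keeping cookies marked as session cookies."""
    jar = MozillaCookieJar(path)
    jar.load(ignore_discard=True)
    return jar
```

Writing to a separate destination file keeps the original export intact, so it can still be used with wget or curl.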
