Python Requests - managing cookies

Problem description

I'm trying to get some content automatically from a site using requests (and bs4).

I have a script that gets a cookie:

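import requests

def getCookies(self):
    username = 'username'
    password = 'password'
    URL = 'logonURL'
    # Pass the variables rather than the literal strings 'username'/'password'
    r = requests.get(URL, auth=(username, password))
    cookies = r.cookies
    return cookies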

A dump of the cookies looks like:

<<class 'requests.cookies.RequestsCookieJar'>[<Cookie ASP.NET_SessionId=yqokjr55ezarqbijyrwnov45 for URL.com/>, <Cookie BIGipServerPE_Journals.lww.com_80=1440336906.20480.0000 for URL.com/>, <Cookie JournalsLockCookie=id=a5720750-3f20-4207-a500-93ae4389213c&ip=IP address for URL.com/>]>

But when I pass the cookie object to the next URL:

soup = Soup(s.get(URL, cookies=cookies).content)

it's not working out. I can see by dumping the soup that I'm not giving the web server my credentials properly.

I tried running a requests session:

def getCookies(self):
    # Keep the session on the instance so later requests can reuse its cookies
    self.s = requests.session()
    username = 'username'
    password = 'password'
    URL = 'logURL'
    r = self.s.get(URL, auth=(username, password))

I had no joy with that either.

I looked at the headers via Live HTTP Headers in Firefox when I visit the 2nd page, and I see a Cookie header of a very different form:

Cookie: WT_FPC=id=264b0aa85e0247eb4f11355304127862:lv=1355317068013:ss=1355314918680; UserInfo=Username=username; BIGipServerPE_Journals.lww.com_80=1423559690.20480.0000; PlatformAuthCookie=true; Institution=ReferrerUrl=http://logonURL.com/?wa=wsignin1.0&wtrealm=urn:adis&wctx=http://URL.com/_layouts/Authenticate.aspx?Source=%252fpecnews%252ftoc%252f2012%252f06440&token=method|ExpireAbsolute; counterSessionGuidId=6e2bd57f-b6da-4dd4-bcb0-742428e08b5e; MyListsRefresh=12/13/2012 12:59:04 AM; ASP.NET_SessionId=40a04p45zppozc45wbadah45; JournalsLockCookie=id=85d1f38f-dcbb-476a-bc2e-92f7ac1ae493&ip=10.204.217.84; FedAuth=77u/PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz48U2VjdXJpdHlDb250ZXh0VG9rZW4gcDE6SWQ9Il9mMGU5N2M3Zi1jNzQ5LTQ4ZjktYTUxNS1mODNlYjJiNGNlYzUtNEU1MDQzOEY0RTk5QURCNDFBQTA0Mjc0RDE5QzREMEEiIHhtbG5zOnAxPSJodHRwOi8vZG9jcy5vYXNpcy1vcGVuLm9yZy93c3MvMjAwNC8wMS9vYXNpcy0yMDA0MDEtd3NzLXdzc2VjdXJpdHktdXRpbGl0eS0xLjAueHNkIiB4bWxucz0iaHR0cDovL2RvY3Mub2FzaXMtb3Blbi5vcmcvd3Mtc3gvd3Mtc2VjdXJlY29udmVyc2F0aW9uLzIwMDUxMiI+PElkZW50aWZpZXI+dXJuOnV1aWQ6ZjJmNGY5MGItMmE4Yy00OTdlLTkwNzktY2EwYjM3MTBkN2I1PC9JZGVudGlmaWVyPjxJbnN0YW5jZT51cm46dXVpZDo2NzMxN2U5Ny1lMWQ3LTQ2YzUtOTg2OC05ZGJhYjA3NDkzOWY8L0luc3RhbmNlPjwvU2VjdXJpdHlDb250ZXh0VG9rZW4+
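
One way to make the mismatch concrete is to list which cookie names the browser sends that the scripted session never received. The sketch below is only an illustration: the helper names `cookie_names` and `missing_from_session` are hypothetical, and the cookie values are shortened stand-ins; only the cookie names are taken from the dumps above.

```python
def cookie_names(cookie_header):
    """Return the set of cookie names found in a raw Cookie header."""
    # Tolerate a leading "Cookie:" as copied from a header viewer.
    if cookie_header.lower().startswith("cookie:"):
        cookie_header = cookie_header[len("cookie:"):]
    names = set()
    for pair in cookie_header.split(";"):
        # Split on the first "=" only; values may contain "=" themselves.
        name, sep, _value = pair.strip().partition("=")
        if sep and name:
            names.add(name)
    return names


def missing_from_session(browser_cookie_header, session_cookie_names):
    """Names the browser sends that the scripted session does not hold."""
    return sorted(cookie_names(browser_cookie_header) - set(session_cookie_names))


# Shortened, made-up values; only the names mirror the question.
browser_header = (
    "Cookie: UserInfo=Username=username; PlatformAuthCookie=true; "
    "ASP.NET_SessionId=abc; JournalsLockCookie=id=1&ip=2; FedAuth=xyz"
)
session_names = ["ASP.NET_SessionId", "JournalsLockCookie"]

print(missing_from_session(browser_header, session_names))
```

Names that show up only on the browser side (here `FedAuth`, `PlatformAuthCookie` and `UserInfo`) suggest that the browser went through a login step that the script did not reproduce.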



I have redacted the username, password, and URLs from the question for obvious reasons.

Am I missing something obvious? Is there a different or proper way to capture the cookie? The method I'm currently using is not working.

Edit:

This is a self-standing version of the sessioned code:

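import requests
from bs4 import BeautifulSoup as Soup  # assumed alias; the question only shows the name Soup

s = requests.session()
username = 'username'
password = 'password'
URL = 'logonURL.aspx'
# Pass the variables rather than the literal strings 'username'/'password'
r = s.get(URL, auth=(username, password))
URL = r"URL.aspx"
soup = Soup(s.get(URL).content)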

Reading a dump of the soup, I can see in the HTML that it's telling me I don't have access. This string only appears in a browser when you're not logged in.

Recommended answer

I had a similar problem and found help in this question. The session jar was empty, and to actually get the cookie I needed to use a session.

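import requests

session = requests.session()
# user and password must be defined beforehand; the form field names depend on the site
p = session.post("http://example.com", data={'user': user, 'password': password})
print('headers', p.headers)
print('cookies', requests.utils.dict_from_cookiejar(session.cookies))
print('html', p.text)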
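
To see offline that a session resends whatever cookies its jar holds, a request can be prepared without being sent and its headers inspected. This is a minimal sketch, not the site's actual login flow: the cookie value, the domain `example.com` and the path `/protected.aspx` are made up for illustration, and the cookie is set by hand where a real login response would set it.

```python
import requests

session = requests.Session()

# Stand-in for a cookie that a successful login response would set
# (hypothetical value and domain).
session.cookies.set("FedAuth", "example-token", domain="example.com", path="/")

# Build, but do not send, a follow-up request to see what the session attaches.
request = requests.Request("GET", "http://example.com/protected.aspx")
prepared = session.prepare_request(request)

print(prepared.headers.get("Cookie"))
print(requests.utils.dict_from_cookiejar(session.cookies))
```

If the printed Cookie header lacks a cookie that the browser sends, the likely cause is that the login request itself did not match what the site expects, since the session only carries cookies it has actually received.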
