How to log in to a website with urllib?

Views: 108 | Posted: 2020/5/3 9:26:11 | Tags: forms python-3.x login urllib

Question

I am trying to log in to this website: http://www.broadinstitute.org/cmap/index.jsp. I am using Python 3.3 on Windows. I followed this answer: https://stackoverflow.com/a/2910487/651779. My code:

import http.cookiejar
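import urllib.parse    # urlencode lives in the urllib.parse submodule
import urllib.request  # build_opener and the handlers live in urllib.request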

url = 'http://www.broadinstitute.org/cmap/index.jsp'
values = {'j_username' : 'username',
          'j_password' : 'password'}

data = urllib.parse.urlencode(values)
binary_data = data.encode('ascii')
cookies = http.cookiejar.CookieJar()

opener = urllib.request.build_opener(
    urllib.request.HTTPRedirectHandler(),
    urllib.request.HTTPHandler(debuglevel=0),
    urllib.request.HTTPSHandler(debuglevel=0),
    urllib.request.HTTPCookieProcessor(cookies))

response = opener.open(url, binary_data)
the_page = response.read()
http_headers = response.info()

It runs without errors, but the HTML in the_page is just the login page. How can I log in to this site?

Recommended answer

The site uses a JSESSIONID cookie to track the session, since HTTP requests are stateless. Your code sends the login request without first obtaining that session ID.
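The cookie mechanics described here can be sketched without touching the network. The example below is only an illustration under stated assumptions: `FakeResponse`, the host `www.example.com`, and the cookie value `abc123` are hypothetical stand-ins, not taken from the real site. It shows that an empty `CookieJar` attaches nothing to the login POST, while a jar that has seen a `Set-Cookie: JSESSIONID=...` header replays it.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for a server response; the jar only calls info()."""

    def __init__(self, set_cookie):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        return self._headers


def cookie_sent_with_post(jar):
    """Return the Cookie header the jar would attach to a login POST."""
    post = urllib.request.Request(
        "http://www.example.com/cmap/j_security_check",  # hypothetical host
        data=b"j_username=u")
    jar.add_cookie_header(post)
    return post.get_header("Cookie")


# Without a prior GET the jar is empty, so the POST carries no session ID.
empty_jar = http.cookiejar.CookieJar()
print(cookie_sent_with_post(empty_jar))

# After a GET whose response sets JSESSIONID, the jar replays it on the POST.
primed_jar = http.cookiejar.CookieJar()
get = urllib.request.Request("http://www.example.com/cmap/index.jsp")
primed_jar.extract_cookies(FakeResponse("JSESSIONID=abc123; Path=/cmap"), get)
print(cookie_sent_with_post(primed_jar))
```

In real use `HTTPCookieProcessor` performs both steps automatically on every request made through the opener; calling `extract_cookies` and `add_cookie_header` by hand is only done here to make the behaviour visible.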

I sniffed a login session for that site with Fiddler and found that the POST goes to a different URL, and that the request already carries the JSESSIONID cookie. So you need to GET the page URL first, capture that cookie with the cookie handler, and then POST to this URL:

post_url = 'http://www.broadinstitute.org/cmap/j_security_check'

You don't need to keep the response of the GET request at all; simply call opener.open(url), then change the response line in your code to this:

response = opener.open(post_url, binary_data)

The payload was also missing the form's submit field. Here is the whole thing with the changes I suggest:

import http.cookiejar
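import urllib.parse    # urlencode lives in the urllib.parse submodule
import urllib.request  # build_opener and the handlers live in urllib.request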

get_url = 'http://www.broadinstitute.org/cmap/index.jsp'
post_url = 'http://www.broadinstitute.org/cmap/j_security_check'

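values = urllib.parse.urlencode({
    'j_username': 'MYCOOLUSERNAME',   # placeholder: replace with your username
    'j_password': 'MYCOOLPASSWORD',   # placeholder: replace with your password
    'submit': 'sign in'})             # the submit field the form sends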
payload = bytes(values, 'ascii')
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPRedirectHandler(),
    urllib.request.HTTPHandler(debuglevel=0),
    urllib.request.HTTPSHandler(debuglevel=0),
    urllib.request.HTTPCookieProcessor(cj))

opener.open(get_url) #First call to capture the JSESSIONID
resp = opener.open(post_url, payload)
resp_html = resp.read()
resp_headers = resp.info()
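The payload built in the code above is ordinary form-encoded data. As a minimal sketch of what ends up on the wire, the hypothetical helper `build_login_payload` below (the name and the sample credentials are assumptions for illustration) encodes the same three fields and returns the bytes that `opener.open` expects as its data argument.

```python
import urllib.parse


def build_login_payload(username, password):
    """Encode the login form fields as application/x-www-form-urlencoded bytes."""
    fields = {
        "j_username": username,
        "j_password": password,
        "submit": "sign in",  # the submit field the original payload lacked
    }
    # urlencode percent-escapes reserved characters and turns spaces into '+',
    # so the result is plain ASCII and safe to encode as such.
    return urllib.parse.urlencode(fields).encode("ascii")


print(build_login_payload("alice", "s3cret"))
```

Passing bytes as the data argument is what makes urllib issue a POST instead of a GET.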

Any further requests made with the opener you created will reuse that cookie, so you should be able to navigate the site freely.
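To see the cookie reuse end to end without depending on the real site, here is a rough sketch against a throwaway local server. Everything about the server is an assumption made for illustration: `FakeSite`, the `/protected` path, the fixed session ID, and the fact that it accepts any credentials are all hypothetical and only imitate the GET, then POST, then browse flow described above.

```python
import http.cookiejar
import http.server
import threading
import urllib.parse
import urllib.request

SESSION_ID = "abc123"  # hypothetical session ID handed out by the fake server
LOGGED_IN = set()      # session IDs that have completed the login POST


class FakeSite(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site; it does not check credentials."""

    def _reply(self, body, set_cookie=False):
        self.send_response(200)
        if set_cookie:
            self.send_header("Set-Cookie", "JSESSIONID=%s; Path=/" % SESSION_ID)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def _has_session(self):
        return "JSESSIONID=" + SESSION_ID in self.headers.get("Cookie", "")

    def do_GET(self):
        if self.path == "/protected" and self._has_session() and SESSION_ID in LOGGED_IN:
            self._reply(b"protected content")
        else:
            self._reply(b"login page", set_cookie=True)

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # drain form body
        if self._has_session():
            LOGGED_IN.add(SESSION_ID)
            self._reply(b"welcome")
        else:
            self._reply(b"login page", set_cookie=True)

    def log_message(self, *args):
        pass  # keep the demo quiet


def new_opener():
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def run_demo():
    server = http.server.HTTPServer(("127.0.0.1", 0), FakeSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        # A fresh opener has no session cookie, so it only sees the login page.
        anonymous = new_opener().open(base + "/protected").read()

        # GET first to capture JSESSIONID, POST the form, then keep browsing.
        opener = new_opener()
        opener.open(base + "/index.jsp").read()
        payload = urllib.parse.urlencode(
            {"j_username": "u", "j_password": "p", "submit": "sign in"}).encode("ascii")
        opener.open(base + "/j_security_check", payload).read()
        logged_in = opener.open(base + "/protected").read()
        return anonymous, logged_in
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is the last request: it passes no cookie explicitly, yet the same opener sends JSESSIONID again because the `CookieJar` behind it remembered the value from the first GET.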
