Website form login using Python urllib2


Problem description

I've been trying to learn to use the urllib2 package in Python. I tried to log in as a student (the left form) to a signup page for maths students: http://reg.maths.lth.se/. I inspected the page (using Firebug), and the left form should evidently be submitted using POST with a key called pnr whose value is a string 10 characters long (that last part perhaps cannot be seen from the HTML, but it is basically my social security number, so I know how long it should be). Note that the action given for the POST method is another URL, namely http://reg.maths.lth.se/login/student.

I tried the following (with a fake pnr in the example below; I used my real number in my own code):

import urllib
import urllib2

url = 'http://reg.maths.lth.se/'
values = dict(pnr='0000000000')
data = urllib.urlencode(values)
req = urllib2.Request(url,data)
resp = urllib2.urlopen(req)
page = resp.read()

print page

While this executes, what gets printed is the source code of the original page http://reg.maths.lth.se/, so it doesn't seem like I logged in. Also, I can add any key/value pairs to the values dictionary without getting an error, which seems strange to me.

Also, if I go to the page http://reg.maths.lth.se/login/student, there is clearly no POST method for submitting data.

Any suggestions?

Solution

If you inspect the request that is sent to the server when you enter the number and submit the form, you will notice that it is a POST request with pnr and _token parameters.

You are missing the _token parameter which you need to extract from the HTML source of the page. It is a hidden input element:

<input name="_token" type="hidden" value="WRbJ5x05vvDlzMgzQydFxkUfcFSjSLDhknMHtU6m">
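As a minimal sketch of pulling that value out of the page with only the standard library (Python 3's html.parser), something like the following could work. The names HiddenInputParser and extract_hidden_inputs are made up for this illustration, and it assumes the token sits in a plain hidden input as shown above:

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # Attribute values can be None when written without "=...".
        input_type = (attributes.get("type") or "").lower()
        name = attributes.get("name")
        if input_type == "hidden" and name:
            self.hidden[name] = attributes.get("value") or ""


def extract_hidden_inputs(html):
    """Return a dict mapping hidden input names to their values."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.hidden


# The hidden input quoted above, used as sample data.
sample = '<input name="_token" type="hidden" value="WRbJ5x05vvDlzMgzQydFxkUfcFSjSLDhknMHtU6m">'
print(extract_hidden_inputs(sample)["_token"])
```

The token changes between sessions, so it has to be read from a freshly fetched page rather than hard-coded.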

I suggest looking into tools like Mechanize, MechanicalSoup or RoboBrowser, which make form submission easier. You can also parse the HTML yourself with an HTML parser such as BeautifulSoup, extract the token, and send it via urllib2 or requests:

import requests
from bs4 import BeautifulSoup

PNR = "0000000000"  # placeholder; the real value is 10 characters long

url = "http://reg.maths.lth.se/"
login_url = "http://reg.maths.lth.se/login/student"
with requests.Session() as session:
    # extract token
    response = session.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    token = soup.find("input", {"name": "_token"})["value"]

    # submit form
    session.post(login_url, data={
        "_token": token,
        "pnr": PNR
    })

    # navigate to the main page again (should be logged in)
    response = session.get(url)

    soup = BeautifulSoup(response.content, "html.parser")
    print(soup.title)
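If you would rather stay close to urllib2, roughly the same flow can be sketched with the standard library alone. In Python 3, urllib2 was split into urllib.request and urllib.parse, so this sketch uses those modules. The helper names make_opener, build_login_request and login are invented for illustration, and extract_token is assumed to be a function you supply that takes the page HTML and returns the _token value. The key points are the cookie jar (so the session cookie from the first GET is sent with the POST) and posting to the form's action URL rather than the main page:

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Build an opener that keeps cookies between requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_login_request(login_url, token, pnr):
    """Create the POST request for the form's action URL."""
    body = urllib.parse.urlencode({"_token": token, "pnr": pnr}).encode("ascii")
    return urllib.request.Request(login_url, data=body, method="POST")


def login(url, login_url, pnr, extract_token):
    """Fetch the page, post the form, and return the main page HTML.

    extract_token is a caller-supplied function: HTML text -> token string.
    """
    opener = make_opener()
    # First GET: receives the session cookie and the page with the token.
    with opener.open(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    # POST to the login URL, not to the main page.
    request = build_login_request(login_url, extract_token(html), pnr)
    with opener.open(request) as response:
        response.read()
    # Second GET: should now be served as a logged-in user.
    with opener.open(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling login("http://reg.maths.lth.se/", "http://reg.maths.lth.se/login/student", pnr, extract_token) would then mirror the requests version above. Whether the site accepts it still depends on the token and cookie behaving as described.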
