Trouble with requests/BeautifulSoup


Question

I'm trying to learn to use some of Python's web features, and thought I'd practice by writing a script to log in to a webpage at my university. Initially I wrote the code using urllib2, but user alecxe kindly provided me with code using requests/BeautifulSoup (please see: Website form login using Python urllib2, http://stackoverflow.com/questions/35279961/website-form-login-using-python-urllib2/35280124?noredirect=1#comment58303224_35280124).

I am trying to log in to the page http://reg.maths.lth.se/. The page features one login form for students and one for teachers (I am obviously trying to log in as a student). To log in, one should provide a "Personnummer", which is basically the equivalent of a social security number, so I don't want to post my valid number. However, I can reveal that it should be 10 digits long.

The code I was provided (with a small change to the final print statement) is given below:

import requests
from bs4 import BeautifulSoup

PNR = "00000000"

url = "http://reg.maths.lth.se/"
login_url = "http://reg.maths.lth.se/login/student"
with requests.Session() as session:
    # extract token
    response = session.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    token = soup.find("input", {"name": "_token"})["value"]

    # submit form
    session.post(login_url, data={
        "_token": token,
        "pnr": PNR
    })

    # navigate to the main page again (should be logged in)
    #response = session.get(url) ##This is deliberately commented out

    soup = BeautifulSoup(response.content, "html.parser")
    print(soup)

It is thus supposed to print the source code of the page obtained after POSTing the pnr.

While the code runs, it always returns the source code of the main page http://reg.maths.lth.se/, which is not correct. For example, if you manually enter a pnr of the wrong length, e.g. 0, you should be directed to a page which looks like this:

[Screenshot not preserved]

located at the URL http://reg.maths.lth.se/login/student, whose source code is obviously different from that of the main page.

Any suggestions?

Recommended answer

You aren't assigning the POST result to response, so you are just printing out the result of the first GET request.

So, change:

# submit form
session.post(login_url, data={
    "_token": token,
    "pnr": PNR
})

to:

# submit form, keeping the returned page in response
response = session.post(login_url, data={
    "_token": token,
    "pnr": PNR
})
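Putting the fix together, the flow could be sketched roughly as below. This is only an illustration, not the answerer's code: the helper name `login_and_fetch` and the `ValueError` check are assumptions added here, and the function takes any session-like object so that the token extraction and the POST can be exercised without touching the real site.

```python
from bs4 import BeautifulSoup


def login_and_fetch(session, url, login_url, pnr):
    """Hypothetical helper: fetch the form token, POST it, return the POST response."""
    # Fetch the main page and pull the hidden form token out of it.
    page = session.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    token_input = soup.find("input", {"name": "_token"})
    if token_input is None:
        raise ValueError("no _token input found on " + url)

    # Keep the return value of post(): it is the page that follows
    # the login attempt, not the page fetched by the first GET.
    response = session.post(login_url, data={
        "_token": token_input["value"],
        "pnr": pnr,
    })
    return response


# Possible usage with the names from the question (needs network access):
#
#   import requests
#   with requests.Session() as session:
#       response = login_and_fetch(session, url, login_url, PNR)
#       print(response.url)  # which page the POST ended up on
#       print(BeautifulSoup(response.content, "html.parser"))
```

Printing `response.url` alongside the parsed page is a quick way to check whether the output really comes from the login request rather than from the main page.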
