Trouble with requests/BeautifulSoup
Question
I'm trying to learn to use some of Python's web features, and thought I'd practice by writing a script to log in to a webpage at my university. Initially I wrote the code using urllib2, but user alecxe kindly provided me with code using requests/BeautifulSoup (please see: Website form login using Python urllib2, http://stackoverflow.com/questions/35279961/website-form-login-using-python-urllib2/35280124?noredirect=1#comment58303224_35280124).
I am trying to log in to the page http://reg.maths.lth.se/. The page features one login form for students and one for teachers (I am obviously trying to log in as a student). To log in, one should provide a "Personnummer", which is basically the equivalent of a social security number, so I don't want to post my valid number. However, I can reveal that it should be 10 digits long.
The code I was provided (with a small change to the final print statement) is given below:
import requests
from bs4 import BeautifulSoup

PNR = "00000000"

url = "http://reg.maths.lth.se/"
login_url = "http://reg.maths.lth.se/login/student"

with requests.Session() as session:
    # extract token
    response = session.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    token = soup.find("input", {"name": "_token"})["value"]

    # submit form
    session.post(login_url, data={
        "_token": token,
        "pnr": PNR
    })

    # navigate to the main page again (should be logged in)
    # response = session.get(url)  # This is deliberately commented out
    soup = BeautifulSoup(response.content, "html.parser")
    print(soup)
It is thus supposed to print the source code of the page obtained after POSTing the pnr.
While the code runs, it always returns the source code of the main page http://reg.maths.lth.se/, which is not correct. For example, if you manually enter a pnr of the wrong length, e.g. 0, you should be directed to a different page, located at the URL http://reg.maths.lth.se/login/student, whose source code is obviously different from that of the main page.
Any suggestions?
Recommended answer
You aren't assigning the POST result to response, and are just printing out the result of the first GET request.
So,

# submit form
session.post(login_url, data={
    "_token": token,
    "pnr": PNR
})

should be

response = session.post(login_url, data={
    "_token": token,
    "pnr": PNR
})
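
For reference, here is a minimal sketch of the whole script with that fix applied. The function names login_as_student and main are my own, chosen for illustration, and I am assuming the form fields (_token, pnr) and URLs are exactly as in the question. The function returns the POST response directly, so there is no earlier response left around to print by mistake.

```python
import requests
from bs4 import BeautifulSoup

URL = "http://reg.maths.lth.se/"
LOGIN_URL = "http://reg.maths.lth.se/login/student"


def login_as_student(session, pnr):
    """Fetch the form token, POST the login form, and return the POST response.

    Hypothetical helper for illustration; field names are taken from the question.
    """
    # The first GET is only needed to read the hidden "_token" field.
    form_page = session.get(URL)
    soup = BeautifulSoup(form_page.content, "html.parser")
    token = soup.find("input", {"name": "_token"})["value"]

    # Return the POST response itself, so the caller sees the page the
    # server answered with rather than the earlier form page.
    return session.post(LOGIN_URL, data={"_token": token, "pnr": pnr})


def main():
    # Placeholder value, as in the question; not a valid Personnummer.
    with requests.Session() as session:
        response = login_as_student(session, "00000000")
        print(response.url)  # the URL the final response came from
        print(BeautifulSoup(response.content, "html.parser"))
```

Calling main() runs it against the real site. Printing response.url alongside the HTML is a quick way to check which page you actually ended up on.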