使用python请求登录网站 [英] Login to website using python requests

查看:576
本文介绍了使用python请求登录网站的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用请求登录 https://www.voxbeam.com/login 抓取数据.我是python的初学者,我大部分时候都做过教程,并使用BeautifulSoup自己进行了一些网页抓取工作.

查看HTML:

<form id="loginForm" action="https://www.voxbeam.com//login" method="post" autocomplete="off">

<input name="userName" id="userName" class="text auto_focus" placeholder="Username" autocomplete="off" type="text">

<input name="password" id="password" class="password" placeholder="Password" autocomplete="off" type="password">

<input id="challenge" name="challenge" value="78ed64f09c5bcf53ead08d967482bfac" type="hidden">

<input id="hash" name="hash" type="hidden">

我知道我应该使用 post 方法,并发送 userName password

我正在尝试:

import requests
import webbrowser

url = "https://www.voxbeam.com/login"
login = {'userName': 'xxxxxxxxx',
         'password': 'yyyyyyyyy'}

print("Original URL:", url)

r = requests.post(url, data=login)

print("\nNew URL", r.url)
print("Status Code:", r.status_code)
print("History:", r.history)

print("\nRedirection:")
for i in r.history:
    print(i.status_code, i.url)

# Open r in the browser to check if I logged in
new = 2  # open in a new tab, if possible
webbrowser.open(r.url, new=new)

我希望成功登录后可以进入仪表板的URL,以便开始抓取所需的数据.

当我使用身份验证信息代替xxxxxx和yyyyyy运行代码时,得到以下输出:

Original URL: https://www.voxbeam.com/login

New URL https://www.voxbeam.com/login
Status Code: 200
History: []

Redirection:

Process finished with exit code 0

我在浏览器中通过www.voxbeam.com/login

获得了一个新标签

代码中有什么问题吗? 我在HTML中缺少什么吗? 可以期望在r中获得仪表板URL,或者可以重定向并尝试在浏览器选项卡中打开URL以直观地查看响应,还是应该以其他方式进行操作?

几天来,我在这里阅读了许多类似的问题,但是似乎每个网站的身份验证过程都有些不同,因此我检查了解决方案

如上所述,您应该发送表单所有字段的值.这些可以在浏览器的Web检查器中找到.此表单发送2个附加的隐藏值:

url = "https://www.voxbeam.com//login"
data = {'userName':'xxxxxxxxx','password':'yyyyyyyyy','challenge':'zzzzzzzzz','hash':''}  
# note that in email have encoded '@' like uuuuuuu%40gmail.com      

session = requests.Session()
r = session.post(url, headers=headers, data=data)

此外,许多站点都具有针对机器人的保护,例如隐藏表单字段,js,发送编码值等.作为变体,您可以:

1)使用手动登录的cookie:

url = "https://www.voxbeam.com"
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36"}
cookies = {'PHPSESSID':'zzzzzzzzzzzzzzz', 'loggedIn':'yes'}

s = requests.Session()
r = s.post(url, headers=headers, cookies=cookies)

2)使用模块硒:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

url = "https://www.voxbeam.com//login"
driver = webdriver.Firefox()
driver.get(url)

u = driver.find_element_by_name('userName')
u.send_keys('xxxxxxxxx')
p = driver.find_element_by_name('password')
p.send_keys('yyyyyyyyy')
p.send_keys(Keys.RETURN)

I'm trying to login to https://www.voxbeam.com/login using requests to scrape data. I'm a python beginner and I have done mostly tutorials, and some web scraping on my own with BeautifulSoup.

Looking at the HTML:

<form id="loginForm" action="https://www.voxbeam.com//login" method="post" autocomplete="off">

<input name="userName" id="userName" class="text auto_focus" placeholder="Username" autocomplete="off" type="text">

<input name="password" id="password" class="password" placeholder="Password" autocomplete="off" type="password">

<input id="challenge" name="challenge" value="78ed64f09c5bcf53ead08d967482bfac" type="hidden">

<input id="hash" name="hash" type="hidden">

I understand I should be using the method post, and sending userName and password

I'm trying this:

import requests
import webbrowser

url = "https://www.voxbeam.com/login"
login = {'userName': 'xxxxxxxxx',
         'password': 'yyyyyyyyy'}

print("Original URL:", url)

r = requests.post(url, data=login)

print("\nNew URL", r.url)
print("Status Code:", r.status_code)
print("History:", r.history)

print("\nRedirection:")
for i in r.history:
    print(i.status_code, i.url)

# Open r in the browser to check if I logged in
new = 2  # open in a new tab, if possible
webbrowser.open(r.url, new=new)

I’m expecting, after a successful login to get in r the URL to the dashboard, so I can begin scraping the data I need.

When I run the code with the authentication information in place of xxxxxx and yyyyyy, I get the following output:

Original URL: https://www.voxbeam.com/login

New URL https://www.voxbeam.com/login
Status Code: 200
History: []

Redirection:

Process finished with exit code 0

I get in the browser a new tab with www.voxbeam.com/login

Is there something wrong in the code? Am I missing something in the HTML? It’s ok to expect to get the dashboard URL in r, or to be redirected and trying to open the URL in a browser tab to check visually the response, or I should be doing things in a different way?

I been reading many similar questions here for a couple of days, but it seems every website authentication process is a little bit different, and I checked http://docs.python-requests.org/en/latest/user/authentication/ which describes other methods, but I haven’t found anything in the HTML that would suggest I should be using one of those instead of post

I tried too

r = requests.get(url, auth=('xxxxxxxx', 'yyyyyyyy')) 

but it doesn’t seem to work either.

解决方案

As said above, you should send values of all fields of form. Those can be find in the Web inspector of browser. This form send 2 addition hidden values:

url = "https://www.voxbeam.com//login"
data = {'userName':'xxxxxxxxx','password':'yyyyyyyyy','challenge':'zzzzzzzzz','hash':''}  
# note that in email have encoded '@' like uuuuuuu%40gmail.com      

session = requests.Session()
r = session.post(url, headers=headers, data=data)

Also, many sites have protection from a bot like hidden form fields, js, send encoded values, etc. As variants you could:

1) Use a cookies from manual login:

url = "https://www.voxbeam.com"
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36"}
cookies = {'PHPSESSID':'zzzzzzzzzzzzzzz', 'loggedIn':'yes'}

s = requests.Session()
r = s.post(url, headers=headers, cookies=cookies)

2) Use module Selenium:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

url = "https://www.voxbeam.com//login"
driver = webdriver.Firefox()
driver.get(url)

u = driver.find_element_by_name('userName')
u.send_keys('xxxxxxxxx')
p = driver.find_element_by_name('password')
p.send_keys('yyyyyyyyy')
p.send_keys(Keys.RETURN)

这篇关于使用python请求登录网站的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆