Why is this Requests and BeautifulSoup login script not working?


Question

The following is a bit of code I wrote using the Requests and BeautifulSoup libraries for Python 3.

import requests as rq
from bs4 import BeautifulSoup as bs

def get_data():
    return {'email': str(input('Enter your email.')),
        'password': str(input('Enter your password.'))}

def obtain_data():
    login_data=get_data()
    form_data={'csrf_token': login_data['email'],
               'login': '1',
               'redirect': 'account/dashboard',
               'query': None,
               'required': 'email,password',
               'email': login_data['email'],
               'password': login_data['password']}
    headers={'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
    with rq.Session() as s:
        r=s.get('https://www.formstack.com/admin/user/login',headers=headers)
        form_data['csrf_token']+=','+bs(r.content, 'html.parser').find('input',attrs={'name':'csrf_token'})['value']
        r=s.post('https://www.formstack.com/admin/user/login',data=form_data,headers=headers)
        assert('Collect' in bs(r.content,'html.parser'))

obtain_data()

The purpose of the code is to log into a survey platform called Formstack using the login credentials obtained by get_data(). To do this, we create a session using Requests and send a GET request to Formstack's login page. We build a parse tree of the response using BeautifulSoup and retrieve the value of the input HTML tag with name='csrf_token', since we need this value to complete the login form. We then assemble the form, denoted form_data in the code above, and submit it in a POST request, again to Formstack's login page. This should log me into my Formstack account, but for some reason it doesn't. I checked this by running an assert() on the contents of the first page I see after logging in, and with this method I always get an assertion error.

I'm not very familiar with web scraping in Python, so I'm not sure where to go with this problem. I've tried allowing redirects in both the GET request and the POST request, but that didn't help in either case. Any help is appreciated, thank you.

Recommended answer

You'll have to mimic the browser's behavior so that the request looks as if it was made from a browser. Sometimes that requires headers too, but here they don't seem to be required (you can still add them if you want).

Looking at the "Network" tab of my browser with "Preserve" on, I can see the order in which the requests are made. The important part is a GET request that is made before the actual POST request (the one that sends your credentials). The order seems to matter here, so the code keeps it.

Note 1: I've used a regex here to extract the token, but you should use BeautifulSoup if you need to parse the page for anything else.
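As a rough illustration of Note 1, the token could also be pulled out with BeautifulSoup, much as the question's code does. This is only a sketch: SAMPLE_HTML is a made-up stand-in for login_page.text, and extract_csrf_token is a hypothetical helper name, so the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for login_page.text; the real markup may differ.
SAMPLE_HTML = """
<form method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="email" name="email">
</form>
"""


def extract_csrf_token(html):
    """Return the value of the csrf_token input, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    field = soup.find("input", attrs={"name": "csrf_token"})
    if field is None:
        return None
    # .get() avoids a KeyError if the input has no value attribute.
    return field.get("value")


print(extract_csrf_token(SAMPLE_HTML))
```

Returning None instead of indexing directly makes a missing token show up as an explicit condition rather than a TypeError on the find() result.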

Note 2: The page seems to load over time, with JavaScript fetching data from the back-end. Make sure to check the Network tab to see which requests fetch the data you need, and make those same requests in your code.
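Related to checking what a response actually contains: an expression such as 'Collect' in bs(r.content, 'html.parser'), as used in the question's assert, tests membership among the parse tree's direct children rather than searching the page text, so it can fail even when the word is on the page. A minimal sketch, using a made-up HTML string rather than a real Formstack page:

```python
from bs4 import BeautifulSoup

# Hypothetical page content, not taken from Formstack.
SAMPLE_HTML = "<html><body><h1>Collect data</h1></body></html>"
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# `in` on a BeautifulSoup object compares against its direct child nodes.
in_children = "Collect" in soup

# Searching the extracted text is closer to what the assert intended.
in_text = "Collect" in soup.get_text()

print(in_children, in_text)
```

Even a text search only sees what the server sent in that response; anything JavaScript adds later will not be there.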

import requests
import re

email = "email"
password = "pwd"

with requests.Session() as sess:
    login_page = sess.get("https://www.formstack.com/admin/user/login")
    # Extract the token from the page source; any other method works too.
    token = re.search(r'token="(.*?)"', login_page.text).group(1)
    # The exact form fields as seen in the browser's "Network" tab.
    data = {
        "csrf_token": token,
        "login": "1",
        "redirect": "account/dashboard",
        "query": "",
        "required": "email,password",
        "email": email,
        "password": password,
    }
    GET_url = "https://www.formstack.com/admin/platform-sso/determineSsoStatus/" + email
    POST_url = "https://www.formstack.com/admin/session/create"
    # Keep the order of requests as seen in the browser's Network tab.
    sess.get(GET_url)
    final = sess.post(POST_url, data=data)
    # Check the final redirect: is the user taken to the dashboard
    # or sent back to the login page?
    # Uncomment to see where the redirects end:
    # print(final.url)
    if "redirect" in final.url:
        print("Incorrect creds")
    else:
        print("Success")
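The final check above looks for the substring "redirect" anywhere in the final URL. A slightly narrower alternative, sketched here under the assumption that a failed login ends up back on the login path, is to compare the URL's path. The helper name ended_on_login_page and both example URLs are hypothetical; only the login path comes from the code above.

```python
from urllib.parse import urlsplit


def ended_on_login_page(final_url, login_path="/admin/user/login"):
    """Return True if the URL's path is the login path (assumed failure signal)."""
    path = urlsplit(final_url).path.rstrip("/")
    return path == login_path


# Hypothetical final URLs, for illustration only.
failed = "https://www.formstack.com/admin/user/login?redirect=account/dashboard"
succeeded = "https://www.formstack.com/admin/account/dashboard"

print(ended_on_login_page(failed), ended_on_login_page(succeeded))
```

In the script, this would be called with final.url after the POST request. Checking the path avoids a false "Incorrect creds" if a post-login URL happens to contain the word "redirect".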
