How can I log in to morningstar.com without using a headless browser such as Selenium?


Problem description

I read the answer to the question: "How to "log in" to a website using Python's Requests module?"

The answer reads: "Firstly check the source of the login form to get three pieces of information - the url that the form posts to, and the name attributes of the username and password fields."
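
For a page that serves a plain HTML login form, the quoted advice could be sketched roughly as below, using only the standard library. The sample markup, the field names `user_email` and `user_pass`, and the helper `inspect_forms` are hypothetical and used purely for illustration. A page that builds its form with JavaScript will not expose these attributes in the served HTML, so this approach would find nothing there.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Collect each form's action/method and the name attributes of its inputs."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form> encountered
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Record (name, type) for every named input inside the form
            self._current["fields"].append((attrs["name"], attrs.get("type", "text")))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def inspect_forms(html):
    """Return a list of form descriptions found in an HTML string."""
    parser = FormInspector()
    parser.feed(html)
    return parser.forms


# Hypothetical markup standing in for a downloaded login page
SAMPLE_HTML = """
<form action="/login" method="post">
  <input type="text" name="user_email">
  <input type="password" name="user_pass">
  <input type="submit" value="Sign in">
</form>
"""

for form in inspect_forms(SAMPLE_HTML):
    print(form["method"], form["action"], form["fields"])
```

In practice the HTML string would come from fetching the login page, and the reported action URL and field names would then be used as the POST target and the payload keys.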

How can I find out what the name attributes of the username and password fields are on this morningstar.com page? https://www.morningstar.com/members/login.html

I have the following code:

import requests

url = 'http://www.morningstar.com/members/login.html'
url = 'http://beta.morningstar.com'

with open('morningstar.txt') as f:
    username, password = f.read().splitlines()

with requests.Session() as s:
    payload = login_data = {
        'username': username,
        'password': password,
        }
    p = s.post(url, data=login_data)
    print(p.text)

But, among other things, it prints:

This distribution is not configured to allow the HTTP request method that was used for this request. The distribution supports only cachable requests.

What should url and data be for the post?

There is another answer that makes use of Selenium, but is it possible to avoid that?

Recommended answer

This was kind of hard; I had to use an intercepting proxy, but here it is:

import requests

s = requests.session()
auth_url = 'https://sso.morningstar.com/sso/json/msusers/authenticate'
login_url = 'https://www.morningstar.com/api/v2/user/login'
username = 'username'
password = 'password'

headers = {
    'Access-Control-Request-Method': 'POST',
    'Access-Control-Request-Headers': 'content-type,x-openam-password,x-openam-username',
    'Origin': 'https://www.morningstar.com'
}
s.options(auth_url, headers=headers)

headers = {
    'Referer': 'https://www.morningstar.com/members/login.html',
    'Content-Type': 'application/json',
    'X-OpenAM-Username': username,
    'X-OpenAM-Password': password,
    'Origin': 'https://www.morningstar.com',
}
s.post(auth_url, headers=headers)

data = {"productCode":"DOT_COM","rememberMe":False}
r = s.post(login_url, json=data)

print(s.cookies)
print(r.json())

By now you should have an authenticated session. You should see a bunch of cookies in s.cookies and some basic info about your account in r.json().

The site has since changed its login mechanism (and probably its entire CMS), so the code above no longer works. The new login process involves one POST and one PATCH request to /umapi/v1/sessions, followed by a GET request to /umapi/v1/users.

import requests

sessions_url = 'https://www.morningstar.com/umapi/v1/sessions'
users_url = 'https://www.morningstar.com/umapi/v1/users'

userName = 'my email'
password = 'my pwd'
data = {'userName':userName,'password':password}

with requests.session() as s:
    r = s.post(sessions_url, json=data)
    # The response should be 200 if creds are valid, 401 if not
    assert r.status_code == 200
    s.patch(sessions_url)
    r = s.get(users_url)
    #print(r.json()) # contains account details
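
Rather than hard-coding the credentials, they could be read from a two-line file such as the `morningstar.txt` used in the question. The following is a minimal sketch; the helper names `parse_credentials` and `build_session_payload` are hypothetical, and the payload keys are simply the ones used in the snippet above, which may change whenever the site does.

```python
def parse_credentials(text):
    """Split file contents into (username, password), ignoring blank lines."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) != 2:
        raise ValueError(f"expected 2 non-empty lines, found {len(lines)}")
    return lines[0], lines[1]


def build_session_payload(username, password):
    """Build the JSON body for the sessions POST (keys taken from the snippet above)."""
    return {"userName": username, "password": password}


# Example with in-memory text; a real script would read the file contents instead
username, password = parse_credentials("me@example.com\nsecret\n\n")
payload = build_session_payload(username, password)
```

The resulting `payload` would then take the place of `data` in the `s.post(sessions_url, json=data)` call.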

The URLs and other required values, such as the POST data, can be obtained from the Network tab of the web browser's developer console (Ctrl+Shift+I).
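
Most browsers can also export the Network tab's recorded traffic as a HAR file, which is plain JSON. As a rough sketch, the requests in such an export could be summarized as below, assuming the standard HAR layout (`log.entries[].request`). The helper names and the sample entries are hypothetical, and the example.com URLs are placeholders rather than real endpoints.

```python
import json


def load_har(path):
    """Read a HAR file exported from the browser's Network tab."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_har(har):
    """Return (method, url, post_body) for each request recorded in a HAR dict."""
    summary = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry["request"]
        post_data = request.get("postData") or {}  # absent for requests without a body
        summary.append((request["method"], request["url"], post_data.get("text")))
    return summary


# Tiny hand-written fragment (hypothetical values) standing in for a real export
SAMPLE_HAR = {
    "log": {
        "entries": [
            {
                "request": {
                    "method": "POST",
                    "url": "https://example.com/api/sessions",
                    "postData": {
                        "mimeType": "application/json",
                        "text": '{"userName": "me"}',
                    },
                }
            },
            {"request": {"method": "GET", "url": "https://example.com/api/users"}},
        ]
    }
}

for method, url, body in summarize_har(SAMPLE_HAR):
    print(method, url, body)
```

Filtering the summary down to POST and PATCH requests is usually the quickest way to spot the login calls. Note that a HAR export contains credentials and cookies in clear text, so it should be handled with care.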
