How can I log in to morningstar.com without using a headless browser such as selenium?
Question
I read the answer to the question "How to "log in" to a website using Python's Requests module?"
The answer reads: "Firstly check the source of the login form to get three pieces of information - the url that the form posts to, and the name attributes of the username and password fields."
How can I find out what the name attributes of the username and password fields are on this morningstar.com page? https://www.morningstar.com/members/login.html
I have the following code:
import requests

url = 'http://www.morningstar.com/members/login.html'
url = 'http://beta.morningstar.com'

with open('morningstar.txt') as f:
    username, password = f.read().splitlines()

with requests.Session() as s:
    payload = login_data = {
        'username': username,
        'password': password,
    }
    p = s.post(url, data=login_data)
    print(p.text)
But, among other things, it prints:
This distribution is not configured to allow the HTTP request method that was used for this request. The distribution supports only cachable requests.
What should url and data be for the post?
There is another answer that makes use of selenium, but is it possible to avoid that?
Recommended answer
This was kind of hard; I had to use an intercepting proxy, but here it is:
import requests

s = requests.session()
auth_url = 'https://sso.morningstar.com/sso/json/msusers/authenticate'
login_url = 'https://www.morningstar.com/api/v2/user/login'
username = 'username'
password = 'password'

# Replay the CORS preflight request that the browser sends first
headers = {
    'Access-Control-Request-Method': 'POST',
    'Access-Control-Request-Headers': 'content-type,x-openam-password,x-openam-username',
    'Origin': 'https://www.morningstar.com'
}
s.options(auth_url, headers=headers)

# Authenticate; the credentials travel in the X-OpenAM-* headers
headers = {
    'Referer': 'https://www.morningstar.com/members/login.html',
    'Content-Type': 'application/json',
    'X-OpenAM-Username': username,
    'X-OpenAM-Password': password,
    'Origin': 'https://www.morningstar.com',
}
s.post(auth_url, headers=headers)

# Log in to the site itself using the same session
data = {"productCode": "DOT_COM", "rememberMe": False}
r = s.post(login_url, json=data)
print(s.cookies)
print(r.json())
By now you should have an authenticated session. You should see a bunch of cookies in s.cookies and some basic info about your account in r.json().
The site has since changed its login mechanism (and probably its entire CMS), so the above code doesn't work any more. The new login process involves one POST and one PATCH request to /umapi/v1/sessions, then a GET request to /umapi/v1/users.
import requests

sessions_url = 'https://www.morningstar.com/umapi/v1/sessions'
users_url = 'https://www.morningstar.com/umapi/v1/users'
userName = 'my email'
password = 'my pwd'
data = {'userName': userName, 'password': password}

with requests.session() as s:
    r = s.post(sessions_url, json=data)
    # The response should be 200 if creds are valid, 401 if not
    assert r.status_code == 200
    s.patch(sessions_url)
    r = s.get(users_url)
    # print(r.json())  # contains account details
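The assert in the snippet above stops the script with a bare AssertionError when the credentials are rejected. If you want a clearer failure, the same POST / PATCH / GET sequence could be wrapped in a function, roughly like this. This is only a sketch: the names login and LoginError are my own, the check on the final GET is an assumption (only the 200/401 behaviour of the POST is known), and session is assumed to behave like a requests.Session.

```python
SESSIONS_URL = 'https://www.morningstar.com/umapi/v1/sessions'
USERS_URL = 'https://www.morningstar.com/umapi/v1/users'


class LoginError(Exception):
    """Hypothetical exception: a login step returned an unexpected status."""


def login(session, user_name, password):
    """Run the POST / PATCH / GET sequence and return the account details.

    `session` is assumed to behave like requests.Session.
    """
    r = session.post(SESSIONS_URL,
                     json={'userName': user_name, 'password': password})
    # 200 means the credentials were accepted, 401 means they were not
    if r.status_code == 401:
        raise LoginError('credentials rejected (HTTP 401)')
    if r.status_code != 200:
        raise LoginError(f'unexpected status {r.status_code} from POST')

    # The PATCH response is not inspected, same as in the snippet above
    session.patch(SESSIONS_URL)

    r = session.get(USERS_URL)
    # Assumption: a logged-in session gets a 200 here
    if r.status_code != 200:
        raise LoginError(f'unexpected status {r.status_code} from GET')
    return r.json()
```

With requests installed you would call it as login(requests.Session(), 'my email', 'my pwd') and catch LoginError around it. Passing the session in keeps the cookies available to the caller for later requests.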
The URLs and other required values, such as POST data, can be obtained from the developer console (Ctrl+Shift+I) of a web browser, under the Network tab.
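For sites that still serve a plain HTML login form, the three pieces of information quoted in the question (the URL the form posts to and the name attributes of the fields) can also be read from the page source programmatically. Here is a minimal sketch using only the standard library. The markup in SAMPLE_HTML is made up for illustration and is not morningstar.com's actual page, and the class name FormFieldParser is my own. Since morningstar.com logs in through JSON requests, its real page may not contain a form like this at all, in which case the Network tab is the way to go.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect each form's action URL, method, and named input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form> encountered
        self._current = None  # the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form':
            self._current = {
                'action': attrs.get('action'),
                'method': (attrs.get('method') or 'get').lower(),
                'fields': [],
            }
            self.forms.append(self._current)
        elif tag == 'input' and self._current is not None and attrs.get('name'):
            # Only inputs with a name attribute are submitted with the form
            self._current['fields'].append(attrs['name'])

    def handle_endtag(self, tag):
        if tag == 'form':
            self._current = None


# Made-up markup for illustration; NOT morningstar.com's real login page.
SAMPLE_HTML = """
<form action="/login" method="post">
  <input type="text" name="user_email">
  <input type="password" name="user_password">
  <input type="submit" value="Sign in">
</form>
"""

parser = FormFieldParser()
parser.feed(SAMPLE_HTML)
print(parser.forms)
```

The entries in 'fields' are the keys you would use in the data dict of a post, and 'action' is the URL to post to (resolved against the page URL if it is relative).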