How can I use Python requests to grab a LinkedIn page?
Question
I am using the code below to try to grab a LinkedIn page, but this method does not seem to log me in; it just shows me the unauthenticated home page.
#!/usr/bin/env python3
import requests
from bs4 import BeautifulSoup

payload = {
    'session-key': 'my account',
    'session-password': 'my password'
}
URL = 'https://www.linkedin.com/uas/login'
s = requests.session()
s.post(URL, data=payload)
r = s.get('http://www.linkedin.com/nhome')
soup = BeautifulSoup(r.text, 'html.parser')
print(soup)
Recommended answer
This is much more complicated than what you've got so far.
You will need to do something like:
- Load https://www.linkedin.com/uas/login
- Parse the response with BeautifulSoup to get the login form, including all the hidden form fields. The CSRF fields are particularly important, as the server will reject a POST request without the correct values.
- Build your POST data dictionary from the parsed login form data plus your username and password.
- POST that data to https://www.linkedin.com/uas/login-submit (you might have to fake some of the headers too, as it might only accept requests marked as AJAX).
- Finally, GET http://www.linkedin.com/nhome
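The parsing and dictionary-building steps can be tried offline before touching the real site. The sketch below runs against a made-up login form: the hidden field names (`csrf_token`, `source_alias`), their values, and the helper name `build_login_payload` are all hypothetical and only stand in for whatever the real page contains.

```python
from bs4 import BeautifulSoup

# Hypothetical login page: the field names and values below are made up
# for illustration and are not LinkedIn's real markup.
SAMPLE_HTML = """
<form name="login" action="/uas/login-submit" method="POST">
  <input type="text" name="session_key">
  <input type="password" name="session_password">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="hidden" name="source_alias" value="xyz">
  <input type="submit" name="signin" value="Sign In">
</form>
"""


def build_login_payload(html, username, password):
    """Collect hidden/submit inputs from the login form and add credentials."""
    soup = BeautifulSoup(html, 'html.parser')
    form = soup.find('form', {'name': 'login'})
    # A list value matches inputs whose type is any of the listed strings
    fields = form.find_all('input', {'type': ['hidden', 'submit']})
    # Skip inputs without a name, since they are not submitted with the form
    payload = {f.get('name'): f.get('value') for f in fields if f.get('name')}
    payload['session_key'] = username
    payload['session_password'] = password
    return payload


payload = build_login_payload(SAMPLE_HTML, 'username', 'password')
print(payload)
```

Copying every hidden input rather than hard-coding names means the script keeps working if the page adds or renames a token field, as long as the form itself is still found.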
You can see this whole process by opening the developer tools in Chrome or Firefox and going through the login process with the Network tab open.
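To compare what the browser sends with what your script would send, it can help to build the request without sending it. The following is a minimal sketch: the `X-Requested-With` header is the conventional AJAX marker, but whether the server actually checks it is an assumption, and the form data is a placeholder without the hidden fields.

```python
import requests

session = requests.Session()
# Assumption: this header is the conventional marker for AJAX requests;
# whether the server requires it is not confirmed here.
session.headers.update({'X-Requested-With': 'XMLHttpRequest'})

# Placeholder form data; a real payload would also carry the hidden fields
data = {'session_key': 'username', 'session_password': 'password'}
request = requests.Request(
    'POST', 'https://www.linkedin.com/uas/login-submit', data=data)

# prepare_request merges the session's headers and cookies into the
# request but sends nothing over the network
prepared = session.prepare_request(request)

print(prepared.method, prepared.url)
for name, value in prepared.headers.items():
    print(f'{name}: {value}')
print(prepared.body)
```

The printed headers and body can then be checked line by line against the login request shown in the Network tab.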
Something like this should work:
import requests
from bs4 import BeautifulSoup

LOGIN_URL = 'https://www.linkedin.com/uas/login'
SUBMIT_URL = 'https://www.linkedin.com/uas/login-submit'
HOME_URL = 'http://www.linkedin.com/nhome'

# A Session keeps cookies between requests
session = requests.Session()

# Get the login form
login_response = session.get(LOGIN_URL)
login = BeautifulSoup(login_response.text, 'html.parser')

# Collect the hidden form inputs (including the CSRF values)
form = login.find('form', {'name': 'login'})
fields = form.find_all('input', {'type': ['hidden', 'submit']})

# Create the POST data, skipping inputs that have no name
post = {field.get('name'): field.get('value') for field in fields if field.get('name')}
post['session_key'] = 'username'
post['session_password'] = 'password'

# Post the login form
post_response = session.post(SUBMIT_URL, data=post)

# Get the home page
home_response = session.get(HOME_URL)
home = BeautifulSoup(home_response.text, 'html.parser')
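A quick way to tell whether the login worked is to look at where the final request ended up. The helper below is only a rough heuristic: the function name `redirected_to_login` is hypothetical, and the assumption that a failed login leaves you on a path starting with `/uas/login` or `/login` is not confirmed by the site.

```python
from urllib.parse import urlparse

# Assumption: an unauthenticated request ends up on one of these paths
LOGIN_PATH_PREFIXES = ('/uas/login', '/login')


def redirected_to_login(final_url):
    """Heuristic: True if the final URL still points at a login page."""
    path = urlparse(final_url).path
    return path.startswith(LOGIN_PATH_PREFIXES)


print(redirected_to_login('https://www.linkedin.com/uas/login?session_redirect=x'))
print(redirected_to_login('http://www.linkedin.com/nhome'))
```

With the script above you would pass `home_response.url`, which holds the URL after any redirects. If the site serves the unauthenticated page without redirecting, this check will not catch it, and inspecting the parsed page content is the more reliable test.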