使用 python 请求会话登录到 LinkedIn [英] Logging in to LinkedIn with python requests sessions
问题描述
我正在尝试使用 Python 请求登录 LinkedIn:
I'm trying to log into LinkedIn using Python requests:
import sys
import requests
from BeautifulSoup import BeautifulSoup
payload={
'session-key' : 'user@email.com',
'session-password' : 'password'
}
URL='https://www.linkedin.com/uas/login-submit'
s=requests.session()
s.post(URL,data=payload)
r=s.get('http://www.linkedin.com/nhome')
soup = BeautifulSoup(r.text)
print soup.find('title')
我似乎无法使用此方法登录.我什至尝试在有效负载中使用 csrf 等,但会话不应该为您处理这些吗?
I can't seem to log in using this method. I even tried playing with csrf etc. in the payload, but aren't sessions supposed to take care of that for you?
关于最后一行的注意事项:我使用标题来检查我是否已成功登录.(如果我已登录,我应该看到欢迎!| LinkedIn",而不是世界上最大的专业网络| LinkedIn"
Note about the last line: I use the title to check if I've successfully logged in. (I should see "Welcome! | LinkedIn" if I have signed in, instead I see "World's Largest Professional Network | LinkedIn"
我错过了什么吗?
推荐答案
我修改了一个 Web 抓取模板,用于满足我的大部分基于 Python 的抓取需求,以满足您的需求.验证它与我自己的登录信息一起使用.
I modified a web-scraping template I use for most of my Python-based scraping needs to fit your needs. Verified it worked with my own login info.
它的工作方式是模仿浏览器并维护一个 cookieJar 来存储您的用户会话.让它也可以与 BeautifulSoup 配合使用.
The way it works is by mimic-ing a browser and maintaining a cookieJar that stores your user session. Got it to work with BeautifulSoup for you as well.
注意:这是一个 Python2 版本.我应要求在下面进一步添加了一个有效的 Python3 示例.
Note: This is a Python2 version. I added a working Python3 example further below by request.
import cookielib
import os
import urllib
import urllib2
import re
import string
from BeautifulSoup import BeautifulSoup
username = "user@email.com"
password = "password"
cookie_filename = "parser.cookies.txt"
class LinkedInParser(object):
def __init__(self, login, password):
""" Start up... """
self.login = login
self.password = password
# Simulate browser with cookies enabled
self.cj = cookielib.MozillaCookieJar(cookie_filename)
if os.access(cookie_filename, os.F_OK):
self.cj.load()
self.opener = urllib2.build_opener(
urllib2.HTTPRedirectHandler(),
urllib2.HTTPHandler(debuglevel=0),
urllib2.HTTPSHandler(debuglevel=0),
urllib2.HTTPCookieProcessor(self.cj)
)
self.opener.addheaders = [
('User-agent', ('Mozilla/4.0 (compatible; MSIE 6.0; '
'Windows NT 5.2; .NET CLR 1.1.4322)'))
]
# Login
self.loginPage()
title = self.loadTitle()
print title
self.cj.save()
def loadPage(self, url, data=None):
"""
Utility function to load HTML from URLs for us with hack to continue despite 404
"""
# We'll print the url in case of infinite loop
# print "Loading URL: %s" % url
try:
if data is not None:
response = self.opener.open(url, data)
else:
response = self.opener.open(url)
return ''.join(response.readlines())
except:
# If URL doesn't load for ANY reason, try again...
# Quick and dirty solution for 404 returns because of network problems
# However, this could infinite loop if there's an actual problem
return self.loadPage(url, data)
def loginPage(self):
"""
Handle login. This should populate our cookie jar.
"""
html = self.loadPage("https://www.linkedin.com/")
soup = BeautifulSoup(html)
csrf = soup.find(id="loginCsrfParam-login")['value']
login_data = urllib.urlencode({
'session_key': self.login,
'session_password': self.password,
'loginCsrfParam': csrf,
})
html = self.loadPage("https://www.linkedin.com/uas/login-submit", login_data)
return
def loadTitle(self):
html = self.loadPage("https://www.linkedin.com/feed/")
soup = BeautifulSoup(html)
return soup.find("title")
parser = LinkedInParser(username, password)
2014 年 6 月 19 日更新:从主页添加了对 CSRF 令牌的解析,以便在更新的登录过程中使用.
Update June 19, 2014: Added parsing for CSRF token from homepage for use in updated login process.
2015 年 7 月 23 日更新: 在此处添加 Python 3 示例.基本上需要替换库位置并删除不推荐使用的方法.它不是完全格式化或任何东西,但它的功能.很抱歉匆忙工作.最终原理和步骤都是一样的.
Update July 23, 2015: Adding a Python 3 example here. Basically requires substituting library locations and removing deprecated methods. It's not perfectly formatted or anything, but it functions. Sorry for the rush job. In the end the principals and steps are identical.
import http.cookiejar as cookielib
import os
import urllib
import re
import string
from bs4 import BeautifulSoup
username = "user@email.com"
password = "password"
cookie_filename = "parser.cookies.txt"
class LinkedInParser(object):
def __init__(self, login, password):
""" Start up... """
self.login = login
self.password = password
# Simulate browser with cookies enabled
self.cj = cookielib.MozillaCookieJar(cookie_filename)
if os.access(cookie_filename, os.F_OK):
self.cj.load()
self.opener = urllib.request.build_opener(
urllib.request.HTTPRedirectHandler(),
urllib.request.HTTPHandler(debuglevel=0),
urllib.request.HTTPSHandler(debuglevel=0),
urllib.request.HTTPCookieProcessor(self.cj)
)
self.opener.addheaders = [
('User-agent', ('Mozilla/4.0 (compatible; MSIE 6.0; '
'Windows NT 5.2; .NET CLR 1.1.4322)'))
]
# Login
self.loginPage()
title = self.loadTitle()
print(title)
self.cj.save()
def loadPage(self, url, data=None):
"""
Utility function to load HTML from URLs for us with hack to continue despite 404
"""
# We'll print the url in case of infinite loop
# print "Loading URL: %s" % url
try:
if data is not None:
response = self.opener.open(url, data)
else:
response = self.opener.open(url)
return ''.join([str(l) for l in response.readlines()])
except Exception as e:
# If URL doesn't load for ANY reason, try again...
# Quick and dirty solution for 404 returns because of network problems
# However, this could infinite loop if there's an actual problem
return self.loadPage(url, data)
def loadSoup(self, url, data=None):
"""
Combine loading of URL, HTML, and parsing with BeautifulSoup
"""
html = self.loadPage(url, data)
soup = BeautifulSoup(html, "html5lib")
return soup
def loginPage(self):
"""
Handle login. This should populate our cookie jar.
"""
soup = self.loadSoup("https://www.linkedin.com/")
csrf = soup.find(id="loginCsrfParam-login")['value']
login_data = urllib.parse.urlencode({
'session_key': self.login,
'session_password': self.password,
'loginCsrfParam': csrf,
}).encode('utf8')
self.loadPage("https://www.linkedin.com/uas/login-submit", login_data)
return
def loadTitle(self):
soup = self.loadSoup("https://www.linkedin.com/feed/")
return soup.find("title")
parser = LinkedInParser(username, password)
这篇关于使用 python 请求会话登录到 LinkedIn的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!