Logging in to LinkedIn with Python requests sessions

Question

I'm trying to log into LinkedIn using Python requests:

import sys
import requests
from BeautifulSoup import BeautifulSoup


payload={
    'session-key' : 'user@email.com',
    'session-password' : 'password'
}

URL='https://www.linkedin.com/uas/login-submit'
s=requests.session()
s.post(URL,data=payload)

r=s.get('http://www.linkedin.com/nhome')
soup = BeautifulSoup(r.text)
print soup.find('title')

I can't seem to log in using this method. I even tried adding the CSRF token and similar fields to the payload, but aren't sessions supposed to take care of that for you?

Note about the last line: I use the page title to check whether I've logged in successfully. If I were signed in, I should see "Welcome! | LinkedIn"; instead I see "World's Largest Professional Network | LinkedIn".

Am I missing something?

Recommended answer

I adapted a web-scraping template that I use for most of my Python-based scraping to fit your case, and verified that it works with my own login info.

It works by mimicking a browser and maintaining a cookie jar that stores your user session. I also got it working with BeautifulSoup for you.
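
As a rough illustration of the cookie-jar idea, here is a minimal Python 3 sketch using only the standard library. The helper names (`build_opener_with_cookies`, `make_cookie`), the `example.com` cookie, and the temporary file path are hypothetical and exist only for the demonstration; no request is sent. It shows a jar being saved to disk and read back by a second opener, which is how a login session can outlive a single run.

```python
import http.cookiejar
import os
import tempfile
import urllib.request


def build_opener_with_cookies(cookie_path):
    """Return (opener, jar); the jar is pre-loaded from cookie_path if it exists."""
    jar = http.cookiejar.MozillaCookieJar(cookie_path)
    if os.path.exists(cookie_path):
        # ignore_discard=True also restores session cookies (no expiry date)
        jar.load(ignore_discard=True)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Send a browser-like User-Agent header with every request
    opener.addheaders = [("User-agent", "Mozilla/5.0 (compatible; example)")]
    return opener, jar


def make_cookie(name, value, domain):
    """Build a session cookie by hand (demonstration only)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Hypothetical cookie file in a throwaway directory
cookie_path = os.path.join(tempfile.mkdtemp(), "demo.cookies.txt")

opener, jar = build_opener_with_cookies(cookie_path)
jar.set_cookie(make_cookie("session_id", "abc123", "example.com"))
jar.save(ignore_discard=True)

# A second jar loaded from the same file sees the stored cookie
_, reloaded = build_opener_with_cookies(cookie_path)
names = [cookie.name for cookie in reloaded]
```

One design point worth noting: by default `save()` and `load()` skip cookies marked as session-only, so passing `ignore_discard=True` may be necessary if the site issues its login cookie without an expiry date.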

Note: the listing below is a Python 2 version. By request, I added a working Python 3 example further down.

import cookielib
import os
import urllib
import urllib2
import re
import string
from BeautifulSoup import BeautifulSoup

username = "user@email.com"
password = "password"

cookie_filename = "parser.cookies.txt"

class LinkedInParser(object):

    def __init__(self, login, password):
        """ Start up... """
        self.login = login
        self.password = password

        # Simulate browser with cookies enabled
        self.cj = cookielib.MozillaCookieJar(cookie_filename)
        if os.access(cookie_filename, os.F_OK):
            self.cj.load()
        self.opener = urllib2.build_opener(
            urllib2.HTTPRedirectHandler(),
            urllib2.HTTPHandler(debuglevel=0),
            urllib2.HTTPSHandler(debuglevel=0),
            urllib2.HTTPCookieProcessor(self.cj)
        )
        self.opener.addheaders = [
            ('User-agent', ('Mozilla/4.0 (compatible; MSIE 6.0; '
                           'Windows NT 5.2; .NET CLR 1.1.4322)'))
        ]

        # Login
        self.loginPage()

        title = self.loadTitle()
        print title

        self.cj.save()


    def loadPage(self, url, data=None):
        """
        Utility function to load HTML from URLs for us with hack to continue despite 404
        """
        # We'll print the url in case of infinite loop
        # print "Loading URL: %s" % url
        try:
            if data is not None:
                response = self.opener.open(url, data)
            else:
                response = self.opener.open(url)
            return ''.join(response.readlines())
        except:
            # If URL doesn't load for ANY reason, try again...
            # Quick and dirty solution for 404 returns because of network problems
            # However, this could infinite loop if there's an actual problem
            return self.loadPage(url, data)

    def loginPage(self):
        """
        Handle login. This should populate our cookie jar.
        """
        html = self.loadPage("https://www.linkedin.com/")
        soup = BeautifulSoup(html)
        csrf = soup.find(id="loginCsrfParam-login")['value']

        login_data = urllib.urlencode({
            'session_key': self.login,
            'session_password': self.password,
            'loginCsrfParam': csrf,
        })

        html = self.loadPage("https://www.linkedin.com/uas/login-submit", login_data)
        return

    def loadTitle(self):
        html = self.loadPage("https://www.linkedin.com/feed/")
        soup = BeautifulSoup(html)
        return soup.find("title")

parser = LinkedInParser(username, password)

Update June 19, 2014: Added parsing of the CSRF token from the homepage for use in the updated login process.
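
On the question of whether sessions handle CSRF for you: a session object keeps cookies between requests, but it does not copy hidden form fields into your POST data, so the token has to be read from the page and added to the payload yourself. The sketch below shows that step with the standard-library HTML parser instead of BeautifulSoup. `HiddenInputCollector`, `build_login_payload`, and `SAMPLE_HTML` are hypothetical names, and the sample form is a stand-in modelled on the field names used in the answer; LinkedIn's real markup may differ and has changed over time.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(html, login, password):
    """Merge the page's hidden fields (e.g. a CSRF token) with the credentials."""
    collector = HiddenInputCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload["session_key"] = login
    payload["session_password"] = password
    return payload


# Stand-in for the downloaded login page (assumed structure, not real markup)
SAMPLE_HTML = """
<form action="/uas/login-submit" method="post">
  <input type="text" name="session_key">
  <input type="password" name="session_password">
  <input type="hidden" name="loginCsrfParam" id="loginCsrfParam-login" value="token-123">
</form>
"""

payload = build_login_payload(SAMPLE_HTML, "user@email.com", "password")
# urllib expects POST data as bytes in Python 3
encoded = urlencode(payload).encode("utf8")
```

The resulting dictionary could equally be passed as `data=payload` to a `requests` session's `post` call, as long as the login page was fetched with the same session so the matching cookies are sent along.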

Update July 23, 2015: Adding a Python 3 example here. It basically requires substituting library locations and removing deprecated methods. It's not perfectly formatted, but it works. Sorry for the rush job. In the end, the principles and steps are identical.

import http.cookiejar as cookielib
import os
import urllib.parse    # urlencode
import urllib.request  # build_opener and the handler classes
from bs4 import BeautifulSoup

username = "user@email.com"
password = "password"

cookie_filename = "parser.cookies.txt"

class LinkedInParser(object):

    def __init__(self, login, password):
        """ Start up... """
        self.login = login
        self.password = password

        # Simulate browser with cookies enabled
        self.cj = cookielib.MozillaCookieJar(cookie_filename)
        if os.access(cookie_filename, os.F_OK):
            self.cj.load()
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPRedirectHandler(),
            urllib.request.HTTPHandler(debuglevel=0),
            urllib.request.HTTPSHandler(debuglevel=0),
            urllib.request.HTTPCookieProcessor(self.cj)
        )
        self.opener.addheaders = [
            ('User-agent', ('Mozilla/4.0 (compatible; MSIE 6.0; '
                           'Windows NT 5.2; .NET CLR 1.1.4322)'))
        ]

        # Login
        self.loginPage()

        title = self.loadTitle()
        print(title)

        self.cj.save()


    def loadPage(self, url, data=None):
        """
        Utility function to load HTML from URLs for us with hack to continue despite 404
        """
        # We'll print the url in case of infinite loop
        # print("Loading URL: %s" % url)
        try:
            if data is not None:
                response = self.opener.open(url, data)
            else:
                response = self.opener.open(url)
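            # The response body is bytes in Python 3; decode it rather than
            # calling str() on each line, which would yield "b'...'" fragments
            return response.read().decode('utf-8', errors='replace')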
        except Exception as e:
            # If URL doesn't load for ANY reason, try again...
            # Quick and dirty solution for 404 returns because of network problems
            # However, this could infinite loop if there's an actual problem
            return self.loadPage(url, data)

    def loadSoup(self, url, data=None):
        """
        Combine loading of URL, HTML, and parsing with BeautifulSoup
        """
        html = self.loadPage(url, data)
        soup = BeautifulSoup(html, "html5lib")
        return soup

    def loginPage(self):
        """
        Handle login. This should populate our cookie jar.
        """
        soup = self.loadSoup("https://www.linkedin.com/")
        csrf = soup.find(id="loginCsrfParam-login")['value']
        login_data = urllib.parse.urlencode({
            'session_key': self.login,
            'session_password': self.password,
            'loginCsrfParam': csrf,
        }).encode('utf8')

        self.loadPage("https://www.linkedin.com/uas/login-submit", login_data)
        return

    def loadTitle(self):
        soup = self.loadSoup("https://www.linkedin.com/feed/")
        return soup.find("title")

parser = LinkedInParser(username, password)
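
The `loadPage` helper retries by calling itself, which, as its own comments warn, never stops if the URL keeps failing. One possible alternative is a bounded retry loop, sketched below. The function name `load_page` and the `retries` and `delay` parameters are assumptions for illustration and are not part of the original answer; `opener` is any object with an `open(url, data)` method, such as the one returned by `urllib.request.build_opener`.

```python
import time


def load_page(opener, url, data=None, retries=3, delay=1.0):
    """Fetch url and return decoded text, giving up after `retries` attempts."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            if data is not None:
                response = opener.open(url, data)
            else:
                response = opener.open(url)
            # Response bodies are bytes in Python 3
            return response.read().decode("utf-8", errors="replace")
        except OSError as exc:
            # urllib.error.URLError and HTTPError are subclasses of OSError
            last_error = exc
            if attempt < retries:
                time.sleep(delay)
    raise RuntimeError(
        "giving up on %s after %d attempts" % (url, retries)
    ) from last_error
```

Raising after a fixed number of attempts means a wrong URL or a blocked login shows up as an error instead of a hang.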
