Login to website via Python Requests


Question

For a university project I am currently trying to log in to a website and scrape a small detail (a list of news articles) from my user profile.

I am new to Python, but I have done this before on other websites. My first two approaches produce different HTTP errors. I suspected a problem with the headers my request is sending, but my understanding of this site's login process appears to be insufficient.

This is the login page: http://seekingalpha.com/account/login

My first approach looks like this:

import requests

with requests.Session() as c:
    requestUrl = 'http://seekingalpha.com/account/orthodox_login'

    USERNAME = 'XXX'
    PASSWORD = 'XXX'

    userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'

    # Note: requests silently drops keys whose value is None, so use
    # empty strings for the fields the browser submits as empty.
    login_data = {
        "slugs[]": "",
        "rt": "",
        "user[url_source]": "",
        "user[location_source]": "orthodox_login",
        "user[email]": USERNAME,
        "user[password]": PASSWORD,
    }

    c.post(requestUrl, data=login_data,
           headers={"referer": "http://seekingalpha.com/account/login",
                    "user-agent": userAgent})

    page = c.get("http://seekingalpha.com/account/email_preferences")
    print(page.content)

This results in "403 Forbidden".
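One detail that may matter here (my assumption, not something confirmed by the site): in the snippet above the custom headers are passed only to the post() call, so the follow-up get() goes out with requests' default User-Agent, which many sites answer with 403. Setting the headers once on the Session makes every request carry them; a minimal sketch:

```python
import requests

s = requests.Session()
# Session-level headers are merged into every request made through s,
# so the GET after the login POST sends the same User-Agent and Referer.
s.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/49.0.2623.112 Safari/537.36",
    "Referer": "http://seekingalpha.com/account/login",
})

print(s.headers["User-Agent"])
```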

My second approach looks like this:

from requests import Request, Session

requestUrl = 'http://seekingalpha.com/account/orthodox_login'

USERNAME = 'XXX'
PASSWORD = 'XXX'

login_data = {
    "slugs[]": "",
    "rt": "",
    "user[url_source]": "",
    "user[location_source]": "orthodox_login",
    "user[email]": USERNAME,
    "user[password]": PASSWORD,
}
headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4",
    "origin": "http://seekingalpha.com",
    "referer": "http://seekingalpha.com/account/login",
    "Cache-Control": "max-age=0",
    # Header values must be strings; an int here makes requests fail.
    "Upgrade-Insecure-Requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36",
}

s = Session()
req = Request('POST', requestUrl, data=login_data, headers=headers)

prepped = s.prepare_request(req)
# Overwriting the prepared body by hand leaves the Content-Length
# header that was computed for the old body, which can itself be
# rejected by the server.
prepped.body = "slugs%5B%5D=&rt=&user%5Burl_source%5D=&user%5Blocation_source%5D=orthodox_login&user%5Bemail%5D=XXX%40XXX.com&user%5Bpassword%5D=XXX"

resp = s.send(prepped)

print(resp.status_code)

In this approach I was trying to prepare the headers exactly as my browser would. Sorry for the redundancy. This results in HTTP error 400.
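A side note on the hand-built body: requests already percent-encodes a data dict (including the bracketed keys), so the manual override is unnecessary and risks drifting out of sync with the computed Content-Length. A minimal offline sketch (no network traffic; the credentials are placeholders) showing the body requests would generate on its own:

```python
import requests

# Placeholder credentials; the point is only to inspect the encoding.
login_data = {
    "slugs[]": "",
    "rt": "",
    "user[url_source]": "",
    "user[location_source]": "orthodox_login",
    "user[email]": "XXX@XXX.com",
    "user[password]": "XXX",
}

# Preparing the request shows the exact body requests would send:
# the same percent-encoded form the browser produces.
prepped = requests.Request(
    "POST", "http://seekingalpha.com/account/orthodox_login",
    data=login_data,
).prepare()

print(prepped.body)
# e.g. slugs%5B%5D=&rt=&...&user%5Bemail%5D=XXX%40XXX.com&user%5Bpassword%5D=XXX
```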

Does someone have an idea what went wrong? Probably a lot.

Answer

Instead of spending a lot of energy on logging in manually and playing with Session, I suggest you just scrape the pages right away using your cookie.

When you log in, a cookie is usually added to your request to identify you.

Your code will look like this:

import requests

# Copy the cookie names and values from your browser's developer
# tools after logging in; these two names are just placeholders.
response = requests.get("http://www.example.com", cookies={
    "c_user": "my_cookie_part",
    "xs": "my_other_cookie_part",
})
print(response.content)
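Building on that, one way to get the values is to copy the raw Cookie header from the browser's developer tools after logging in. A small sketch (the cookie string below is a made-up placeholder) that parses such a header and attaches it to a Session, so every subsequent request sends it:

```python
import requests
from http.cookies import SimpleCookie

# Paste the raw "Cookie:" header copied from the browser here.
raw = "c_user=my_cookie_part; xs=my_other_cookie_part"

cookie = SimpleCookie()
cookie.load(raw)
cookies = {name: morsel.value for name, morsel in cookie.items()}

s = requests.Session()
s.cookies.update(cookies)   # every request made with s now sends the cookies

print(cookies["c_user"])    # → my_cookie_part
```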

