Python scraper with the requests library: Data and headers all submitted but still can't log in

This article walks through how to handle a requests-based Python scraper where the Data and headers have been submitted but login still fails. It should be a useful reference for anyone hitting the same problem; read on below.

Problem description

I want to use requests to simulate logging in to my school's academic affairs site and then write a little automatic course-grabbing program, but I can't even get past the login step _(:зゝ∠)_, even though I've already got the Data and Header pretty much filled in.

The code is as follows:

# __author__ = ''
# -*- coding: utf-8 -*-

import requests
from time import sleep
with requests.session() as s:
    login_url = "https://cas.gzhu.edu.cn/cas_server/login;jsessionid=4CEEEE1C74B97277272DAAF0A4073B0D?service=https://cas.gzhu.edu.cn:443/shunt/index.jsp"
    Data = {
        "username": "**********",
        "password": "******",
        "captcha": "",
        "execution": "e2s1",
        "warn": "true",
        "_eventId": "submit",
    }

    Header = {
        "Host": "cas.gzhu.edu.cn",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "https://cas.gzhu.edu.cn/cas_server/login?service=https://cas.gzhu.edu.cn:443/shunt/index.jsp",
        "Connection": "keep-alive"
    }

    # Login fails here
    new = s.post(login_url, data=Data, headers=Header)
    print(new.text)
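
A quick way to tell whether a POST like this actually established a CAS session is to look at the redirect history and at the cookies the server set. A minimal sketch of such a check (CASTGC is the ticket-granting cookie name used by a stock Jasig/Apereo CAS server and may differ on this deployment; looks_logged_in is just an illustrative helper, not part of the poster's code):

def looks_logged_in(session, response):
    """Rough heuristic for whether a Jasig/Apereo CAS login attempt succeeded."""
    print(response.status_code)               # 200 with the login form again usually means failure
    print([r.url for r in response.history])  # redirect hops that requests followed
    # on success, stock CAS normally sets a ticket-granting cookie (CASTGC)
    return session.cookies.get("CASTGC") is not None

# e.g. inside the with-block above:
#     print(looks_logged_in(s, new))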

It took me quite a while to notice that I hadn't actually logged in; when I went on to request the later URLs I got a circular-redirect error, and I'm not sure whether that's caused by not being logged in. Also, a lot of pages on our site are temporary 302 redirects, which is just low!!

PS: Finals are coming up and here I am still fooling around with this, so I'll probably fail the exams. I'd really appreciate it if someone could tell me where I went wrong, otherwise this will have been a huge waste T.T


The login problem seems to be solved: the lt and JSESSIONID fields have to be extracted from the login page first. But now, when I request the target URL, I run into

requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

Also, before I commented out the Host header there was another error:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='cas.gzhu.edu.cn', port=80): Max retries exceeded with url: /c/portal/login;jsessionid=78B851574390A3C4080A3AD99E18E20D?p_l_id=96998&_58_redirect=%2F (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x026CAB10>: Failed to establish a new connection: [Errno 10061] ',))

I'll keep experimenting with it.
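
When requests raises TooManyRedirects, one way to see what is happening is to stop it from following redirects and walk the chain by hand, so the loop (typically bouncing between the target page and the CAS login page) becomes visible. A rough debugging sketch, assuming s is the session from the code above and trace_redirects is just an illustrative helper:

import requests

def trace_redirects(session, url, limit=10):
    """Follow redirects manually and print every hop."""
    for _ in range(limit):
        r = session.get(url, allow_redirects=False)
        print(r.status_code, url)
        if r.status_code not in (301, 302, 303, 307, 308):
            return r
        # Location may be relative, so resolve it against the current URL
        url = requests.compat.urljoin(url, r.headers["Location"])
    print("still redirecting after %d hops - probably not authenticated" % limit)

# e.g.: trace_redirects(s, "http://202.192.18.182/xf_xstyxk.aspx?xh=1506100007")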

The updated code is as follows:


import requests
import re
from time import sleep
with requests.session() as s:
    login_url = "https://cas.gzhu.edu.cn/cas_server/login"
    first = s.get(login_url)
    lt = re.findall(r'name="lt" value="(.*?)"', first.text)
    print(lt[0])
    cookie = first.headers["Set-Cookie"]
    # print cookie
    Js = re.findall(r"(JSESSIONID=.*?);", cookie)
    print(Js[0])
    s.headers = {
        # "Host": "cas.gzhu.edu.cn",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "https://cas.gzhu.edu.cn/cas_server/login?service=https://cas.gzhu.edu.cn:443/shunt/index.jsp",
        "Cookie": Js[0],
        "Connection": "keep-alive"
    }
    Data = {
        "username": "",
        "password": "",
        "captcha": "",
        "warn": "true",
        "lt": lt,
        "execution": "e1s1",
        "_eventId": "submit",
    }

    # Log in
    new = s.post(login_url, data=Data)
    print(new.url)
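
One side note on this version: requests.Session already stores and resends the cookies it receives, so copying JSESSIONID into a Cookie header by hand shouldn't be necessary, and assigning a whole new dict to s.headers also discards requests' default headers. A sketch of the same idea that keeps the session's own cookie handling intact (an illustration, not the poster's code):

import requests

s = requests.Session()
s.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0",
    "Referer": "https://cas.gzhu.edu.cn/cas_server/login",
})
first = s.get("https://cas.gzhu.edu.cn/cas_server/login")
# the JSESSIONID set by this response is already in s.cookies and will be
# sent automatically on the next request - no manual Cookie header needed
print(s.cookies.get("JSESSIONID"))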

Solution

# coding=utf-8

import requests
from pyquery import PyQuery as Q

session = requests.Session()
session.headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36'
}

login_url = "https://cas.gzhu.edu.cn/cas_server/login"

data = {
    'username': 'XXX',
    'password': 'XXX',
    'submit': '登录'
}

# Fetch the hidden form parameters (lt, execution, _eventId, ...) from the login page
r = session.get(login_url)
for _ in Q(r.text).find('input[type="hidden"]'):
    data[Q(_).attr('name')] = Q(_).val()

# Step 1: log in to CAS
session.post(login_url, data)

# Step 2: log in to the course-selection system via the CAS session
session.get('http://202.192.18.182/Login_gzdx.aspx')

r = session.get('http://202.192.18.182/xf_xstyxk.aspx?xh=1506100007')
print(r.text)
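
The key step in this answer is harvesting every hidden &lt;input&gt; on the CAS login page (lt, execution, _eventId, ...) and posting them back together with the credentials, so the per-session token is always current. If pyquery is not installed, the same harvesting can be done with BeautifulSoup; a sketch under that assumption:

import requests
from bs4 import BeautifulSoup

session = requests.Session()
login_url = "https://cas.gzhu.edu.cn/cas_server/login"

data = {'username': 'XXX', 'password': 'XXX', 'submit': '登录'}

# collect every hidden form field (lt, execution, _eventId, ...)
r = session.get(login_url)
soup = BeautifulSoup(r.text, 'html.parser')
for tag in soup.find_all('input', type='hidden'):
    if tag.get('name'):
        data[tag['name']] = tag.get('value', '')

session.post(login_url, data=data)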

That wraps up this look at the requests-based Python scraper where the Data and headers are submitted but login still fails. We hope the answers above are helpful, and thank you for supporting IT屋!
