Logging in to a web site with Python (urllib, urllib2, cookielib): How does one find the necessary information for submission?

Question

Preface: I understand that there are many answers to similar questions on Stack Overflow. However, I haven't found anything about ASPX logins, nor an exact case like this one.

Problem: I need to determine what information is required to log in to https://cableone.net/login.aspx so that I can scrape information from the site.

Progress: So far I have found the input fields in the source of login.aspx and have cobbled together a Python script using urllib, urllib2, and cookielib. I left out of the script anything that had a blank value.

<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUIMzc1NzEwOTZkZFAEfkjXC+VNsqYoayGxa5/q4srT" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWBAK6lKDUCwLVx7ufCQL/+N3OBwLFgNGYD6KeUd6uNDBwc5zcR0u4hqrwv1fM" />
<input name="ctl00$plhMain$txtUserName" type="text" id="ctl00_plhMain_txtUserName" />
<input name="ctl00$plhMain$txtPassword" type="password" id="ctl00_plhMain_txtPassword" />
<input type="submit" name="ctl00$plhMain$btnLogin" value="Login" id="ctl00_plhMain_btnLogin" />
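
The hidden __VIEWSTATE and __EVENTVALIDATION values are emitted by the ASP.NET page when it is rendered, so a copy pasted into a script once may stop working if they change. Below is a minimal sketch that uses only Python 3's standard html.parser to collect every named <input> from the login page's HTML, including the blank ones, because a browser submits those too. It also picks up the submit button, which a browser sends only when that button is the one clicked. The names `InputCollector` and `collect_inputs` are illustrative, and fetching the HTML is left out.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name -> value for every named <input> element in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <input ... /> are routed here as well,
        # by HTMLParser's default handle_startendtag.
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name:
            # Inputs without a value attribute (e.g. the text boxes) map to "".
            self.fields[name] = attrs.get("value") or ""


def collect_inputs(html):
    """Return a dict of all named <input> fields found in `html`."""
    parser = InputCollector()
    parser.feed(html)
    parser.close()
    return parser.fields


# A few of the inputs quoted in the question.
sample = '''
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUIMzc1NzEwOTZkZFAEfkjXC+VNsqYoayGxa5/q4srT" />
<input name="ctl00$plhMain$txtUserName" type="text" id="ctl00_plhMain_txtUserName" />
<input type="submit" name="ctl00$plhMain$btnLogin" value="Login" id="ctl00_plhMain_btnLogin" />
'''

print(collect_inputs(sample))
```

In a real script you would feed it the body of a fresh GET of the login page and then overwrite the user name and password entries before posting.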

I then used the input values above with Python and urllib in the following script:

import urllib, urllib2, cookielib

url = 'https://myaccount.cableone.net/Login.aspx'

# Keep cookies (e.g. the session cookie) across requests made with this opener
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

#determine what I need to change with these values 
formValues = {
    "__VIEWSTATE":"/wEPDwUIMzc1NzEwOTZkZFAEfkjXC+VNsqYoayGxa5/q4srT",
    "__EVENTVALIDATION":"/wEWBAK6lKDUCwLVx7ufCQL/+N3OBwLFgNGYD6KeUd6uNDBwc5zcR0u4hqrwv1fM",
    "ctl00$plhMain$txtUserName":"myAccount",
    "ctl00$plhMain$txtPassword":"myPassword"
    }

data = urllib.urlencode(formValues)

# Passing data makes urllib2 send a POST instead of a GET
response = opener.open(url, data)
thePage = response.read()
httpheaders = response.info()
print thePage
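
For comparison, here is a Python 3 sketch of the same request. In Python 3, urllib2 and cookielib became urllib.request and http.cookiejar. Unlike the script above, the payload also includes the blank __EVENTTARGET and __EVENTARGUMENT fields and the login button's name and value, because a browser sends these when the button is clicked. Whether this particular server actually requires them is an assumption to verify. The two `*-FROM-LOGIN-PAGE` strings are hypothetical placeholders, not real values.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# URL taken from the question's script.
LOGIN_URL = "https://myaccount.cableone.net/Login.aspx"


def make_opener():
    """Return an opener that keeps cookies (e.g. the session cookie) between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(url, fields):
    """Build a POST request whose body is the URL-encoded form fields."""
    body = urllib.parse.urlencode(fields).encode("utf-8")  # request bodies must be bytes
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Every named field a browser would send, including the blank ones and the
# clicked button. The *-FROM-LOGIN-PAGE strings are hypothetical placeholders:
# read the real values from a fresh GET of the login page, made with the same opener.
fields = {
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
    "__VIEWSTATE": "VIEWSTATE-FROM-LOGIN-PAGE",
    "__EVENTVALIDATION": "EVENTVALIDATION-FROM-LOGIN-PAGE",
    "ctl00$plhMain$txtUserName": "myAccount",
    "ctl00$plhMain$txtPassword": "myPassword",
    "ctl00$plhMain$btnLogin": "Login",
}

opener, jar = make_opener()
req = build_login_request(LOGIN_URL, fields)
print(req.get_method(), req.full_url)
```

To send the request, call `opener.open(req).read()`. Use the same opener, and therefore the same cookie jar, for both the initial GET and this POST. That way any session cookie set by the first request is sent back with the second.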

Answer

The approach you outlined will be difficult if the form is dynamic in any way. A more universal approach is to install Google Chrome Canary, which has good developer tools. Right-click the page and choose "Inspect", go to the "Network" tab, and check "Preserve log". (You may need the Canary version, because, if I'm not mistaken, the regular one doesn't capture some of the data.)

With all of this open, click "Login". You'll see every request along with its headers and POST data, which gives you all the POST data that is sent to the server.
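
After you capture the login request, you can copy its form data as the raw URL-encoded body. DevTools can display it in that form, though the exact label varies between Chrome versions. The sketch below turns that string into a dict you can paste into a script. It uses Python 3's `urllib.parse.parse_qsl` with `keep_blank_values=True`, so that empty fields such as `__EVENTTARGET=` are not silently dropped. The `captured` string is illustrative only: it is built from the field names in the question and is not a real capture.

```python
from urllib.parse import parse_qsl


def form_body_to_dict(raw_body):
    """Turn a URL-encoded POST body, as captured by the browser, into a dict."""
    # keep_blank_values=True keeps fields that are sent empty, e.g. __EVENTTARGET=
    return dict(parse_qsl(raw_body.strip(), keep_blank_values=True))


# Illustrative body only; copy the real one from the captured login request.
captured = (
    "__EVENTTARGET=&__EVENTARGUMENT="
    "&__VIEWSTATE=%2FwEPDwUIMzc1NzEwOTZkZFAEfkjXC%2BVNsqYoayGxa5%2Fq4srT"
    "&ctl00%24plhMain%24txtUserName=myAccount"
    "&ctl00%24plhMain%24txtPassword=myPassword"
    "&ctl00%24plhMain%24btnLogin=Login"
)

for name, value in form_body_to_dict(captured).items():
    print(repr(name), "=", repr(value))
```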

Now you can test that data in your script, removing fields one by one to find out which ones are actually needed. Another option for testing the requests, by the way, is Advanced REST Client.
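
The remove-one-by-one step can be automated. In the minimal sketch below, `minimal_fields` tries dropping each field in turn and keeps a field only if the login fails without it. `login_succeeds` is a hypothetical callback you would write yourself: it would submit the fields and then check the response for something that only a logged-in page contains. `fake_server` is an offline stand-in, so the sketch runs without network access. It does not model the real site.

```python
from typing import Callable, Dict

Fields = Dict[str, str]


def minimal_fields(fields: Fields, login_succeeds: Callable[[Fields], bool]) -> Fields:
    """Drop form fields one at a time, keeping a field only if login fails without it."""
    if not login_succeeds(fields):
        raise ValueError("the full captured form does not log in; re-capture it")
    kept = dict(fields)
    for name in list(fields):
        trial = {k: v for k, v in kept.items() if k != name}
        if login_succeeds(trial):
            kept = trial  # the login still worked, so this field was not needed
    return kept


# Hypothetical offline stand-in for a real login attempt (demonstration only).
def fake_server(fields: Fields) -> bool:
    return (
        fields.get("ctl00$plhMain$txtUserName") == "myAccount"
        and fields.get("ctl00$plhMain$txtPassword") == "myPassword"
        and "__VIEWSTATE" in fields
    )


captured = {
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
    "__VIEWSTATE": "abc",
    "ctl00$plhMain$txtUserName": "myAccount",
    "ctl00$plhMain$txtPassword": "myPassword",
    "ctl00$plhMain$btnLogin": "Login",
}

print(minimal_fields(captured, fake_server))
```

This is a greedy single pass, so it finds one sufficient set of fields, not necessarily the only one. Against a real ASP.NET form, each attempt would probably need freshly fetched hidden values and a new cookie jar. That is an assumption worth checking, since tokens like __VIEWSTATE and __EVENTVALIDATION can be tied to a particular page load.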