Python and mechanize login script
Question
Fellow programmers!
I am trying to write a script to log in to my university's "food balance" page using Python and the mechanize module.
This is the page I am trying to log in to: http://www.wcu.edu/11407.asp. The website has the following login form:
<FORM method=post action=https://itapp.wcu.edu/BanAuthRedirector/Default.aspx><INPUT value=https://cf.wcu.edu/busafrs/catcard/idsearch.cfm type=hidden name=wcuirs_uri>
<P><B>WCU ID Number<BR></B><INPUT maxLength=12 size=12 type=password name=id> </P>
<P><B>PIN<BR></B><INPUT maxLength=20 type=password name=PIN> </P>
<P></P>
<P><INPUT value="Request Access" type=submit name=submit> </P></FORM>
From this we know that I need to fill in the following fields: 1. name=id 2. name=PIN
The form is submitted to this action: action=https://itapp.wcu.edu/BanAuthRedirector/Default.aspx
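As a cross-check on the field list, the form can be inspected programmatically. The following is a minimal sketch in Python 3 using only the standard library (the script in this question targets Python 2); the class name FormFieldCollector is hypothetical, and the HTML is copied from the form quoted above.

```python
from html.parser import HTMLParser

# The login form as quoted above (attribute values are unquoted in the page source).
FORM_HTML = """<FORM method=post action=https://itapp.wcu.edu/BanAuthRedirector/Default.aspx><INPUT value=https://cf.wcu.edu/busafrs/catcard/idsearch.cfm type=hidden name=wcuirs_uri>
<P><B>WCU ID Number<BR></B><INPUT maxLength=12 size=12 type=password name=id> </P>
<P><B>PIN<BR></B><INPUT maxLength=20 type=password name=PIN> </P>
<P></P>
<P><INPUT value="Request Access" type=submit name=submit> </P></FORM>"""


class FormFieldCollector(HTMLParser):
    """Hypothetical helper: records the form action and every named <input>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}  # input name -> (input type, default value)

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
        elif tag == "input" and "name" in attrs:
            self.fields[attrs["name"]] = (attrs.get("type", "text"), attrs.get("value", ""))


collector = FormFieldCollector()
collector.feed(FORM_HTML)
print(collector.action)
for name, (kind, default) in collector.fields.items():
    print(name, kind, repr(default))
```

Besides id and PIN, the listing also includes the hidden wcuirs_uri input and the submit button, both of which a browser sends along with the credentials.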
This is the script I have written so far:
#!/usr/bin/python2 -W ignore
import mechanize, cookielib
from time import sleep
url = 'http://www.wcu.edu/11407.asp'
myId = '11111111111'
myPin = '22222222222'
# Browser
#br = mechanize.Browser()
#br = mechanize.Browser(factory=mechanize.DefaultFactory(i_want_broken_xhtml_support=True))
br = mechanize.Browser(factory=mechanize.RobustFactory()) # Use this because of bad html tags in the html...
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follow refresh 0, but do not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# User-Agent (fake agent to google-chrome linux x86_64)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11'),
                 ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
                 ('Accept-Encoding', 'gzip,deflate,sdch'),
                 ('Accept-Language', 'en-US,en;q=0.8'),
                 ('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.3')]
# The site we will navigate into
br.open(url)
# Go through all the forms (for debugging only)
for f in br.forms():
    print f
# Select the third form (index 2)
br.select_form(nr=2)
# User credentials
br.form['id'] = myId
br.form['PIN'] = myPin
br.form.action = 'https://itapp.wcu.edu/BanAuthRedirector/Default.aspx'
# Login
br.submit()
# Wait 10 seconds
sleep(10)
# Save to a file
f = file('mycatpage.html', 'w')
f.write(br.response().read())
f.close()
Now for the problem...
For some odd reason, the page I get back (in mycatpage.html) is the login page and not the expected page that displays my "cat cash balance" and the "number of block meals" I have left.
Does anyone have any idea why? Keep in mind that the headers are all correct, and while the ID and PIN are not really 111111111 and 222222222, the correct values do work on the website (using a browser...)
Thanks in advance.
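To confirm the symptom without opening the saved file by hand, the returned HTML can be checked for the login form. This is a minimal sketch in Python 3 with the standard library only; the names LoginFormDetector and looks_like_login_page are hypothetical, and the second test snippet is a made-up placeholder, not the real balance page.

```python
from html.parser import HTMLParser


class LoginFormDetector(HTMLParser):
    """Hypothetical helper: flags pages that contain a password input named PIN."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "input"
                and (attrs.get("type") or "").lower() == "password"
                and attrs.get("name") == "PIN"):
            self.found = True


def looks_like_login_page(html_text):
    """Return True if html_text still contains the PIN password field."""
    detector = LoginFormDetector()
    detector.feed(html_text)
    return detector.found


# One line taken from the login form above, and a placeholder for any other page.
login_snippet = "<P><B>PIN<BR></B><INPUT maxLength=20 type=password name=PIN> </P>"
other_snippet = "<P>Some other page without a login form</P>"
print(looks_like_login_page(login_snippet))
print(looks_like_login_page(other_snippet))
```

In the mechanize script, the same check could be applied to the text returned by br.response().read() right after br.submit().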
EDIT
Another script I tried:
from urllib import urlopen, urlencode
import urllib2
import httplib
url = 'https://itapp.wcu.edu/BanAuthRedirector/Default.aspx'
myId = 'xxxxxxxx'
myPin = 'xxxxxxxx'
data = {
    'id': myId,
    'PIN': myPin,
    'submit': 'Request Access',
    'wcuirs_uri': 'https://cf.wcu.edu/busafrs/catcard/idsearch.cfm'
}
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11'),
                     ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
                     ('Accept-Encoding', 'gzip,deflate,sdch'),
                     ('Accept-Language', 'en-US,en;q=0.8'),
                     ('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.3')]
request = urllib2.Request(url, urlencode(data))
# Write the response body (not the response object itself) to the file
open("mycatpage.html", 'w').write(opener.open(request).read())
This has the same behavior...
Recommended answer
# User credentials
br.form['id'] = myId
br.form['PIN'] = myPin
I believe this is the problem.
Try changing it to
br['id'] = myId
br['PIN'] = myPin
I'm also pretty sure that you don't need br.form.action = 'https://itapp.wcu.edu/BanAuthRedirector/Default.aspx', because you have already selected the form, so just calling submit should work. I could be wrong, though.
Additionally, I have done a similar task using just urllib and urllib2, so if this doesn't work I will post that code.
Here is the technique that I used with urllib and urllib2:
import urllib2, urllib
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)
encoded = urllib.urlencode({"PIN":my_pin, "id":my_id})
f = opener.open('http://www.wcu.edu/11407.asp', encoded)
data = f.read()
f.close()
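For readers on Python 3, where urllib2 no longer exists, the same technique might look roughly like the sketch below. It mirrors the snippet above (same URL and field names); the credentials are placeholders, the fetch helper is a hypothetical name, and nothing is sent until fetch is called.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Placeholder credentials (assumptions for illustration only).
my_id = "11111111111"
my_pin = "22222222222"

# Cookie-aware opener, the Python 3 counterpart of urllib2.HTTPCookieProcessor().
cookie_jar = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie_jar))

# In Python 3 the POST body must be bytes, so the urlencoded string is encoded.
encoded = urllib.parse.urlencode({"PIN": my_pin, "id": my_id}).encode("ascii")
request = urllib.request.Request("http://www.wcu.edu/11407.asp", data=encoded)


def fetch(req):
    """Hypothetical helper: send the request and return the body as bytes (network I/O)."""
    with opener.open(req) as response:
        return response.read()
```

Calling fetch(request) would perform the POST; cookies set by the server are kept in cookie_jar and sent again on later requests made through the same opener.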
>>> b = mechanize.Browser(factory=mechanize.RobustFactory())
>>> b.open('http://www.wcu.edu/11407.asp')
<response_seek_wrapper at 0x10acfa248 whose wrapped object = <closeable_response at 0x10aca32d8 whose fp = <socket._fileobject object at 0x10aaf45d0>>>
>>> b.select_form(nr=2)
>>> b.form
<mechanize._form.HTMLForm instance at 0x10ad0dbd8>
>>> b.form.attrs
{'action': 'https://itapp.wcu.edu/BanAuthRedirector/Default.aspx', 'method': 'post'}
This could be your problem? Not sure.
Edit 3:
Using an HTML inspector, I think there's a decent chance you need to set 'wcuirs_uri' to 'https://cf.wcu.edu/busafrs/catcard/idsearch.cfm'. I'm 95% sure that will work.
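A request that carries the hidden field alongside the credentials could be built as in the following sketch. It is written for Python 3 with the standard library; build_login_request is a hypothetical helper name, the credentials are placeholders, and whether the server accepts this request is not something I have verified.

```python
import urllib.parse
import urllib.request

ACTION_URL = "https://itapp.wcu.edu/BanAuthRedirector/Default.aspx"


def build_login_request(my_id, my_pin):
    """Hypothetical helper: build (but do not send) the login POST request."""
    payload = {
        # Hidden field from the form; the value is copied from the page source.
        "wcuirs_uri": "https://cf.wcu.edu/busafrs/catcard/idsearch.cfm",
        "id": my_id,
        "PIN": my_pin,
        "submit": "Request Access",
    }
    body = urllib.parse.urlencode(payload).encode("ascii")
    return urllib.request.Request(ACTION_URL, data=body)


# Placeholder credentials, for illustration only.
request = build_login_request("11111111111", "22222222222")
print(request.get_method(), request.full_url)
print(urllib.parse.parse_qs(request.data.decode("ascii")))
```

Printing the parsed body before sending makes it easy to compare the fields against what a browser submits for the same form.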