Using mechanize to log in to a webpage


Question

This is my first experience programming in Python, and I'm trying to log in to this webpage. After searching around, I found that many people suggest using mechanize. To make sure everything was set up correctly before writing any code, I downloaded the mechanize zip from the website and put my Python script in the unzipped mechanize folder.

I have this code so far using different examples I've found:

import mechanize

theurl = 'http://voyager.umeres.maine.edu/Login'
mech = mechanize.Browser()
mech.open(theurl)

mech.select_form(nr=0)
mech["userid"] = "MYUSERNAME"
mech["password"] = "MYPASSWORD"
results = mech.submit().read()

# Save the response body; binary mode works whether read() returns str or bytes.
with open('test.html', 'wb') as f:
    f.write(results)

From looking at the source of the webpage, I believe userid and password are the correct field names for the form. When I run the script in IDLE I get a bunch of errors, including a timeout error and a robot error.

Even if the code works, I'm not sure what I should expect. The login is for my school email, which also has class folders. My end goal is, once I'm logged in, to parse some folders for information and store it in a file that can later be converted to JSON or an RSS feed. That is much further down the road, when I understand Python better; I'm just trying to give a clearer idea of what I want to accomplish.

Answer

The problem is that mechanize respects robots.txt by default, so it refuses to open pages that the site's robots.txt disallows.

You must turn that check off.

Solution:

import mechanize

mech = mechanize.Browser()
# This needs to be set before you call open()
mech.set_handle_robots(False)
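Putting the fix together with the code from the question, the whole flow might look like the sketch below. It is only an illustration: `login_and_save` and `main` are hypothetical helper names, the field names userid and password are taken from the question, and the `browser` argument is assumed to behave like a `mechanize.Browser()`. Whether the login then succeeds still depends on what the site expects in the submitted form.

```python
def login_and_save(browser, url, userid, password, out_path="test.html"):
    """Log in through the first form on `url` and save the response body.

    `browser` is assumed to behave like mechanize.Browser().
    """
    # Must be set before open(); otherwise mechanize checks robots.txt
    # and refuses pages that the file disallows.
    browser.set_handle_robots(False)
    browser.open(url)

    browser.select_form(nr=0)  # first form on the page
    browser["userid"] = userid
    browser["password"] = password
    body = browser.submit().read()

    # Write in binary mode so the body is saved unchanged.
    with open(out_path, "wb") as f:
        f.write(body)
    return body


def main():
    # Imported here so the helper above can be read and tried
    # without mechanize installed (pip install mechanize).
    import mechanize

    login_and_save(
        mechanize.Browser(),
        "http://voyager.umeres.maine.edu/Login",
        "MYUSERNAME",
        "MYPASSWORD",
    )
```

Calling `main()` runs the login against the URL from the question and writes the response to test.html.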

Edit: it appears that the site uses additional POST values that are generated via JavaScript. These may be a pain to recreate yourself; check the source of the page to see what's going on. The POST values actually being sent:

challenge   [a14b1f67-11edcc01]
charset     UTF-8
login       Login
origurl     /Login/
password    (empty)
savedpw     0
sha1        3f77d1e8c2ab0470ef8005a85f5f9c0d7aeedba6
userid      sdsads
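These values look like a challenge-response login: the password field is sent empty, and a 40-character sha1 value is sent together with a server-supplied challenge. How the page's JavaScript actually combines the two is not shown here, so the recipe in this sketch (SHA-1 over the challenge followed by the password, UTF-8 encoded) is purely an assumption for illustration, and `challenge_digest` is a hypothetical helper name. Read the page's script to find the real recipe before relying on anything like this.

```python
import hashlib


def challenge_digest(challenge, password):
    """Return the hex SHA-1 digest of challenge + password.

    ASSUMPTION: the concatenation order and the UTF-8 encoding are
    guesses; the site's JavaScript may combine the values differently.
    """
    data = (challenge + password).encode("utf-8")
    return hashlib.sha1(data).hexdigest()


# Challenge value taken from the listing above, brackets omitted.
digest = challenge_digest("a14b1f67-11edcc01", "MYPASSWORD")
print(len(digest))  # a SHA-1 hex digest is always 40 characters long
```

If the real recipe turns out to be reproducible, the computed value would still have to be placed in the form's sha1 field before submitting. In mechanize, hidden fields are read-only by default, and calling `set_all_readonly(False)` on the selected form is the usual way to make them writable.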
