Mechanize in Python - Redirect is not working after submit

Question

I just started using mechanize in Python and I'm having some problems with it already. I've looked around on StackOverflow and on Google and I've seen people say that the documentation is great and that it should be easy to get it working, but I think I don't know how to look for that documentation since all I can find is code examples which don't really teach me how to do the particular things I'm trying to do. If anyone could point me to such documentation, I'd be glad to read it myself and solve my problem.

For the actual problem, I'm trying to log in to a website by sending my username and password information in a form. When the information is correct, I'm usually redirected, but it doesn't work in mechanize.

This is the part that I don't get, because if I immediately print the html content of the page after calling submit, the page displays a variable that shows that the authentication is valid. If I change the password to an incorrect one, the html shows a message "Invalid credentials", as it would if I were browsing the site normally.

Here's a code sample of how I'm doing it. Keep in mind that it might be totally wrong as I'm only trying to apply what I found in examples:

import mechanize
import cookielib

# Start Browser
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()

br.set_cookiejar(cj)

br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

br.open('http://www.complexejuliequilles.com/')


for l in br.links(url_regex='secure'):
    br.follow_link(l)

br.select_form('form1')

br.form['fldUsername'] = 'myUsername'
br.form['fldPassword'] = 'myPassword'
br.submit()

In this particular example, I open http://www.complexejuliequilles.com, then follow a link at the bottom with the text "Administration", enter my credentials in the form there, and submit it. Normally, I would be redirected to the first page I was on, but with more buttons that are only usable by administrators. I want to click one of those links to fill out another form and add a list of users whose email addresses, names, etc. I have.

Is there something simple I'm missing? I think I get the basics, but I don't know the library enough to find what is going wrong with the redirect.

Recommended answer

http://wwwsearch.sourceforge.net/mechanize/documentation.html

Avoid using "_http" directly. The leading underscore in a name tells us that the developer considered it private, and you probably don't need it.

In [20]: mechanize.HTTPRefreshProcessor is mechanize._http.HTTPRefreshProcessor
Out[20]: True

There is some stuff you put before opening the URL that you don't really need. For example: mechanize.Browser() isn't urllib; it already manages cookies for you. You should not ignore robots.txt. You can follow more "convention over configuration" by first checking which handlers are enabled by default:

mechanize.Browser().handlers

You probably have mechanize.HTTPRedirectHandler in that list (I do), if not:

br.set_handle_redirect(mechanize.HTTPRedirectHandler)

The for loop is strange: it seems you're changing what it iterates over (the links inside the open URL) inside the loop itself (the browser opens another URL). I first thought you wanted to click recursively while there's a "secure" URL match. Whether an error occurs would depend on how the links() generator is implemented (it probably follows a fixed br.response() instance), but I think you just want to follow the first link that matches:

In [50]: br.follow_link(url_regex="secure") # No loops

I don't know what kind of redirecting/refreshing you need. JavaScript changing window.location.href? If so, mechanize won't do it, unless you parse the JavaScript yourself.
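One way to narrow down which kind of redirect the page uses is to check the returned HTML for a meta refresh tag and look at its delay. The sketch below uses only the Python 3 standard library rather than mechanize; MetaRefreshFinder and find_meta_refresh are hypothetical names, and the sample markup is made up. My understanding, which is an assumption here, is that mechanize's refresh handler only follows refreshes whose delay does not exceed max_time, so a delay above the max_time=1 used in the question might explain a refresh that is not followed.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Hypothetical helper: records the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refresh = None  # becomes (delay_seconds, url_or_None) once found

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.refresh is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "5; URL=/home.php" or just "5".
        delay, _, rest = (attrs.get("content") or "").partition(";")
        try:
            delay_seconds = float(delay.strip())
        except ValueError:
            return  # malformed delay: ignore this tag
        url = rest.strip()
        if url.lower().startswith("url="):
            url = url[4:].strip().strip("'\"")
        self.refresh = (delay_seconds, url or None)


def find_meta_refresh(html):
    """Return (delay_seconds, url) for the first meta refresh, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refresh


# Made-up sample page, for illustration only.
sample_page = '<html><head><meta http-equiv="Refresh" content="5; URL=/home.php"></head></html>'
print(find_meta_refresh(sample_page))
```

If this returns None for the page you get after submitting, the redirect is probably not a meta refresh, and JavaScript becomes the more likely candidate.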

You can get the "raw" information about the last open URL this way:

last_response = br.response() # This is returned by br.open(...) too
http_header_dict = last_response.info().dict
html_string_list = last_response.readlines()
html_data = "".join(html_string_list)

Even if it's JavaScript, you can get the redirection URL by locating it in html_data, using html_data.find(), regular expressions, BeautifulSoup, etc.
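As a minimal sketch of the regular-expression approach, assuming the page redirects with a plain assignment such as window.location.href = "...": the pattern and the find_js_redirect name are illustrative assumptions, not part of mechanize, and the sample markup is made up. It will not catch URLs built dynamically or calls like location.replace(...).

```python
import re

# Illustrative pattern: matches assignments such as
#   window.location.href = "/some/url"   or   location = '/some/url'
# It does not handle dynamically built URLs or location.replace(...).
JS_REDIRECT_RE = re.compile(
    r"""\b(?:window\.|document\.|self\.|top\.)?location(?:\.href)?\s*=\s*(["'])(?P<url>.*?)\1"""
)


def find_js_redirect(html_data):
    """Return the first URL assigned to location or location.href, or None."""
    match = JS_REDIRECT_RE.search(html_data)
    return match.group("url") if match else None


# Made-up sample, standing in for the html_data string read from the response.
sample_html = '<script>window.location.href = "/admin/index.php";</script>'
print(find_js_redirect(sample_html))
```

A relative result like the one above would still need to be resolved against the URL of the current page before it is opened.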

PEP8 note: Avoid using an isolated "l" (lowercase "L") as a variable name; it might be mistaken for "1" (one) or "I" (uppercase "i") depending on the font and context. You should use "L" or another name instead.
