Mechanize in Python - selecting a form field with no name

Question

I have a problem similar to the one described in "Mechanize form (python)". I want to scrape data from a website behind a login screen. However, I don't know how to select a form field that does not have a name. The controls look like this:

<TextControl(<None>=)>
<PasswordControl(<None>=)>
<CheckboxControl(<None>=[on])>
<SubmitButtonControl(<None>=) (readonly)>

Usually it says <TextControl(login=)>, so I can use br.form['login'] = 'mylogin'. This time I can't, since I don't know the name of the login field.

I'm able to access the form, but I cannot fill out the TextControl or PasswordControl, I guess because of the <None> value. My basic code looks like this:

import mechanize
from bs4 import BeautifulSoup
import urllib2 
import cookielib

cj = cookielib.CookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)
br.set_handle_robots(False)
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}

url = "www.example.com"
request = urllib2.Request(url, None, hdr)
response = br.open(request)
forms =  [form for form in br.forms()][0]
br.select_form(nr=0)

I tried something like this:

br.form.find_control(id="id").value = "loginname"

and this:

forms[0].set_value("new value", nr=0)

This throws errors such as mechanize._response.httperror_seek_wrapper: HTTP Error 403: Forbidden or TypeError: control name must be string-like. I don't know what else to try. Please help me out here.

Recommended answer

Starting from your code:

url = "www.example.com"
request = urllib2.Request(url, None, hdr)
response = br.open(request)
forms =  [form for form in br.forms()][0]
br.select_form(nr=0)

the following loop:

# Print every control of the selected form together with its position
for number, control in enumerate(br.form.controls):
    print('%s   ---> Number:  %d' % (control, number))

prints:

<TextControl(<None>=)>   ---> Number:  0 
<PasswordControl(<None>=)>   ---> Number:  1
<CheckboxControl(<None>=[on])>   ---> Number:  2
<SubmitButtonControl(<None>=) (readonly)>   ---> Number:  3

Now, you can try this:

br.form.controls[0]._value = "loginname"
br.form.controls[1]._value = "password"

Then:

for f in br.form.controls:
    print f

will print:

<TextControl(<None>=loginname)>
<PasswordControl(<None>=password)>
<CheckboxControl(<None>=[on])>
<SubmitButtonControl(<None>=) (readonly)>
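
The assignments above write to the private _value attribute. As a hedged variation, the same fields can probably be reached through the form's public interface: find_control accepts type and nr arguments, and text-like controls expose a writable value attribute. The sketch below is not the answerer's code. The helper names describe_controls and fill_unnamed_login and the Fake* classes are illustrative assumptions; the fakes exist only so the example runs without mechanize or a network connection.

```python
# Sketch only: `form` is expected to behave like mechanize's HTMLForm,
# for example `br.form` after `br.select_form(nr=0)`.

def describe_controls(form):
    """Return (position, type, id, name) for every control in the form."""
    return [(nr, c.type, c.id, c.name) for nr, c in enumerate(form.controls)]


def fill_unnamed_login(form, username, password):
    """Fill the first text and the first password control, ignoring names."""
    form.find_control(type="text", nr=0).value = username
    form.find_control(type="password", nr=0).value = password


# --- Hypothetical stand-ins (NOT part of mechanize) so the sketch runs offline ---
class FakeControl(object):
    def __init__(self, type, id=None, name=None, value=""):
        self.type = type
        self.id = id
        self.name = name
        self.value = value


class FakeForm(object):
    def __init__(self, controls):
        self.controls = controls

    def find_control(self, type=None, nr=0):
        # Mimics only the subset of find_control used above:
        # the nr-th control whose type matches.
        matches = [c for c in self.controls if c.type == type]
        return matches[nr]


demo_form = FakeForm([
    FakeControl("text"),
    FakeControl("password"),
    FakeControl("checkbox", value=["on"]),
    FakeControl("submitbutton"),
])

fill_unnamed_login(demo_form, "loginname", "password")
summary = describe_controls(demo_form)
for row in summary:
    print(row)
```

With a real browser object the call would be fill_unnamed_login(br.form, "loginname", "password") right after br.select_form(nr=0). Note that HTML forms do not submit controls that lack a name, so if the login still fails after the fields are filled, the page may be sending the credentials through JavaScript rather than through the form itself.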
