Finding next input element using Mechanize?


Question

Using Mechanize, is it possible to find a phrase in the HTML of a page, for example "email", then find the next <input> after it and fill in that input field, and only that field?

Solution

Mechanize uses Nokogiri internally to handle its DOM parsing, which is the basis of its ability to locate different elements in a page.

It's possible to access the parsed DOM and, through it, use Nokogiri to locate elements Mechanize doesn't normally let us find. For instance:

require 'mechanize'

agent = Mechanize.new
page = agent.get('http://www.example.com')

# Use Nokogiri to find the content of the <h1> tag...
puts page.at('h1').content # => "Example Domain"

For your search you'd want to use an XPath accessor to locate where "email" is in the page. Once you've done that you can locate the next <input> tag.

Starting from a simple HTML fragment, we'll pretend this comes from Mechanize:

require 'nokogiri'

page = Nokogiri::HTML('<div><form><p>email</p><input name="email"></form></div>')
puts page.to_html

Which looks like:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div><form>
<p>email</p>
<input name="email">
</form></div></body></html>

Searching for "email":

page.at("//*[contains(text(),'email')]")
#<Nokogiri::XML::Element:0x3ff50d0c4bc0 name="p" children=[#<Nokogiri::XML::Text:0x3ff50d0c497c "email">]>

Building upon that, this gets the <input> tag:

input_tag = page.at("//*[contains(text(),'email')]/following-sibling::input")
#<Nokogiri::XML::Element:0x3ff50d09b75c name="input" attributes=[#<Nokogiri::XML::Attr:0x3ff50d09b5f4 name="name" value="email">]>

Once you've found that input tag, you can get the "name" from the tag using Nokogiri, and then tell Mechanize to locate and fill in that particular input field:

input_tag['name'] # => "email"

For a web form to function correctly, its inputs have to have names. Those names are passed to the server, paired with the field values, when the form is submitted. Without them it'd take a lot of work to determine which input sent a particular piece of data, and, programmers being lazy, we don't want to work hard, so you can count on having a name to work with.
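
To make the role of the name concrete: a URL-encoded form submission is a list of name=value pairs, which Ruby's standard library can build with URI.encode_www_form. The field names and values below are made up for illustration.

```ruby
require 'uri'

# Each named field becomes one name=value pair in the request body.
fields = { 'email' => 'me@example.com', 'city' => 'Paris' }
body = URI.encode_www_form(fields)
puts body # => email=me%40example.com&city=Paris
```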

See "Ruby Mechanize, Nokogiri and Net::HTTP" for more information. Searching Stack Overflow and reading the Nokogiri documentation and tutorials will give you much of what you need to figure out the rest.
