Is it possible to find the <td> .. </td> text when the value of any one <td> .. </td> is known?
Question
I have a webpage whose HTML is laid out roughly as follows:
<form name="test">
<td> .... </td>
.
.
.
<td> <A HREF="http://www.edu/st/file.html">alo</A> </td>
<td> <A HREF="http://www.dom/st/file.html">foo</A> </td>
<td> bla bla </td>
</form>
Now, I know only the value "bla bla". Based on that value, can we track or find the third-from-last value (here, "alo")? I could track those values with the help of the HREF values, but the HREF values are not fixed; they can be anything at any time.
Recommended answer
Extracting every <td> from an HTML document is easy, but it's not a foolproof way to navigate the DOM. However, given the limitations of the sample HTML, here's a solution. I doubt it'll work in a real-world situation, though.
Mechanize uses Nokogiri internally for its heavy lifting, so require 'nokogiri' isn't necessary if you've already required Mechanize.
require 'nokogiri'
doc = Nokogiri::HTML::DocumentFragment.parse(<<EOT)
<td> <A HREF="http://www.edu/st/file.html">alo</A> </td>
<td> <A HREF="http://www.dom/st/file.html">foo</A> </td>
<td> bla bla </td>
EOT
# Take the third-from-last <td> and read the href of the link inside it.
doc.search('td')[-3].at('a')['href']
# => "http://www.edu/st/file.html"
How to get the Nokogiri document from the Mechanize "agent" is left as an exercise for the user.