How to use Nokogiri's xpath and at_xpath methods


Question

I'm learning how to use Nokogiri, and a few questions came to me based on this code:

require 'rubygems'
require 'mechanize'

post_agent = Mechanize.new  # releases before Mechanize 1.0 used WWW::Mechanize.new
post_page = post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708')

puts "\nabsolute path with tbody gives nil"
puts  post_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]').xpath('text()').to_s.strip.inspect

puts "\n.at_xpath gives an empty string"
puts post_page.parser.at_xpath("//div[@id='posts']/div/table/tr/td/div[2]").at_xpath('text()').to_s.strip.inspect

puts "\ntwo lines solution with .at_xpath gives an empty string"
rows =   post_page.parser.xpath("//div[@id='posts']/div/table/tr/td/div[2]")
puts rows[0].at_xpath('text()').to_s.strip.inspect


puts
puts "two lines working code"
rows =   post_page.parser.xpath("//div[@id='posts']/div/table/tr/td/div[2]")
puts rows[0].xpath('text()').to_s.strip

puts "\none line working code"
puts post_page.parser.xpath("//div[@id='posts']/div/table/tr/td/div[2]")[0].xpath('text()').to_s.strip

puts "\nanother one line code"
puts post_page.parser.at_xpath("//div[@id='posts']/div/table/tr/td/div[2]").xpath('text()').to_s.strip

puts "\none line code with full path"
puts post_page.parser.xpath("/html/body/div/div/div/div/div/table/tr/td/div[2]")[0].xpath('text()').to_s.strip

  • Is it better to use // or / in XPath? @AnthonyWJones says that "the use of an unprefixed //" is not such a good idea.
  • I had to remove tbody from every working XPath, otherwise I got a nil result. How is it possible that I have to remove an element from the XPath to get things to work?
  • Do I have to call xpath twice to extract the data when I'm not using a full XPath?
  • Why can't I get at_xpath to extract the data? It works nicely in "How do I parse an HTML table with Nokogiri?". What is the difference?

Recommended answer

  1. // matches nodes at every level below the context node, so it is much more expensive than /, which only steps to direct children.
  2. You can use * as a placeholder for an element such as tbody.
  3. No. You can make a single XPath query, get the element, then call Nokogiri's text method on the node.
  4. Sure you can. Have a look at "What is the absolutely cheapest way to select a child node in Nokogiri?" and my benchmark file. You will see an example of at_xpath.

I see you often use the text() expression. That isn't required with Nokogiri: you can retrieve the node and then call the text method on it, which is much less expensive.

Also keep in mind that Nokogiri supports CSS selectors. They can be easier when you are working with HTML pages.
