Nokogiri: Searching for <div> using XPath


Problem description

I use Nokogiri's (a Ruby gem) CSS search to look for certain <div> elements in my HTML. It looks like Nokogiri's CSS search doesn't support regular expressions. I would like to switch to Nokogiri's XPath search, as it seems to support regex in search strings.

How do I express the (pseudo) CSS search below as an XPath search?

require 'rubygems'
require 'nokogiri'

value = Nokogiri::HTML.parse(<<-HTML_END)
  <html>
    <body>
      <p id='para-1'>A</p>
      <p id='para-22'>B</p>
      <h1>Bla</h1>
      <p id='para-3'>C</p>
      <p id='para-4'>D</p>
      <div class="foo" id="eq-1_bl-1">
        <p id='para-5'>3</p>
      </div>
    </body>
  </html>
HTML_END

# my_block is given
my_bl = "1"
# my_eq corresponds to this regex
my_eq = "\/[0-9]+\/"

# FIXME The following line should be changed to an xpath search.
if my_div = value.css("div#eq-#{my_eq}_bl-#{my_bl}.foo").first
  # doing some stuff with the <p> inside the div
end

Solution

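Mike Dalessio (one of the two Nokogiri core developers) gave me an answer on #nokogiri (irc.freenode.net). It turns out that neither Nokogiri's CSS search nor its XPath search supports regex matching (Nokogiri implements XPath 1.0, which has no regular-expression function; matches() only appeared in XPath 2.0). This is his solution for matching nodes against a regular expression with Nokogiri: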

require 'rubygems'
require 'nokogiri'

value = Nokogiri::HTML.parse(<<-HTML_END)
  <html>
    <body>
      <p id='para-1'>A</p>
      <p id='para-22'>B</p>
      <h1>Bla</h1>
      <p id='para-3'>C</p>
      <p id='para-4'>D</p>
      <div class="foo" id="eq-1_bl-1">
        <p id='para-5'>3</p>
      </div>
      <div class="bar" id="eq-1_bl-1">
        <p id='para-5'>3</p>
      </div>
    </body>
  </html>
HTML_END

# my_block is given
my_bl = "1"
# my_eq corresponds to this regex
my_eq = "[0-9]+"
# full regex to search for in node ids
full_regex = %r(eq-#{my_eq}_bl-#{my_bl})

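# Handler object for the custom :filter() pseudo-class used below.
filter_by_id = Class.new do
  attr_accessor :matches

  def initialize(regex)
    @regex = regex
    @matches = []
  end

  # Nokogiri calls this with a node set for :filter(). Record the nodes whose
  # id matches, and return only those nodes so that css() itself also returns
  # only matching nodes (returning the accumulated @matches would not).
  def filter(node_set)
    found = node_set.find_all { |x| x['id'] =~ @regex }
    @matches += found
    found
  end
end.new(full_regex)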

value.css("div.foo:filter()", filter_by_id)
filter_by_id.matches.each do |node|
  puts node
end

