:has CSS pseudo class in Nokogiri

Question

I'm looking for the pseudoclass :has in Nokogiri. It should work just like jQuery's has selector.

For example:

<li><h1><a href="dfd">ex1</a></h1><span class="string">sdfsdf</span></li>
<li><h1><a href="dsfsdf">ex2</a></h1><span class="string"></span></li>
<li><h1><a href="sdfd">ex3</a></h1></li>

The CSS selector should return only the first link, the one whose h1 has a non-empty span.string sibling.

In jQuery this selector works well:

$('li:has(span.string:not(:empty))>h1>a')

but not in Nokogiri:

Nokogiri::HTML(html_source).css('li:has(span.string:not(:empty))>h1>a')

:not and :empty work well, but :has does not.


  1. Is there any documentation for CSS selectors in Nokogiri?
  2. Maybe someone can write a custom :has pseudo class? Here is an example of how to write a :regexp selector.
  3. Optionally I can use XPath. How do I write XPath for li:has(span.string:not(:empty))>h1>a?

Solution

The problem with Nokogiri's current implementation of :has() is that it creates XPath that requires the contents to be a direct child, not any descendant:

puts Nokogiri::CSS.xpath_for( "a:has(b)" )
#=> "//a[b]"
#=> Should output "//a[.//b]" to be correct

To make this XPath match what jQuery does, you need to allow the span to be a descendant element. For example:

require 'nokogiri'
d = Nokogiri.XML('<r><a/><a><b><c/></b></a></r>')
d.at_css('a:has(b)')    #=> #<Nokogiri::XML::Element:0x14dd608 name="a" children=[#<Nokogiri::XML::Element:0x14dd3e0 name="b" children=[#<Nokogiri::XML::Element:0x14dd20c name="c">]>]>
d.at_css('a:has(c)')    #=> nil
d.at_xpath('//a[.//c]') #=> #<Nokogiri::XML::Element:0x14dd608 name="a" children=[#<Nokogiri::XML::Element:0x14dd3e0 name="b" children=[#<Nokogiri::XML::Element:0x14dd20c name="c">]>]>

For your specific case, here's the full "broken" XPath:

puts Nokogiri::CSS.xpath_for( "li:has(span.string:not(:empty)) > h1 > a" )
#=> //li[span[contains(concat(' ', @class, ' '), ' string ') and not(not(node()))]]/h1/a

And here it is fixed:

# Adding just the .//
//li[.//span[contains(concat(' ', @class, ' '), ' string ') and not(not(node()))]]/h1/a

# Simplified to assume only one CSS class is present on the span
//li[.//span[@class='string' and not(not(node()))]]/h1/a

# Assuming that `not(:empty)` really meant "Has some text in it"
//li[.//span[@class='string' and text()]]/h1/a

# ..or maybe you really wanted "Has some text anywhere underneath"
//li[.//span[@class='string' and .//text()]]/h1/a

# ..or maybe you really wanted "Has at least one element child"
//li[.//span[@class='string' and *]]/h1/a
