Using XPath with an HTML or XML fragment?


Question

I am new to Nokogiri and XPath, and I am trying to access all comments in an HTML or XML fragment. The XPaths .//comment() and //comment() work when I am not using the fragment function, but they do not find every comment in a fragment. With a tag instead of a comment, the first XPath works.

By trial and error, I found that in this case comment() finds only top-level comments, while .//comment() and some others find only inner comments. Am I doing something wrong? What am I missing? Can anyone explain what is happening?

What XPath should I use to get all comments in an HTML fragment parsed by Nokogiri?

This example may help to illustrate the problem:

str = "<!-- one --><p><!-- two --></p>"

# this works:
Nokogiri::HTML(str).xpath("//comment()")
=> [#<Nokogiri::XML::Comment:0x3f8535d71d5c " one ">, #<Nokogiri::XML::Comment:0x3f8535d71cf8 " two ">]
Nokogiri::HTML(str).xpath(".//comment()")
=> [#<Nokogiri::XML::Comment:0x3f8535cc7974 " one ">, #<Nokogiri::XML::Comment:0x3f8535cc7884 " two ">]

# with fragment, it does not work:
Nokogiri::HTML.fragment(str).xpath("//comment()")
=> []
Nokogiri::HTML.fragment(str).xpath("comment()")
=> [#<Nokogiri::XML::Comment:0x3f8535d681a8 " one ">]
Nokogiri::HTML.fragment(str).xpath(".//comment()")
=> [#<Nokogiri::XML::Comment:0x3f8535d624d8 " two ">]
Nokogiri::HTML.fragment(str).xpath("*//comment()")
=> [#<Nokogiri::XML::Comment:0x3f8535d5cb8c " two ">]
Nokogiri::HTML.fragment(str).xpath("*/comment()")
=> [#<Nokogiri::XML::Comment:0x3f8535d4e104 " two ">]

# however it does if it is a tag instead of a comment:
str = "<a desc='one'/> <p><a>two</a><a desc='three'/></p>"
Nokogiri::HTML.fragment(str).xpath(".//a")
=> [#<Nokogiri::XML::Element:0x3f8535cb44c8 name="a" attributes=[#<Nokogiri::XML::Attr:0x3f8535cb4194 name="desc" value="one">]>, #<Nokogiri::XML::Element:0x3f8535cb4220 name="a" children=[#<Nokogiri::XML::Text:0x3f8535cb3ba4 "two">]>, #<Nokogiri::XML::Element:0x3f8535cb3a3c name="a" attributes=[#<Nokogiri::XML::Attr:0x3f8535cb3960 name="desc" value="three">]>]

PS: Without fragment it does what I want, but it also adds things like a DOCTYPE, and I really have only a fragment of an HTML file that I am editing (removing some tags, replacing others).

Recommended answer

"descendant::comment()" and "descendant::sometag" work fine in every case, but I still don't understand why the other expressions behave differently.
