Is there a way to escape non-alphanumeric characters in Nokogiri css?
I have an anchor tag:
file.html#stuff-morestuff-CHP-1-SECT-2.1
Trying to pull the referenced content in Nokogiri:
documentFragment.at_css('#stuff-morestuff-CHP-1-SECT-2.1')
fails with the error:
unexpected '.1' after '[#<Nokogiri::CSS::Node:0x007fd1a7df9b40 @type=:CONDITIONAL_SELECTOR, @value=[#<Nokogiri::CSS::Node:0x007fd1a7df9b90 @type=:ELEMENT_NAME, @value=["*"]>, #<Nokogiri::CSS::Node:0x007fd1a7df9cd0 @type=:ID, @value=["#unixnut4-CHP-1-SECT-2"]>]>]' (Nokogiri::CSS::SyntaxError)
Just trying to talk through this: I think Nokogiri is complaining about the ".1" in the selector ID, because "." is not valid in an HTML id.
I don't own the content, so I really don't want to go through and fix all the bad IDs if I can avoid it. Is there a way to escape non-alphanumeric characters in a Nokogiri .css() call?
Assuming your HTML looks something like this:
<div id='stuff-morestuff-CHP-1-SECT-2.1'>foo</div>
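The string in question, stuff-morestuff-CHP-1-SECT-2.1, is a valid HTML ID, but "#stuff-morestuff-CHP-1-SECT-2.1" is not a valid CSS selector: an unescaped "." starts a class selector, and ".1" is not a valid class name because CSS identifiers can't begin with a digit. That is why Nokogiri stops at '.1'.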
You should be able to escape the "." with a backslash, i.e. this is a valid CSS selector:
#stuff-morestuff-CHP-1-SECT-2\.1
Unfortunately, this doesn't seem to work in Nokogiri; there may be a bug in its CSS-to-XPath translation. (It does work in the browser.)
You can get around this by checking the id attribute directly:
documentFragment.at_css('*[id="stuff-morestuff-CHP-1-SECT-2.1"]')
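Since the question starts from an anchor like file.html#stuff-morestuff-CHP-1-SECT-2.1, the attribute selector can be built straight from the href. A minimal sketch using Ruby's standard URI library; the helper name id_attr_selector_for is hypothetical, and it assumes the fragment is not percent-encoded (URI#fragment returns it undecoded):

```ruby
require "uri"

# Hypothetical helper: turns an href such as "file.html#some.id" into an
# attribute selector, so an ID containing "." never goes through "#id" syntax.
# Assumes the fragment is not percent-encoded (URI#fragment does not decode it).
def id_attr_selector_for(href)
  id = URI.parse(href).fragment
  raise ArgumentError, "no fragment in #{href.inspect}" if id.nil? || id.empty?

  # URI.parse rejects a raw double quote in the fragment,
  # so wrapping the ID in double quotes is safe here.
  %(*[id="#{id}"])
end

puts id_attr_selector_for("file.html#stuff-morestuff-CHP-1-SECT-2.1")
# prints *[id="stuff-morestuff-CHP-1-SECT-2.1"]
```

The returned string can then be passed to documentFragment.at_css exactly as in the workaround above.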
Even if backslash escaping worked in Nokogiri, you might still need to check the id attribute like this when its value starts with a digit. That is valid in HTML, but CSS can only express it with a hex escape (for example #\31 23 for id="123"), which Nokogiri may not support either.
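For reference, plain CSS has escapes for both cases: the CSSOM "serialize an identifier" rules (what browsers expose as CSS.escape()) backslash-escape punctuation such as "." and turn a leading digit into a hex code-point escape ("1" becomes "\31 "). Below is a pure-Ruby sketch of those rules for illustration only; css_escape_ident is a hypothetical name, it is not part of Nokogiri, and given the behavior described above Nokogiri may still reject the escaped selector, so the attribute selector remains the safer choice.

```ruby
# Sketch of the CSSOM "serialize an identifier" rules (the algorithm behind
# the browser's CSS.escape). Hypothetical helper, not a Nokogiri API.
def css_escape_ident(ident)
  chars = ident.chars
  out = +""
  chars.each_with_index do |c, i|
    cp = c.ord
    digit = c.match?(/\A[0-9]\z/)
    if cp.zero?
      out << "\uFFFD"                      # NUL becomes the replacement char
    elsif (0x01..0x1F).cover?(cp) || cp == 0x7F ||
          (i.zero? && digit) ||            # leading digit
          (i == 1 && digit && chars[0] == "-")
      out << "\\#{cp.to_s(16)} "           # hex code-point escape
    elsif i.zero? && c == "-" && chars.length == 1
      out << "\\-"                         # a lone "-" is not an identifier
    elsif cp >= 0x80 || c.match?(/\A[-_0-9A-Za-z]\z/)
      out << c                             # safe as-is
    else
      out << "\\#{c}"                      # backslash-escape punctuation
    end
  end
  out
end

puts "#" + css_escape_ident("stuff-morestuff-CHP-1-SECT-2.1")
# prints #stuff-morestuff-CHP-1-SECT-2\.1
puts "#" + css_escape_ident("123")
# prints #\31 23
```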
You could also use XPath, which has an id() function that you can use here:
documentFragment.xpath("id('stuff-morestuff-CHP-1-SECT-2.1')")
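The one-liner above interpolates the ID into a single-quoted XPath string, which would break if an ID you don't control contained a "'". XPath 1.0 string literals have no escape syntax, so a common workaround is to switch quote types or assemble the value with concat(). A minimal sketch; the helper names xpath_literal and xpath_id_expr are hypothetical:

```ruby
# Build an XPath 1.0 string literal for an arbitrary Ruby string.
# XPath 1.0 has no escapes inside literals, so pick a quote type the
# string does not contain, or fall back to concat() when it has both.
def xpath_literal(s)
  return "'#{s}'" unless s.include?("'")
  return "\"#{s}\"" unless s.include?('"')

  # Split on single quotes and rejoin the pieces with a "'" literal.
  parts = s.split("'", -1).map { |p| "'#{p}'" }
  "concat(#{parts.join(%q(, "'", ))})"
end

# Hypothetical convenience wrapper for the id() lookup shown above.
def xpath_id_expr(id)
  "id(#{xpath_literal(id)})"
end

puts xpath_id_expr("stuff-morestuff-CHP-1-SECT-2.1")
# prints id('stuff-morestuff-CHP-1-SECT-2.1')
puts xpath_literal(%q(a'b"c))
# prints concat('a', "'", 'b"c')
```

The resulting expression can be passed to documentFragment.xpath in place of the hand-written string.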