XPath: "Exclude" tag in "InnerHtml" (<a href="">InnerHtml<span>excludeme</span></a>)
Problem description
I am using XPath to query HTML sites, which has worked pretty well so far, but now I've hit a (brick) wall and can't find a solution :-)
The HTML looks like this:
<ul>
<li><a href="">Text1<span>AnotherText1</span></a></li>
<li><a href="">Text2<span>AnotherText2</span></a></li>
<li><a href="">Text3<span>AnotherText3</span></a></li>
</ul>
I want to select the "TextX" part, but NOT the "AnotherTextX" part inside the <span></span>.
So far I couldn't come up with any (pure) XPath solution to do that (and in my setup I unfortunately need a pure XPath solution).
This selects roughly what I want, but it results in "TextXAnotherTextX" and I only need "TextX":
/ul/li/a
Any hints? :-)
Recommended answer
This gets you the first direct text node child of <a>:
/ul/li/a/text()[1]
and this would get you every direct text node child (each as a separate node):
/ul/li/a/text()
For the HTML in the question, both of the above return "TextX", but if you had:
<li><a href="">Text4<span>AnotherText3</span>TrailingText</a></li>
then the latter would return ["Text4", "TrailingText"], while the former would return "Text4" only.
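To see the difference concretely, here is a small Python sketch. It assumes the standard-library xml.etree.ElementTree, whose limited XPath subset does not support text(), so the direct text nodes are collected by hand from .text and each child's .tail. The helper name direct_text_nodes is made up for illustration; with a full XPath 1.0 engine you would simply evaluate the two expressions above.

```python
import xml.etree.ElementTree as ET

# Sample markup: one item from the question plus the trailing-text variant.
HTML = """<ul>
<li><a href="">Text1<span>AnotherText1</span></a></li>
<li><a href="">Text4<span>AnotherText3</span>TrailingText</a></li>
</ul>"""


def direct_text_nodes(elem):
    """Hypothetical helper: direct text node children of elem, like elem/text()."""
    nodes = []
    if elem.text:                 # text before the first child element
        nodes.append(elem.text)
    for child in elem:
        if child.tail:            # text that follows a child element
            nodes.append(child.tail)
    return nodes


root = ET.fromstring(HTML)        # root is the <ul> element
anchors = root.findall("./li/a")  # mirrors /ul/li/a

# Mirrors /ul/li/a/text(): every direct text node, per <a>.
all_text = [direct_text_nodes(a) for a in anchors]

# Mirrors /ul/li/a/text()[1]: only the first direct text node of each <a>.
first_text = [nodes[0] for nodes in all_text if nodes]

print(all_text)
print(first_text)
```

Here all_text comes out as [["Text1"], ["Text4", "TrailingText"]] and first_text as ["Text1", "Text4"]; the <span> content never shows up because it is not a direct text child of <a>.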
Your expression /ul/li/a selects the <a> element itself. What you see is its string value, which is defined as the concatenation of the string values of all text node descendants of <a> in document order (including the text inside <span>), so you get "TextXAnotherTextX".
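The string-value rule can be reproduced in a few lines of Python. This is only a sketch, again assuming the standard-library xml.etree.ElementTree rather than a full XPath engine: itertext() walks every descendant text node in document order, which matches how XPath builds an element's string value.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<a href="">Text1<span>AnotherText1</span></a>')

# String value of <a>: all descendant text nodes, concatenated in document order.
string_value = "".join(a.itertext())

# Only the text that precedes the first child element of <a>.
leading_text = a.text

print(string_value)
print(leading_text)
```

This prints "Text1AnotherText1" followed by "Text1", which is the gap between selecting the element and selecting its text node.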