XPath: "Exclude" tag in "InnerHtml" (<a href="">InnerHtml<span>excludeme</span></a>)

Views: 45 | Published: 2021/7/17 18:44:34 | Tags: html, xpath, screen-scraping


Question

I am using XPath to query HTML sites, which has worked pretty well so far, but now I have hit a (brick) wall and can't find a solution :-)

The HTML looks like this:

<ul>
<li><a href="">Text1<span>AnotherText1</span></a></li>
<li><a href="">Text2<span>AnotherText2</span></a></li>
<li><a href="">Text3<span>AnotherText3</span></a></li>
</ul>

I want to select the "TextX" part, but NOT the "AnotherTextX" part in the <span>. So far I couldn't come up with any (pure) XPath solution to do that, and in my setup I unfortunately need a pure XPath solution.

This selects roughly what I want, but it results in "TextXAnotherTextX", and I only need "TextX":

/ul/li/a

Any hints? :-)

Recommended answer

This gets you the first direct text node child of <a>:

/ul/li/a/text()[1]

And this gets you every direct text node child of <a> (each as a separate node):

/ul/li/a/text()

For the HTML above, both expressions return "TextX", but if you had:

<li><a href="">Text4<span>AnotherText3</span>TrailingText</a></li>

then the latter would return ["Text4", "TrailingText"], while the former would return "Text4" only.
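The difference between the two expressions can be tried out in Python. This is only a rough sketch: the standard library's `xml.etree.ElementTree` implements a limited XPath subset that does not include `text()`, so the hypothetical helper `direct_text_nodes` emulates `a/text()` through the `text` and `tail` attributes, and taking its first item emulates `a/text()[1]`. The sample markup combines the question's first list item with the trailing-text variant.

```python
import xml.etree.ElementTree as ET

HTML = """<ul>
<li><a href="">Text1<span>AnotherText1</span></a></li>
<li><a href="">Text4<span>AnotherText3</span>TrailingText</a></li>
</ul>"""


def direct_text_nodes(element):
    """Emulate XPath text(): text that is a direct child of `element`.

    Hypothetical helper, not part of ElementTree. ElementTree stores the
    text before the first child in `element.text` and the text following
    each child element in that child's `tail`.
    """
    texts = []
    if element.text:
        texts.append(element.text)
    for child in element:
        if child.tail:
            texts.append(child.tail)
    return texts


root = ET.fromstring(HTML)          # root is the <ul> element
anchors = root.findall("li/a")      # roughly /ul/li/a

# Emulates /ul/li/a/text(): every direct text node, grouped per <a>.
all_text = [direct_text_nodes(a) for a in anchors]

# Emulates /ul/li/a/text()[1]: only the first direct text node of each <a>.
first_text = [nodes[0] for nodes in all_text if nodes]

print(all_text)
print(first_text)
```

With a library that supports full XPath 1.0, the two expressions from the answer could be evaluated directly instead of being emulated.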

Your expression /ul/li/a selects the <a> elements themselves, and what you see is the string value of <a>, which is defined as the concatenation of the string values of all text nodes inside <a> (including those nested in child elements), in document order. That is why you get "TextXAnotherTextX".
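The string-value rule can be illustrated in Python as well. In this minimal sketch, joining `itertext()` from the standard library's `xml.etree.ElementTree` is assumed to stand in for the XPath string value of an element, while `text` holds only the leading direct text.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<a href="">Text1<span>AnotherText1</span></a>')

# itertext() walks all text inside <a>, including text nested in <span>,
# in document order; joining it mirrors the XPath string value of <a>.
string_value = "".join(a.itertext())

# .text is only the text that precedes the first child element.
own_text = a.text

print(string_value)
print(own_text)
```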
