HtmlAgilityPack SelectNodes expression to ignore an element with a certain attribute

Question

I am trying to select all text nodes except those inside script nodes and inside a ul whose class is 'relativeNav'. Can someone point me in the right direction? I have been searching for a week and can't find an answer anywhere. Currently I have the code below, but it obviously selects the text under //ul[@class='relativeNav'] as well. Is there any way to add a NOT expression so that SelectNodes ignores that element?

        foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//body//*[not(self::script)]/text()"))
        {
            Console.WriteLine("Node: " + node);
            singleString += node.InnerText.Trim() + "\n";
        }

Recommended answer

Given an HTML document with a structure similar to:

<html>
<head><title>HtmlDocument</title>
</head>
<body>
<div>
<span>Hello Span World</span>
<script>
Script Text
</script>
</div>
<ul class='relativeNav'>
<li>Hello </li>
<li>Li</li>
<li>World</li>
</ul>
</body>
</html>

The following XPath expression selects the text nodes of every element under body that is neither a script element nor a direct child of a ul element with class 'relativeNav':

var nodes = htmlDoc.DocumentNode.SelectNodes("//body//*[not(parent::ul[@class='relativeNav']) and not(self::script)]/text()");
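
As a rough, self-contained sketch of how that expression could be run against the sample document: the class name VisibleText and the helper Extract are hypothetical names made up for this illustration, not part of HtmlAgilityPack. The sketch assumes the HtmlAgilityPack NuGet package is referenced, and it guards against SelectNodes returning null when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class VisibleText
{
    // Hypothetical helper (not a library API): returns the trimmed,
    // non-empty text of every node matched by the XPath expression.
    public static List<string> Extract(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null) return result;

        foreach (HtmlNode node in nodes)
        {
            string text = node.InnerText.Trim();
            if (text.Length > 0) result.Add(text); // skip whitespace-only text nodes
        }
        return result;
    }

    public static void Main()
    {
        // Same structure as the sample document above.
        const string html =
            "<html><head><title>HtmlDocument</title></head><body>" +
            "<div><span>Hello Span World</span><script>Script Text</script></div>" +
            "<ul class='relativeNav'><li>Hello </li><li>Li</li><li>World</li></ul>" +
            "</body></html>";

        const string xpath =
            "//body//*[not(parent::ul[@class='relativeNav']) and not(self::script)]/text()";

        foreach (string text in Extract(html, xpath))
        {
            Console.WriteLine(text);
        }
    }
}
```

On this sample only the span text should survive, since the script content and the three li items are filtered out by the predicate.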

Update: I forgot to mention that if you need to exclude every descendant of ul[@class='relativeNav'], regardless of depth, you should use:

"//body//*[not(ancestor::ul[@class='relativeNav']) and not(self::script)]/text()"
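
The difference between the parent and ancestor axes only shows up when the list items contain nested elements. The sketch below uses a made-up document (the nested link and the text strings are assumptions for illustration, not from the question) and a hypothetical local helper named Texts. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class AxisComparison
{
    // Hypothetical helper: trimmed, non-empty text of the matched nodes.
    public static List<string> Texts(HtmlDocument doc, string xpath)
    {
        var result = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null) return result; // null means no match

        foreach (HtmlNode node in nodes)
        {
            string text = node.InnerText.Trim();
            if (text.Length > 0) result.Add(text);
        }
        return result;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        // Illustrative markup: one li wraps its text in a nested <a> element.
        doc.LoadHtml(
            "<html><body>" +
            "<ul class='relativeNav'><li><a href='#'>Nested link</a></li><li>Plain item</li></ul>" +
            "<div>Kept text</div>" +
            "</body></html>");

        // parent:: only filters direct children of the ul, so the <a> still matches.
        string byParent =
            "//body//*[not(parent::ul[@class='relativeNav']) and not(self::script)]/text()";
        // ancestor:: filters everything below the ul, at any depth.
        string byAncestor =
            "//body//*[not(ancestor::ul[@class='relativeNav']) and not(self::script)]/text()";

        Console.WriteLine("parent:   " + string.Join(" | ", Texts(doc, byParent)));
        Console.WriteLine("ancestor: " + string.Join(" | ", Texts(doc, byAncestor)));
    }
}
```

With this markup the parent version should still return the nested link text, while the ancestor version should return only the text outside the list.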

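If you want to exclude the ul element itself as well (somewhat irrelevant in the example above, since the ul contains no text of its own apart from whitespace), use ancestor-or-self. Note that the expression below has no trailing /text() step, so as written it returns the matching elements rather than their text nodes: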

"//body//*[not(ancestor-or-self::ul[@class='relativeNav']) and not(self::script)]"
