Getting the text from a node using HtmlAgilityPack

Question

I have the following HTML:

<div class="top">
    <p>Blah.</p>
    I want <em>this</em> text.
</div>

What XPath expression extracts the string "I want <em>this</em> text."? It does not have to be a single expression: selecting several nodes and iterating over them to build the sentence would work just as well.

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(myHtml);
doc.DocumentNode.SelectSingleNode("??????");

Recommended answer

What do you want to extract: nodes or a string?

If you want nodes, "I want <em>this</em> text." is an XML fragment that, at the top level, consists of two text nodes and an <em> element, which in turn has one text node child. Because the fragment has multiple top-level nodes, you need SelectNodes("xpath expression a la @Alejandro") rather than SelectSingleNode() to extract them.
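A minimal sketch of the node-based route, assuming the goal is everything that follows the first <p> inside <div class="top">. The XPath below is only one possible reading of the question and is not necessarily the expression @Alejandro proposed; the names NodeDemo and SelectAfterFirstP are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class NodeDemo
{
    // Returns every sibling node (text nodes and elements) that follows the
    // first <p> inside <div class="top">. The XPath is an assumption about
    // what the asker wants, not part of the original answer.
    public static IList<HtmlNode> SelectAfterFirstP(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(
            "//div[@class='top']/p[1]/following-sibling::node()");

        // By default SelectNodes returns null, not an empty collection,
        // when nothing matches, so guard before using the result.
        return nodes == null ? new List<HtmlNode>() : nodes.ToList();
    }
}
```

For the HTML in the question this should yield three nodes: a text node, the <em> element, and a second text node.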

If you want a string, you again need SelectNodes(); then iterate over the selected nodes and concatenate the outer HTML of each one (the OuterHtml property in HtmlAgilityPack). See here for a good example of something similar.
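The string-based route could be sketched as follows, again assuming that "everything after the first <p>" is the intended selection. FragmentDemo and ExtractFragment are hypothetical names, and the Trim() call is an added convenience to drop the surrounding whitespace and newlines.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class FragmentDemo
{
    // Concatenates the OuterHtml of every node after the first <p> inside
    // <div class="top">. The XPath is an illustrative assumption.
    public static string ExtractFragment(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var nodes = doc.DocumentNode.SelectNodes(
            "//div[@class='top']/p[1]/following-sibling::node()");
        if (nodes == null)
        {
            return string.Empty; // nothing matched
        }

        // OuterHtml keeps element markup such as <em>...</em>;
        // for text nodes it is simply the text itself.
        return string.Concat(nodes.Select(n => n.OuterHtml)).Trim();
    }
}
```

Swapping OuterHtml for InnerText would drop the <em> tags and leave plain text, which may or may not be what is wanted.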

Also, your example does not make clear which XPath expression would give you what you want in general. For example, do you want everything after the initial <p>...</p> under <div class="top">? Or all text under the <div> except what is inside <p> elements? Or something else? Of course, if @Alejandro's XPath expressions work for you, the question is already specified well enough.
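To make the ambiguity concrete, here is a hedged sketch that expresses the first two readings as XPath expressions and runs them through the same helper. Both expressions and the names ReadingDemo, AfterFirstP, AllButP and Extract are illustrative assumptions, not taken from the original answers.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class ReadingDemo
{
    // Reading 1: everything after the first <p> under <div class="top">.
    public const string AfterFirstP =
        "//div[@class='top']/p[1]/following-sibling::node()";

    // Reading 2: every direct child of the <div> that is not a <p> element.
    public const string AllButP =
        "//div[@class='top']/node()[not(self::p)]";

    // Joins the OuterHtml of whatever the given XPath selects.
    public static string Extract(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null
            ? string.Empty
            : string.Concat(nodes.Select(n => n.OuterHtml)).Trim();
    }
}
```

On the HTML in the question both readings should produce the same fragment. They diverge once a second paragraph appears: for <div class="top"><p>A</p>B<p>C</p>D</div>, the first reading would keep it (B<p>C</p>D) while the second would drop it (BD).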
