Getting the text from a node using HtmlAgilityPack
Question
I have the following HTML:
<div class="top">
<p>Blah.</p>
I want <em>this</em> text.
</div>
What is the XPath notation to extract the string "I want <em>this</em> text."?
I don't necessarily want a single XPath expression to extract the string. Selecting multiple nodes, and iterating over them to produce the sentence, would be great as well.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(myHtml);
doc.DocumentNode.SelectSingleNode("??????");
Recommended answer
What do you want to extract, nodes or a string?
If you want nodes, "I want <em>this</em> text." is an XML fragment consisting, at the top level, of two text nodes and an <em> element, which itself has a text node child. Since it has multiple nodes at the top level, you need to use SelectNodes("xpath expression a la @Alejandro") rather than SelectSingleNode() to extract them.
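A minimal sketch of the node-based approach, assuming the HtmlAgilityPack NuGet package is referenced. The XPath below is only one guess at what is wanted (every sibling after the first <p>) and is not necessarily the expression from the other answer; the class name is made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

class SelectNodesSketch
{
    static void Main()
    {
        const string myHtml =
            "<div class=\"top\">\n<p>Blah.</p>\nI want <em>this</em> text.\n</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(myHtml);

        // Assumed XPath: all nodes (text and elements) that follow the first <p> in the div.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(
            "//div[@class='top']/p[1]/following-sibling::node()");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (nodes == null)
        {
            Console.WriteLine("No matching nodes.");
            return;
        }

        // Print each top-level node of the fragment with its node type.
        foreach (HtmlNode node in nodes)
        {
            Console.WriteLine($"{node.NodeType}: {node.OuterHtml.Trim()}");
        }
    }
}
```

The null check matters because SelectSingleNode() and SelectNodes() both signal "no match" with null rather than an empty result.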
If you want a string, you again need to use SelectNodes(), then iterate over the selected nodes and concatenate the OuterHtml of each one.
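The concatenation step could look roughly like the following. FragmentToString is a hypothetical helper written for this sketch, not part of HtmlAgilityPack, and the XPath passed to it is again an assumption about which nodes are wanted.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class ConcatOuterHtmlSketch
{
    // Hypothetical helper: joins the OuterHtml of every node matched by the XPath.
    static string FragmentToString(HtmlNode root, string xpath)
    {
        HtmlNodeCollection nodes = root.SelectNodes(xpath);
        if (nodes == null)
        {
            return string.Empty; // nothing matched
        }

        // OuterHtml keeps the markup of element nodes such as <em>...</em>;
        // Trim removes the line breaks surrounding the fragment.
        return string.Concat(nodes.Select(n => n.OuterHtml)).Trim();
    }

    static void Main()
    {
        const string myHtml =
            "<div class=\"top\">\n<p>Blah.</p>\nI want <em>this</em> text.\n</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(myHtml);

        string result = FragmentToString(
            doc.DocumentNode,
            "//div[@class='top']/node()[not(self::p)]");

        Console.WriteLine(result);
    }
}
```

Using InnerText instead of OuterHtml in the helper would drop the <em> tags and keep only the plain text, if that is what is needed.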
Also, it's a little unclear from your example what XPath expression would, in general, give you what you want. E.g., do you want everything after the initial <p>...</p> under <div class="top">? Or do you want all text under the <div> except all <p> elements? Or maybe something else? Of course, if @Alejandro's XPath expressions work for you, then it's already well-specified enough.
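To make the two readings concrete, here is a sketch that runs one candidate XPath for each against a slightly extended input. The extra trailing <p> is added purely for illustration so that the two expressions select different nodes; both expressions are assumptions, not taken from the question.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class XPathInterpretationsSketch
{
    static void Main()
    {
        // Illustrative input: the question's HTML plus a second <p> at the end.
        const string html =
            "<div class=\"top\">\n<p>Blah.</p>\nI want <em>this</em> text.\n<p>More.</p>\n</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Reading 1: everything after the initial <p>, including any later <p>.
        const string afterFirstP = "//div[@class='top']/p[1]/following-sibling::node()";

        // Reading 2: every child of the <div> except <p> elements.
        const string allButP = "//div[@class='top']/node()[not(self::p)]";

        foreach (string xpath in new[] { afterFirstP, allButP })
        {
            HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
            string joined = nodes == null
                ? string.Empty
                : string.Concat(nodes.Select(n => n.OuterHtml)).Trim();

            Console.WriteLine(xpath);
            Console.WriteLine(joined);
            Console.WriteLine();
        }
    }
}
```

On the original input from the question the difference is confined to leading whitespace, which is why the example there does not settle which expression is the right one.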