Get a specific element inside another element with HtmlAgilityPack in C#
Question
I'm working on a project where I need to parse a lot of HTML files. I need to get every <p> from within one <div class="story-body">.
So far I have the code below, and it does what I want, but I was wondering how to do this with an XPath expression. I tried this:
textBody.SelectNodes ("What to put here? I tried //p but it gives every p in document not inside the one div")
But without success. Any ideas?
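using System;
using System.Text;
using System.Xml;
using HtmlAgilityPack;

// doc (HtmlDocument), xmlDoc (XmlDocument), titleElement and storyElement are
// members of the enclosing class; the question does not show their declarations.
public void Parse()
{
    HtmlNode title = doc.DocumentNode.SelectSingleNode("//h1[@class='story-header']");
    HtmlNode textBody = doc.DocumentNode.SelectSingleNode("//div[@class='story-body']");

    if (title != null)
    {
        titleElement.AppendChild(xmlDoc.CreateTextNode(title.InnerText));
        Console.WriteLine(title.InnerText);
    }

    // Collect the direct <p> children and <span class="cross-head"> children of the story body.
    var story = new StringBuilder();
    if (textBody != null) // avoid a NullReferenceException when the div is missing
    {
        foreach (HtmlNode node in textBody.ChildNodes)
        {
            bool isParagraph = node.Name == "p";
            bool isCrossHead = node.Name == "span" && node.GetAttributeValue("class", "") == "cross-head";
            if (isParagraph || isCrossHead)
            {
                story.Append(node.InnerText).Append("\n\n");
                Console.WriteLine(node.InnerText);
            }
        }
    }

    storyElement.AppendChild(xmlDoc.CreateTextNode(story.ToString()));

    try
    {
        xmlDoc.Save("test.xml");
    }
    catch (Exception e)
    {
        Console.WriteLine(e.Message);
    }
}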
Accepted answer
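That's rather simple to do: you just have to put a . at the start of the expression, like .//p. Without the dot, //p is evaluated from the document root even when you call SelectNodes on textBody, which is why you got every <p> in the document. With the dot, the path starts at the current node, so you get only the <p> elements that are descendants of that node.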
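The difference between //p and .//p can be checked without HtmlAgilityPack. The sketch below is a stand-in that uses .NET's built-in System.Xml, whose XmlNode.SelectNodes evaluates XPath 1.0 the same way HtmlAgilityPack's HtmlNode.SelectNodes does. Unlike HtmlAgilityPack it only accepts well-formed XML, so the sample markup, the class name RelativeXPathDemo and the method SelectFromStoryBody are hypothetical and only meant to illustrate the semantics.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Stand-in sketch: System.Xml applies the same XPath 1.0 rules as
// HtmlAgilityPack's SelectNodes, but needs well-formed markup.
public static class RelativeXPathDemo
{
    // Hypothetical sample: one <p> outside the story body, two inside it (one nested).
    public const string SampleXhtml =
        "<html><body>" +
        "<p>Outside</p>" +
        "<div class='story-body'><p>First</p><div><p>Nested</p></div></div>" +
        "</body></html>";

    // Evaluates `xpath` with the story-body <div> as the context node and
    // returns the text of every matched node, in document order.
    public static List<string> SelectFromStoryBody(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXhtml);
        XmlNode textBody = doc.SelectSingleNode("//div[@class='story-body']");
        return textBody.SelectNodes(xpath)
                       .Cast<XmlNode>()
                       .Select(n => n.InnerText)
                       .ToList();
    }
}
```

Called with "//p", SelectFromStoryBody returns Outside, First and Nested even though the context node is the div; called with ".//p" it returns only First and Nested. With HtmlAgilityPack the equivalent call is textBody.SelectNodes(".//p").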
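Another way would be to call SelectNodes on the document node with a path that goes through the div:
doc.DocumentNode.SelectNodes("//div[@class='story-body']/p");
Here /p matches only <p> elements that are direct children of the div; use //div[@class='story-body']//p to include nested ones as well.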
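The question's ChildNodes loop also keeps <span class="cross-head"> children, and that filter can be folded into a single XPath expression with the self:: axis, which keeps the matches in document order. The sketch below is again a System.Xml stand-in; the markup, the class name StoryBodyXPathDemo and its members are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Stand-in sketch using System.Xml (XPath 1.0, as in HtmlAgilityPack's SelectNodes)
// on a hypothetical, well-formed XHTML snippet.
public static class StoryBodyXPathDemo
{
    const string SampleXhtml =
        "<html><body>" +
        "<div class='story-body'>" +
        "<p>Intro</p>" +
        "<span class='cross-head'>Heading</span>" +
        "<span class='byline'>Skipped</span>" +
        "<div><p>Nested</p></div>" +
        "<p>Outro</p>" +
        "</div></body></html>";

    // Direct <p> children of the story-body div only.
    public const string DirectParagraphs = "//div[@class='story-body']/p";

    // Mirrors the question's loop: direct <p> children plus direct
    // <span class="cross-head"> children, in document order.
    public const string ParagraphsAndCrossHeads =
        "//div[@class='story-body']/*[self::p or (self::span and @class='cross-head')]";

    // Returns the text of every node matched by `xpath`, in document order.
    public static List<string> SelectTexts(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXhtml);
        return doc.SelectNodes(xpath)
                  .Cast<XmlNode>()
                  .Select(n => n.InnerText)
                  .ToList();
    }
}
```

SelectTexts(DirectParagraphs) yields Intro and Outro (the nested <p> is skipped), while SelectTexts(ParagraphsAndCrossHeads) yields Intro, Heading and Outro. The same expression strings should work with doc.DocumentNode.SelectNodes(...) in HtmlAgilityPack; keep in mind that HtmlAgilityPack's SelectNodes returns null by default rather than an empty collection when nothing matches, so check for null before iterating.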