C# HtmlAgilityPack: get content from all divs with a given class


Question


I have an HTML file that looks like this:

<div class="user_meals">
<div class="name">Name Surname</div>
<div class="day_meals">
    <div class="meal">First Meal</div>
</div>  
<div class="day_meals">
    <div class="meal">Second Meal</div>
</div>
<div class="day_meals">

    <div class="meal">Third Meal</div>

</div>
<div class="day_meals">

    <div class="meal">Fourth Meal</div>

</div>

<div class="day_meals">

    <div class="meal">Fifth Meal</div>

</div>

This block repeats a few times in the file.

I want to get the name and surname, which sit inside the <div> tag with class "name".

This is my code using HtmlAgilityPack:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"C:\workspace\file.html");

foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='name']"))
{
    string vaule = node.InnerText;
}

But it doesn't work. Visual Studio throws an exception:

An unhandled exception of type 'System.NullReferenceException'.

Solution

You are using the wrong method to load HTML from a path: LoadHtml expects the HTML markup itself, not the location of a file. Use Load instead.
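To make the difference concrete, here is a minimal sketch of what happens when a path is handed to LoadHtml. It assumes the HtmlAgilityPack NuGet package is referenced; the Program class and the printed messages are illustrative only and are not part of the original question.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();

        // LoadHtml parses its argument as markup, so the path is treated as plain text.
        doc.LoadHtml(@"C:\workspace\file.html");

        // That "document" contains no <div> elements, so SelectNodes returns null.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='name']");
        Console.WriteLine(nodes == null ? "no matches (null)" : nodes.Count + " matches");

        // Load reads and parses the file at the given path instead:
        // doc.Load(@"C:\workspace\file.html");
    }
}
```

The Load call is left commented out so the sketch does not depend on the file existing on disk.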

The error you are getting is quite misleading, because none of the properties involved are null and the standard tips from "What is a NullReferenceException, and how do I fix it?" don't apply.

Essentially, SelectNodes correctly returns null because no elements match the query, and foreach then throws when it tries to enumerate that null.

Fixed code:

HtmlDocument doc = new HtmlDocument();
// Either call doc.Load(@"C:\workspace\file.html") to read from a file, or pass the HTML itself:
doc.LoadHtml("<div class='user_meals'><div class='name'>Name Surname</div></div>");
var nodes = doc.DocumentNode.SelectNodes("//div[@class='name']");
// SelectNodes returns null if nothing is found, so check before iterating
if (nodes == null)
{
    throw new InvalidOperationException("Where are all my nodes???");
}
foreach (HtmlNode node in nodes)
{
    string value = node.InnerText;
    value.Dump(); // Dump() is a LINQPad extension method; use Console.WriteLine(value) elsewhere
}
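
If an empty result is acceptable rather than an error, one alternative to throwing is to treat a null result as an empty sequence. The following is only a sketch under that assumption: it needs the HtmlAgilityPack NuGet package, and the second user_meals block with "Other Person" is made-up sample data standing in for the repeated blocks in the real file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();

        // Inline HTML for illustration; doc.Load(path) would read it from a file instead.
        doc.LoadHtml(
            "<div class='user_meals'><div class='name'>Name Surname</div></div>" +
            "<div class='user_meals'><div class='name'>Other Person</div></div>");

        // SelectNodes returns null when nothing matches, so fall back to an empty sequence.
        IEnumerable<HtmlNode> nodes =
            doc.DocumentNode.SelectNodes("//div[@class='name']")
            ?? Enumerable.Empty<HtmlNode>();

        // Print the text of every div whose class attribute is exactly "name".
        foreach (HtmlNode node in nodes)
        {
            Console.WriteLine(node.InnerText.Trim());
        }
    }
}
```

With this shape the loop simply does nothing when there are no matches, which suits cases where a missing name is not a bug. Throwing, as in the fixed code above, is the better choice when a match is always expected.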
