How to use HTML Agility Pack


Question

How do I use the HTML Agility Pack?

My XHTML document is not completely valid, which is why I want to use it. How do I use it in my project? My project is in C#.

Recommended answer

First, install the HtmlAgilityPack NuGet package into your project.

Then, as an example:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

// There are various options, set as needed
htmlDoc.OptionFixNestedTags = true;

// filePath is a path to a file containing the html
htmlDoc.Load(filePath);

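// To load from a string instead, use: htmlDoc.LoadHtml(xmlString);
// (this was previously written as htmlDoc.LoadXML(xmlString))

// ParseErrors is an IEnumerable<HtmlParseError> holding any errors found while loading.
// Count() is a LINQ extension method, so this line needs "using System.Linq;"
if (htmlDoc.ParseErrors != null && htmlDoc.ParseErrors.Count() > 0)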
{
    // Handle any parse errors as required

}
else
{

    if (htmlDoc.DocumentNode != null)
    {
        HtmlAgilityPack.HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");

        if (bodyNode != null)
        {
            // Do something with bodyNode
        }
    }
}

(NB: This code is an example only and not necessarily the best or only approach. Do not use it blindly in your own application.)

The HtmlDocument.Load() method also accepts a stream, which is very useful when integrating with other stream-oriented classes in the .NET framework. HtmlEntity.DeEntitize() is another useful method, for processing HTML entities correctly. (thanks Matthew)
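As a rough illustration of those two calls together, the sketch below loads a document from a stream and decodes the entities in its title. The class name StreamLoadSketch, the method name ReadTitle, and the choice of UTF-8 are assumptions made for this example, not part of the original answer.

```csharp
using System.IO;
using System.Text;
using HtmlAgilityPack;

// Hypothetical helper for illustration only.
public static class StreamLoadSketch
{
    // Reads HTML from any Stream (file, network response, MemoryStream)
    // and returns the decoded text of the <title> element.
    public static string ReadTitle(Stream htmlStream)
    {
        var doc = new HtmlDocument();

        // Assumption: the input is UTF-8; pass a different Encoding if needed.
        doc.Load(htmlStream, Encoding.UTF8);

        HtmlNode titleNode = doc.DocumentNode.SelectSingleNode("//title");
        if (titleNode == null)
        {
            return string.Empty; // no <title> element found
        }

        // InnerText keeps entities such as &amp; as written in the markup;
        // DeEntitize converts them to the characters they stand for.
        return HtmlEntity.DeEntitize(titleNode.InnerText);
    }
}
```

For a document whose title is written as "Tom &amp;amp; Jerry" in the markup, ReadTitle should return "Tom & Jerry".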

HtmlDocument and HtmlNode are the classes you'll use most. Similar to an XML parser, HtmlNode provides the SelectSingleNode and SelectNodes methods, which accept XPath expressions.
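A minimal sketch of SelectNodes with an XPath expression follows. The class XPathSketch and the method ExtractLinks are hypothetical names chosen for this example. One point worth knowing is that SelectNodes returns null, not an empty collection, when nothing matches, so the result should be checked before iterating.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper for illustration only.
public static class XPathSketch
{
    // Returns the href value of every <a> element that has an href attribute.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // XPath: all <a> elements, anywhere in the document, with an href attribute.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links; // SelectNodes yields null when there are no matches
        }

        foreach (HtmlNode anchor in anchors)
        {
            // The second argument is the default used if the attribute is missing.
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }

        return links;
    }
}
```

Called with the fragment `<p><a href='/a'>A</a><a>no href</a><a href="/b">B`, which leaves two tags unclosed, this should return the two values /a and /b.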

Pay attention to the HtmlDocument.Option* boolean properties (OptionFixNestedTags in the example above is one of them). These control how the Load and LoadHtml methods will process your HTML/XHTML.
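To make the role of these options more concrete, here is a small sketch that sets a few of them before loading and then writes the document back out. The class OptionsSketch and the method ToXml are hypothetical names, and the particular combination of options is an assumption chosen for illustration, not a recommendation from the original answer.

```csharp
using System.IO;
using HtmlAgilityPack;

// Hypothetical helper for illustration only.
public static class OptionsSketch
{
    // Parses possibly malformed HTML and serializes it with XML-style output.
    public static string ToXml(string html)
    {
        var doc = new HtmlDocument
        {
            // Partially repair LI, TR, TH and TD tags when nesting errors are detected.
            OptionFixNestedTags = true,
            // Close unclosed nodes at the end of the document rather than in place.
            OptionAutoCloseOnEnd = true,
            // Ask for output that conforms to XML instead of HTML.
            OptionOutputAsXml = true
        };

        // Options are set before loading so that they apply to this parse.
        doc.LoadHtml(html);

        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```

Trying this on a fragment such as `<ul><li>one<li>two</ul>` and comparing the result with different option values is a quick way to see what each setting changes.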

There is also a compiled help file called HtmlAgilityPack.chm that has a complete reference for each of the objects. This is normally in the base folder of the solution.
