How to use HTML Agility Pack

Problem description

How do I use the HTML Agility Pack?

My XHTML document is not completely valid. That's why I wanted to use it. How do I use it in my project? My project is in C#.

Solution

  1. Download and build the HTMLAgilityPack solution.

  2. In your application, add a reference to HTMLAgilityPack.dll in the HTMLAgilityPack\Debug (or Release)\bin folder.

Then, as an example:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

// There are various options, set as needed
htmlDoc.OptionFixNestedTags=true;

// filePath is a path to a file containing the html
htmlDoc.Load(filePath);

// Use htmlDoc.LoadHtml(xmlString); to load from a string (this was htmlDoc.LoadXML(xmlString))

// ParseErrors contains any errors from the Load statement.
// Count() is the LINQ extension method, so this needs "using System.Linq;"
if (htmlDoc.ParseErrors != null && htmlDoc.ParseErrors.Count() > 0)
{
    // Handle any parse errors as required

}
else
{

    if (htmlDoc.DocumentNode != null)
    {
        HtmlAgilityPack.HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");

        if (bodyNode != null)
        {
            // Do something with bodyNode
        }
    }
}

(NB: This code is an example only and not necessarily the best/only approach. Do not use it blindly in your own application.)
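To make the "handle any parse errors" branch a bit more concrete, here is a minimal sketch that turns each parse error into a readable message. The class and method names (ParseErrorSketch, DescribeParseErrors) are made up for illustration; Line, LinePosition and Reason are properties of HtmlParseError. Whether a given piece of malformed markup is reported as an error depends on the parser, so treat the sample input as an assumption.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ParseErrorSketch
{
    // Hypothetical helper: parses the HTML string and returns one message per parse error.
    public static List<string> DescribeParseErrors(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var messages = new List<string>();
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            messages.Add("Line " + error.Line + ", column " + error.LinePosition + ": " + error.Reason);
        }
        return messages;
    }

    public static void Main()
    {
        // The stray </span> has no matching start tag.
        foreach (string message in DescribeParseErrors("<div>text</span></div>"))
        {
            Console.WriteLine(message);
        }
    }
}
```

Parse errors are informational: the document is still loaded, so you can decide whether to reject the input or carry on with the repaired tree.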

The HtmlDocument.Load() method also accepts a stream, which is very useful when integrating with other stream-oriented classes in the .NET Framework. HtmlEntity.DeEntitize() is another useful method for processing HTML entities correctly. (thanks Matthew)
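A minimal sketch of both points, loading from a stream and decoding entities. It assumes UTF-8 input and uses a MemoryStream as a stand-in for whatever stream you actually have (file, network response, and so on). StreamLoadSketch and FirstParagraphText are illustrative names, not part of the library.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

public static class StreamLoadSketch
{
    // Hypothetical helper: parses HTML from a readable stream and returns the
    // entity-decoded text of the first <p> element, or null if there is none.
    public static string FirstParagraphText(Stream htmlStream)
    {
        var doc = new HtmlDocument();
        doc.Load(htmlStream, Encoding.UTF8); // assumption: the input is UTF-8

        HtmlNode paragraph = doc.DocumentNode.SelectSingleNode("//p");
        if (paragraph == null)
        {
            return null;
        }

        // DeEntitize turns entities such as &amp; into the characters they stand for.
        return HtmlEntity.DeEntitize(paragraph.InnerText);
    }

    public static void Main()
    {
        byte[] bytes = Encoding.UTF8.GetBytes("<html><body><p>Fish &amp; chips</p></body></html>");
        using (var stream = new MemoryStream(bytes))
        {
            Console.WriteLine(FirstParagraphText(stream));
        }
    }
}
```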

HtmlDocument and HtmlNode are the classes you'll use most. Similar to an XML parser, HtmlNode provides the SelectSingleNode and SelectNodes methods, which accept XPath expressions.
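As a rough illustration of the XPath methods, the sketch below collects the href of every link in a fragment. XPathSketch and ExtractLinks are illustrative names. One thing worth guarding against: by default SelectNodes returns null rather than an empty collection when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathSketch
{
    // Hypothetical helper: returns the href value of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // Null (not an empty collection) when no node matches the expression.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return links;
    }

    public static void Main()
    {
        string html = "<div><a href='/one'>One</a> <a href=\"/two\">Two</a> <a>no href</a></div>";
        foreach (string link in ExtractLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```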

Pay attention to the HtmlDocument.Option* boolean properties (OptionFixNestedTags in the example above is one of them). These control how the Load and LoadHtml methods will process your HTML/XHTML.
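Since the question is about XHTML that is not quite valid, here is one possible use of the option properties: load the sloppy markup and write it back out as XML. OptionOutputAsXml is my pick for this sketch and is not mentioned in the answer above; OptionsSketch and ToXml are illustrative names. The exact serialized output (declaration, wrapping elements) may vary between library versions, so none is shown here.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

public static class OptionsSketch
{
    // Hypothetical helper: parses loosely written HTML and re-serializes it as XML.
    public static string ToXml(string html)
    {
        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true; // same option as in the example above
        doc.OptionOutputAsXml = true;   // ask Save() for XML-conformant output instead of HTML

        doc.LoadHtml(html);

        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        // <b> is never closed and <br> has no XML-style end marker.
        Console.WriteLine(ToXml("<div><b>Hello<br>world</div>"));
    }
}
```

Set the options before calling Load or LoadHtml, since they influence how the document is parsed.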

There is also a compiled help file called HtmlAgilityPack.chm that has a complete reference for each of the objects. This is normally in the base folder of the solution.
