HtmlAgilityPack and HtmlDecode
Question
I am currently using HtmlAgilityPack in a console application to scrape a website. Because the HTML is encoded (it contains character references, such as the numeric reference for an apostrophe ('), instead of the characters themselves), I have to decode it before I save the content to my database.
Is there a way to decode the returned html using HtmlAgilityPack without having to use HttpUtility.HtmlDecode? I want to avoid adding System.Web to my console application if possible.
Recommended Answer
The Html Agility Pack is equipped with a utility class called HtmlEntity. It has a static method with the following signature:
/// <summary>
/// Replace known entities by characters.
/// </summary>
/// <param name="text">The source text.</param>
/// <returns>The result text.</returns>
public static string DeEntitize(string text)
It supports well-known named entities as well as encoded (numeric) characters, such as the reference for an apostrophe (').