NullReferenceException in HtmlAgilityPack

Question

I am trying to extract a link using XPath from the following URL:

string url = "http://www.album-cover-art.org/search.php?q=Ruin+-+Live+Album+Version+Lamb+of+God";

My code:

HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc = web.Load(url); //Exception generated here Line 23

if (htmlDoc.DocumentNode != null)
{
    HtmlAgilityPack.HtmlNode linkNode = htmlDoc.DocumentNode.SelectSingleNode(".//*[@id='related_search_row']/img/@src");
    if (linkNode != null)
        Console.WriteLine(linkNode.InnerText);
}



The code above compiles fine, but when I run it, it throws an exception:

Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.

Full stack trace:

System.NullReferenceException: Object reference not set to an instance of an object.
   at HtmlAgilityPack.HtmlDocument.ReadDocumentEncoding(HtmlNode node) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1916
   at HtmlAgilityPack.HtmlDocument.PushNodeEnd(Int32 index, Boolean close) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1805
   at HtmlAgilityPack.HtmlDocument.Parse() in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1468
   at HtmlAgilityPack.HtmlDocument.Load(TextReader reader) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 769
   at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1515
   at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1563
   at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1149
   at HtmlAgilityPack.HtmlWeb.Load(String url) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1107
   at ScreenScrapping.Program.Main(String[] args) in c:\Users\ranveer\csharp\ScreenScrapping\ScreenScrapping\Program.cs:line 23

So my question is: why am I getting this exception?

Recommended Answer

This is a bug in HtmlAgilityPack. The document you're trying to parse contains <meta http-equiv="Content-Type" content="text/html; charset=iso-utf-8">. Its charset value (iso-utf-8) is not a valid encoding name, and AgilityPack cannot parse it. As Simon Mourier said, this bug was introduced in version 1.4.0.0.
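The answer blames the crash on the charset=iso-utf-8 declaration. You can confirm that .NET itself does not recognize that name with a standalone check against Encoding.GetEncoding, which throws ArgumentException for unknown names. HtmlAgilityPack is not involved here. This is a minimal sketch: the class and method names (CharsetCheck, IsKnownEncoding) are hypothetical, and HtmlAgilityPack's internal lookup may differ in detail.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class CharsetCheck
{
    // Returns true if .NET can resolve the given encoding name.
    public static bool IsKnownEncoding(string name)
    {
        try
        {
            // Encoding.GetEncoding throws ArgumentException for names it does not know.
            Encoding.GetEncoding(name);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }
}
```

With this helper, CharsetCheck.IsKnownEncoding("utf-8") is true and CharsetCheck.IsKnownEncoding("iso-utf-8") is false. The second result is the value the page declares.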

To avoid this, turn off encoding detection, load the document from the response stream yourself, and pass the encoding explicitly, like this:

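using System.Net;
using System.Text;
using HtmlAgilityPack;

var htmlDoc = new HtmlDocument();
// Don't read the declared encoding from the <meta http-equiv="Content-Type"> tag.
htmlDoc.OptionReadEncoding = false;
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
using (var response = (HttpWebResponse)request.GetResponse())
{
    using (var stream = response.GetResponseStream())
    {
        // Decode the response body explicitly as UTF-8.
        htmlDoc.Load(stream, Encoding.UTF8);
    }
}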
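Hard-coding UTF-8 works for this page. A variation is to honor a charset when .NET recognizes it and fall back to UTF-8 only when it does not. The sketch below is hypothetical (EncodingFallback, ResolveEncoding, and the UTF-8 default are assumptions, not HtmlAgilityPack API). It resolves an Encoding you could pass to htmlDoc.Load(stream, encoding), for example from response.CharacterSet. You would still keep OptionReadEncoding = false so the parser does not read the broken <meta> charset itself.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class EncodingFallback
{
    // Returns the named encoding if .NET recognizes it; otherwise returns
    // the supplied fallback (UTF-8 when none is given).
    public static Encoding ResolveEncoding(string charset, Encoding fallback = null)
    {
        fallback = fallback ?? Encoding.UTF8;
        if (string.IsNullOrWhiteSpace(charset))
            return fallback;
        try
        {
            // Strip surrounding whitespace and quotes, which sometimes wrap charset values.
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Unknown names such as "iso-utf-8" end up here.
            return fallback;
        }
    }
}
```

A possible call site is htmlDoc.Load(stream, EncodingFallback.ResolveEncoding(response.CharacterSet)). On .NET Core and .NET 5+, legacy code pages such as windows-1252 are only recognized after registering CodePagesEncodingProvider. Without that registration, they also fall back to UTF-8.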
