HtmlAgilityPack WebGet.Load gives error "Object reference not set to an instance of an object"
Question
I am working on a project that collects new car prices from dealer websites. I can fetch the HTML of most sites, but for one of them the WebGet.Load(url) call fails with an "Object reference not set to an instance of an object" error. I couldn't find any difference between these websites.
Examples of URLs that work:
http://www.renault.com.tr/page.aspx?id=1715
http://www.hyundai.com.tr/tr/Content.aspx?id=fiyatlistesi
The website that fails:
http://www.fiat.com.tr/Pages/tr/otomobiller/grandepunto_fiyat.aspx
Thanks for your help.
var webGet = new HtmlWeb();
var document = webGet.Load("http://www.fiat.com.tr/Pages/tr/otomobiller/grandepunto_fiyat.aspx");
When I use this URL, the document is not loaded.
Recommended answer
The actual problem is inside HtmlAgilityPack. The page that fails declares this meta content type: <META http-equiv="Content-Type" content="text/html; charset=8859-9">
Here charset=8859-9 appears to be incorrect. The HtmlAgilityPack internals try to get an encoding for this string with something like Encoding.GetEncoding("8859-9"), and that throws an error (I think the actual encoding should be iso-8859-9).
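The lookup failure described above can be reproduced without HtmlAgilityPack. The following is a minimal sketch, not library code: CharsetLookupDemo and TryGetEncoding are hypothetical names, and the RegisterProvider call is an assumption for .NET Core / .NET 5+, where iso-8859-9 is only available after the code-pages provider is registered (on .NET Framework that line should not be needed).

```csharp
using System;
using System.Text;

public static class CharsetLookupDemo
{
    // Hypothetical helper: returns the encoding for a charset name,
    // or null when .NET does not recognize the name.
    public static Encoding TryGetEncoding(string charset)
    {
        try
        {
            return Encoding.GetEncoding(charset);
        }
        catch (ArgumentException)
        {
            // Encoding.GetEncoding throws ArgumentException for unknown names.
            return null;
        }
    }

    public static void Main()
    {
        // Assumption: needed on .NET Core / .NET 5+ so that legacy code pages
        // such as iso-8859-9 can be resolved.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        foreach (var name in new[] { "8859-9", "iso-8859-9" })
        {
            var encoding = TryGetEncoding(name);
            Console.WriteLine(name + " -> " +
                (encoding == null ? "not recognized" : encoding.WebName));
        }
    }
}
```

The helper swallows the ArgumentException the same way the library does, which makes it easy to see that the declared name and the intended name behave differently.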
All you really need is to tell HtmlAgilityPack not to read the encoding declared in the document (just set HtmlDocument.OptionReadEncoding = false), but this seems to be impossible with HtmlWeb.Load (setting HtmlWeb.AutoDetectEncoding doesn't help here). So a workaround is to read the URL manually (the simplest way):
using System;
using System.Net;
using System.Text;
using HtmlAgilityPack;

var document = new HtmlDocument();
// Do not let the parser read the (invalid) charset declared in the page.
document.OptionReadEncoding = false;

var url =
    new Uri("http://www.fiat.com.tr/Pages/tr/otomobiller/grandepunto_fiyat.aspx");
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";

using (var response = (HttpWebResponse)request.GetResponse())
{
    using (var stream = response.GetResponseStream())
    {
        // Decode the response explicitly as iso-8859-9.
        document.Load(stream, Encoding.GetEncoding("iso-8859-9"));
    }
}
This works and successfully parses the page.
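If several sites might declare similarly truncated charsets, the hard-coded encoding in the workaround could be replaced by a small lookup with fallbacks. This is only a sketch under assumptions: CharsetFallback.Resolve is a hypothetical helper, the "iso-" prefix retry is a guess that happens to fit this page, and the RegisterProvider call is only relevant on .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

public static class CharsetFallback
{
    // Hypothetical helper: tries the declared charset, then the same name
    // with an "iso-" prefix, and finally returns the supplied fallback.
    public static Encoding Resolve(string declaredCharset, Encoding fallback)
    {
        var name = (declaredCharset ?? string.Empty).Trim();
        if (name.Length == 0)
        {
            return fallback;
        }

        foreach (var candidate in new[] { name, "iso-" + name })
        {
            try
            {
                return Encoding.GetEncoding(candidate);
            }
            catch (ArgumentException)
            {
                // Unknown name: try the next candidate.
            }
        }

        return fallback;
    }

    public static void Main()
    {
        // Assumption: needed on .NET Core / .NET 5+ for legacy code pages.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        var encoding = Resolve("8859-9", Encoding.UTF8);
        Console.WriteLine(encoding.WebName);
    }
}
```

The result could then be passed to document.Load(stream, encoding) in place of Encoding.GetEncoding("iso-8859-9"). The declared charset still has to be obtained separately, for example from the response headers or a first pass over the markup.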
Edit: @Simon Mourier: yes, it raises a NullReferenceException because it catches the ArgumentException and sets _declaredencoding = null there. The _declaredencoding.WindowsCodePage line then throws the null reference.
Here is a code block from the ReadDocumentEncoding method in HtmlDocument.cs:
try
{
    _declaredencoding = Encoding.GetEncoding(charset);
}
catch (ArgumentException)
{
    _declaredencoding = null;
}
if (_onlyDetectEncoding)
{
    throw new EncodingFoundException(_declaredencoding);
}
if (_streamencoding != null)
{
    if (_declaredencoding.WindowsCodePage != _streamencoding.WindowsCodePage)
    {
        AddError(
            HtmlParseErrorCode.CharsetMismatch,
            _line, _lineposition,
            _index, node.OuterHtml,
            "Encoding mismatch between StreamEncoding: " +
            _streamencoding.WebName + " and DeclaredEncoding: " +
            _declaredencoding.WebName);
    }
}
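For illustration, the comparison above only fails because the declared encoding can be null when it is dereferenced. A null-guarded version of the same check might look like the sketch below. This is not the library's actual fix: EncodingMismatchCheck and DescribeMismatch are hypothetical names, and the message text is simply copied from the excerpt.

```csharp
using System;
using System.Text;

public static class EncodingMismatchCheck
{
    // Hypothetical stand-in for the comparison in ReadDocumentEncoding.
    // Returns a mismatch message, or null when there is nothing to report.
    public static string DescribeMismatch(Encoding declared, Encoding stream)
    {
        // Guard: an unrecognized charset leaves the declared encoding null,
        // so skip the comparison instead of dereferencing it.
        if (declared == null || stream == null)
        {
            return null;
        }

        if (declared.WindowsCodePage == stream.WindowsCodePage)
        {
            return null;
        }

        return "Encoding mismatch between StreamEncoding: " +
               stream.WebName + " and DeclaredEncoding: " +
               declared.WebName;
    }

    public static void Main()
    {
        // A null declared encoding no longer throws.
        Console.WriteLine(DescribeMismatch(null, Encoding.UTF8) ?? "(no mismatch reported)");
        Console.WriteLine(DescribeMismatch(Encoding.ASCII, Encoding.UTF8) ?? "(no mismatch reported)");
    }
}
```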
And here is my stack trace:
System.NullReferenceException was unhandled
Message=Object reference not set to an instance of an object.
Source=HtmlAgilityPack
StackTrace:
at HtmlAgilityPack.HtmlDocument.ReadDocumentEncoding(HtmlNode node) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1916
at HtmlAgilityPack.HtmlDocument.PushNodeEnd(Int32 index, Boolean close) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1805
at HtmlAgilityPack.HtmlDocument.Parse() in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1468
at HtmlAgilityPack.HtmlDocument.Load(TextReader reader) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 769
at HtmlAgilityPack.HtmlDocument.Load(Stream stream, Boolean detectEncodingFromByteOrderMarks) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 597
at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1515
at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1563
at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1152
at HtmlAgilityPack.HtmlWeb.Load(String url) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1107
at test.console.Program.Main(String[] args) in W:\Projects\Me\test.console\test.console\Program.cs:line 54
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException: