Html Agility Pack parsing a website encoded as iso-8859-1: REALLY ANNOYING
Question
I have been parsing a website for my Windows Phone app using Html Agility Pack.
First I download the page using the WebClient class and then pass the result to HtmlDocument.
There were some problems with the iso-8859-1 encoding, but HtmlEntity.DeEntitize solved the problem of the letters Ö and ä showing up as &Ouml; and &auml;.
But the document still has some Scandinavian characters (ä, ö) in some other encoding, and these are shown as: �.
Those letters display perfectly in Chrome.
The site is: http://reittiopas.tampere.fi/mobile/fi/
Recommended answer
Windows Phone only supports a small set of encodings, and iso-8859-1 is not one of them!
To solve this, create an encoding handler for iso-8859-1 with the Silverlight Encoding Generator, use it to convert the downloaded text, and then use Html Agility Pack as you are now!
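To see why the � characters appear and what such an encoding handler has to do, here is a minimal sketch in Python, used only as a compact illustration rather than the C# an actual Windows Phone app would need. It is not the output of the Silverlight Encoding Generator; the function name `decode_latin1` and the sample text are made up for this example. It relies on one fact: iso-8859-1 maps every byte to the Unicode code point with the same numeric value.

```python
def decode_latin1(data: bytes) -> str:
    """Decode iso-8859-1 by hand: byte value N becomes code point U+00NN.

    Hypothetical helper for illustration; a generated encoding class on
    Windows Phone would do the equivalent lookup in C#.
    """
    return "".join(chr(b) for b in data)


# Made-up sample text, encoded the way the server would send it.
raw = "Hämeenkatu ö".encode("iso-8859-1")

# Decoding those bytes as UTF-8 fails on ä (0xE4) and ö (0xF6), so each
# one is replaced by U+FFFD, the character shown as � in the question.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the byte-to-code-point mapping restores the letters.
right = decode_latin1(raw)

print(wrong)
print(right)
```

The practical consequence is that the conversion has to happen on the raw bytes, before the text reaches HtmlDocument. Once the bytes have been decoded with the wrong encoding, the original letters are already lost.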