Html Agility Pack parsing website encoding iso-8859-1 REALLY ANNOYING


Question

I have been parsing this website for my Windows Phone app using Html Agility Pack.

First I download it using the WebClient class and then pass the result to HtmlDocument.

There were some problems with the iso-8859-1 encoding, but HtmlEntity.DeEntitize solved the problem of letters such as Ö and ä showing up as &Ouml; and &auml;.

But the document still contains some Scandinavian characters (ä, ö) in some unexpected encoding, and they are displayed as �.

Those letters display perfectly in Chrome.

The site is: http://reittiopas.tampere.fi/mobile/fi/

Recommended answer

Windows Phone only supports a small set of encodings, and iso-8859-1 is not one of them!

To solve this, just create the encoding handler with Silverlight Encoding Generator, convert the text, and then use Html Agility Pack as you are now!
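
As a rough illustration of the "convert the text" step, here is a minimal sketch that decodes the downloaded bytes by hand instead of using a generated Encoding class. It assumes the page really is served as iso-8859-1, where every byte value maps to the Unicode code point with the same number. The class name Latin1Page and the helper names DecodeLatin1, ReadAllBytes and Load are made up for this example. They are not part of Html Agility Pack or of the Silverlight Encoding Generator output.

```csharp
using System;
using System.IO;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper class for this sketch; not from any library.
public static class Latin1Page
{
    // ISO-8859-1 maps each byte value 0x00-0xFF to the Unicode code point
    // with the same number, so decoding is a plain byte-to-char widening.
    public static string DecodeLatin1(byte[] bytes)
    {
        var chars = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
        {
            chars[i] = (char)bytes[i];
        }
        return new string(chars);
    }

    // Reads the stream to the end and returns the raw, undecoded bytes.
    public static byte[] ReadAllBytes(Stream stream)
    {
        using (var buffer = new MemoryStream())
        {
            var chunk = new byte[4096];
            int read;
            while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
            {
                buffer.Write(chunk, 0, read);
            }
            return buffer.ToArray();
        }
    }

    // Downloads the page as bytes (not as a string, so WebClient never
    // tries to decode it), converts it, and hands the text to HtmlDocument.
    public static void Load(Uri address, Action<HtmlDocument> onLoaded)
    {
        var client = new WebClient();
        client.OpenReadCompleted += (sender, e) =>
        {
            if (e.Error != null) return; // error handling omitted in this sketch

            using (Stream stream = e.Result)
            {
                string html = DecodeLatin1(ReadAllBytes(stream));
                var doc = new HtmlDocument();
                doc.LoadHtml(html);
                onLoaded(doc);
            }
        };
        client.OpenReadAsync(address);
    }
}
```

The point of the sketch is to read the response with OpenReadAsync rather than DownloadStringAsync, so the raw bytes are still available for conversion. Browsers commonly treat a page labelled iso-8859-1 as windows-1252, which differs only in the byte range 0x80-0x9F. The letters ä and ö lie outside that range, so the direct mapping above should cover them.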
