C# and HtmlAgilityPack encoding problem


Problem description



WebClient GodLikeClient = new WebClient();
HtmlAgilityPack.HtmlDocument GodLikeHTML = new HtmlAgilityPack.HtmlDocument();

GodLikeHTML.Load(GodLikeClient.OpenRead("http://www.alfa.lt"));

So this code returns: "Skaitytojo klausimas psichologui: kas lemia homoseksualumÄ…? - Naujienų portalas Alfa.lt" instead of "Skaitytojo klausimas psichologui: kas lemia homoseksualumą? - Naujienų portalas Alfa.lt".

This webpage is encoded in Windows-1257 (Baltic), but textBox1.Text = GodLikeHTML.DocumentNode.OuterHtml; returns distorted text: Baltic diacritics are turned into strange strings several characters long :(

And yes, I've tried the HtmlAgilityPack forums. They do suck.

P.S. I'm no programmer, but I work on a community project and I really need to get this code working. Thanks ;}

Solution

Actually the page is encoded with UTF-8.

GodLikeHTML.Load(GodLikeClient.OpenRead("http://www.alfa.lt"), Encoding.UTF8);

will work.
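
To see why the output looks the way it does, here is a minimal sketch (not part of the original answer) of what happens when UTF-8 bytes are decoded with a single-byte encoding. In UTF-8, "ą" is the two bytes 0xC4 0x85; a Windows code page such as 1252 shows them as "Ä" followed by "…", which matches the text in the question. The sketch uses ISO-8859-1 as a stand-in single-byte encoding because it is available without registering extra code pages, and the names MojibakeDemo and DecodeUtf8BytesAs are made up for illustration.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes the text as UTF-8, then decodes those bytes with the
    // supplied encoding (hypothetical helper, for illustration only).
    public static string DecodeUtf8BytesAs(string text, Encoding decoder)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return decoder.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // "\u0105" is the letter "ą"; the escape avoids depending on
        // the encoding of the source file itself.
        string original = "homoseksualum\u0105";

        // Stand-in for whatever single-byte default the loader picked.
        Encoding singleByte = Encoding.GetEncoding("ISO-8859-1");

        string misread = DecodeUtf8BytesAs(original, singleByte);
        string correct = DecodeUtf8BytesAs(original, Encoding.UTF8);

        // The misread string is one character longer, because the two
        // UTF-8 bytes of "ą" each became a separate character.
        Console.WriteLine(misread.Length - original.Length);
        Console.WriteLine(correct == original);
    }
}
```

Passing Encoding.UTF8 to Load, as in the line above, is the equivalent of the `correct` branch here: the bytes stay the same and only the decoder changes.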

Or you could use the code in my SO answer, which detects the encoding from HTTP headers or meta tags and re-encodes properly. (It also supports gzip to minimize your download.)

With the download class your code would look like:

HttpDownloader downloader = new HttpDownloader("http://www.alfa.lt",null,null);
GodLikeHTML.LoadHtml(downloader.GetPage());
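
The HttpDownloader class itself is not shown here, so the following is only a rough sketch of the general idea of picking an encoding from a Content-Type header or a meta tag. It is not the code from the linked answer; CharsetSniffer and Detect are hypothetical names, and the UTF-8 fallback is an assumption.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CharsetSniffer
{
    // Matches "charset=utf-8" in a Content-Type header as well as
    // <meta charset="utf-8"> or a meta http-equiv content attribute.
    private static readonly Regex CharsetPattern =
        new Regex(@"charset\s*=\s*[""']?([\w-]+)", RegexOptions.IgnoreCase);

    // Checks the HTTP header first, then the start of the HTML,
    // and falls back to UTF-8 (an assumption) if neither names a charset.
    public static Encoding Detect(string contentTypeHeader, string htmlHead)
    {
        foreach (string source in new[] { contentTypeHeader, htmlHead })
        {
            if (string.IsNullOrEmpty(source)) continue;

            Match match = CharsetPattern.Match(source);
            if (!match.Success) continue;

            try
            {
                return Encoding.GetEncoding(match.Groups[1].Value);
            }
            catch (ArgumentException)
            {
                // Name not known to this runtime; try the next source.
            }
        }
        return Encoding.UTF8;
    }
}
```

The resulting Encoding could then be passed as the second argument to GodLikeHTML.Load together with the response stream, in the same way Encoding.UTF8 is passed above.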
