How to get the HTML encoding right in C#?


Problem description

I'm trying to get the pronunciation of a certain word from a web dictionary. For example, in the following code, I want to get the pronunciation of "good" from http://collinsdictionary.com

(HTML Agility Pack is used here)

using System;
using System.Net;

static void test()
{
    // Download the dictionary page, decoding the response as UTF-8.
    String url = "http://www.collinsdictionary.com/dictionary/english/good";
    WebClient client = new WebClient();
    client.Encoding = System.Text.Encoding.UTF8;
    String html = client.DownloadString(url);

    // Parse the HTML and select the text node that holds the pronunciation.
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    HtmlAgilityPack.HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id=\"good_1\"]/div[1]/h2/span/text()[1]");
    if (node == null)
    {
        Console.WriteLine("XPath not found.");
    }
    else
    {
        // Print the matched node's text to the console.
        Console.WriteLine(node.WriteTo());
    }
}

I was expecting

 (ɡʊd

but what I could get at best is

 (ɡ?d

How can I get it right?

Solution

The problem is not in your parsing of the text; rather, it is a problem with the console output. If you are doing this from a command-line app, you can set the output encoding of the console to Unicode:

Console.OutputEncoding = System.Text.Encoding.Unicode;

You also need to ensure that the font used by your console supports Unicode. See this answer for more info.
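
To illustrate the point above, here is a minimal sketch, assuming a Windows console app. The class name `ConsoleDemo` and the helper `DescribeCodePoints` are hypothetical names introduced only for this example; they are not part of the original question or of any library. The helper lists the UTF-16 code units of a string, which is one way to check that the text itself is intact even when the console shows a `?`. The `Main` method then sets the output encoding as suggested in the answer before writing the text.

```csharp
using System;
using System.Linq;
using System.Text;

static class ConsoleDemo
{
    // Hypothetical helper: formats each UTF-16 code unit of the string as U+XXXX,
    // so the content can be inspected independently of the console font or encoding.
    public static string DescribeCodePoints(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    static void Main()
    {
        // The IPA text from the question, written with escapes to avoid
        // depending on the encoding of the source file.
        string pronunciation = "\u0261\u028Ad";

        // These are plain ASCII characters, so they print correctly on any console.
        Console.WriteLine(DescribeCodePoints(pronunciation));

        // Switch the console output encoding, as suggested in the answer,
        // before writing the non-ASCII characters.
        Console.OutputEncoding = Encoding.Unicode;
        Console.WriteLine("(" + pronunciation);
    }
}
```

If the first line lists the expected code units but the second line still shows a `?` or a box, the remaining issue is most likely the console font rather than the string or the parsing.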
