How to get the HTML encoding right in C#?
Problem description
I'm trying to get the pronunciation of a certain word from a web dictionary. For example, in the following code, I want to get the pronunciation of "good" from http://collinsdictionary.com (HTML Agility Pack is used here):
using System;
using System.Net;

static void test()
{
    String url = "http://www.collinsdictionary.com/dictionary/english/good";

    // Download the page, decoding the response as UTF-8
    WebClient client = new WebClient();
    client.Encoding = System.Text.Encoding.UTF8;
    String html = client.DownloadString(url);

    // Parse the HTML and select the text node that holds the pronunciation
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    HtmlAgilityPack.HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id=\"good_1\"]/div[1]/h2/span/text()[1]");

    if (node == null)
    {
        Console.WriteLine("XPath not found.");
    }
    else
    {
        // Write the matched text node to the console
        Console.WriteLine(node.WriteTo());
    }
}
I was expecting
(ɡʊd
but what I could get at best is
(ɡ?d
How do I get it right?
Solution
The problem is not in your parsing of the text; rather, it is a problem with the console output. If you are doing this from a command-line app, you can set the output encoding of the console to Unicode:
Console.OutputEncoding = System.Text.Encoding.Unicode;
You also need to ensure that the font used by the console supports Unicode. See this answer for more info.
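To illustrate the point, here is a minimal sketch of why the "?" shows up and where the encoding line would go. The class name ConsoleEncodingSketch and the helper RoundTrip are hypothetical names made up for this example, and ASCII is only a stand-in for whatever legacy code page the console actually uses, which is an assumption. The idea is that characters the target encoding cannot represent are replaced with "?" when the text is written out:

```csharp
using System;
using System.Text;

static class ConsoleEncodingSketch
{
    // Hypothetical helper: encodes the text with the given encoding and decodes
    // it again, approximating what a console limited to that encoding can show.
    public static string RoundTrip(string text, Encoding target)
    {
        return target.GetString(target.GetBytes(text));
    }

    static void Main()
    {
        // "(ɡʊd" written with escapes so the source file's own encoding does not matter
        string ipa = "(\u0261\u028Ad";

        // ASCII stands in for a legacy console code page (assumption):
        // the two IPA letters cannot be represented and come back as '?'
        Console.WriteLine(RoundTrip(ipa, Encoding.ASCII)); // prints "(??d"

        // Set the output encoding before writing, as suggested in the answer
        Console.OutputEncoding = Encoding.Unicode;
        Console.WriteLine(ipa);
    }
}
```

In the code from the question, the Console.OutputEncoding line would go before the Console.WriteLine(node.WriteTo()) call. Whether the characters then render correctly still depends on the console font.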