How to know string encoding in C#

Question

I am getting a string from a third-party program that I don't control. My piece of the code outputs it in HTML. This works fine in English, but in other languages it shows up in a funny way. For example, accents in Spanish look funny, and characters in East Asian languages (e.g. Korean) look very funny. I am pretty sure I need to do some encoding work so that all languages display correctly.

My understanding of encoding is rather poor, so before posting the real question, which I intuitively think is "How do I encode this to UTF-8 in C#?", I would like to understand the matter better by asking simpler questions.

My question here is: how do I know which encoding my input string has? In Spanish, a word with an accent comes out as "AcciÃ³n" instead of "Acción". Is this ANSI, or what am I dealing with?

Thanks a lot in advance!

Recommended answer


I get an accent: "AcciÃ³n"

The presence of the Ã character is a dead giveaway. In 8-bit encodings such as Latin-1 and Windows-1252, the accented capital A characters have character codes 0xC0 and up, and a byte in that range is often the first byte of a two-byte UTF-8 sequence. The ó glyph is code point U+00F3, and its UTF-8 encoding is the two bytes 0xC3 0xB3, which are the code points for Ã and ³.
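
To make the byte-level argument concrete, here is a minimal sketch that reproduces the garbling. The class and method names (MojibakeDemo, Utf8Hex, Garble) are made up for illustration, and ISO-8859-1 stands in for whatever 8-bit encoding is actually misreading the data, which is an assumption.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Returns the UTF-8 bytes of the text as hex, e.g. "C3-B3" for U+00F3.
    public static string Utf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }

    // Encodes the text as UTF-8, then (wrongly) decodes those bytes with an
    // 8-bit encoding. ISO-8859-1 is an assumption chosen for the demo.
    public static string Garble(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        Encoding eightBit = Encoding.GetEncoding("ISO-8859-1");
        return eightBit.GetString(utf8Bytes);
    }

    static void Main()
    {
        // \u00F3 is the letter o with acute accent, written as an escape so
        // the example does not depend on the encoding of the source file.
        Console.WriteLine(Utf8Hex("\u00F3"));

        // The returned string is "Acci", U+00C3, U+00B3, "n".
        // How it renders depends on the console's output encoding.
        Console.WriteLine(Garble("Acci\u00F3n"));
    }
}
```

Each byte of the two-byte UTF-8 sequence becomes one character of its own, which is why one accented letter turns into two odd-looking ones.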

The strings are encoded in UTF-8, but you are reading them with an 8-bit encoding such as Encoding.Default.
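
The following is a hedged sketch of what that implies for the fix, not a definitive implementation. The class and method names (Utf8Repair, DecodeUtf8, Repair) are hypothetical, and so is the assumption that the bytes were misread as ISO-8859-1. If the real culprit is Encoding.Default (the system ANSI code page on .NET Framework, typically Windows-1252 on Western systems), the round trip in Repair may lose bytes that have no mapping in that code page.

```csharp
using System;
using System.Text;

static class Utf8Repair
{
    // Preferred fix: decode the raw bytes as UTF-8 in the first place.
    public static string DecodeUtf8(byte[] rawBytes)
    {
        return Encoding.UTF8.GetString(rawBytes);
    }

    // Fallback when only the already-garbled string is available:
    // turn it back into bytes with the encoding that misread it,
    // then decode those bytes as UTF-8.
    public static string Repair(string garbled)
    {
        // Assumption: the data was misread as ISO-8859-1.
        Encoding wrongEncoding = Encoding.GetEncoding("ISO-8859-1");
        byte[] originalBytes = wrongEncoding.GetBytes(garbled);
        return Encoding.UTF8.GetString(originalBytes);
    }

    static void Main()
    {
        // "Acci", U+00C3, U+00B3, "n" is the garbled form from the question.
        string repaired = Repair("Acci\u00C3\u00B3n");
        Console.WriteLine(repaired == "Acci\u00F3n"); // True

        byte[] raw = { 0x41, 0x63, 0x63, 0x69, 0xC3, 0xB3, 0x6E };
        Console.WriteLine(DecodeUtf8(raw) == "Acci\u00F3n"); // True
    }
}
```

Where possible, fix the decoding at the point where the bytes are first read (for example, by passing Encoding.UTF8 to the reader) instead of repairing strings afterwards.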
