How can I transform a string to UTF-8 in C#?
Question
I have a string that I receive from a third-party app, and I would like to display it correctly in any language using C# on my Windows Surface.
Due to incorrect encoding, a piece of my string looks like this in Spanish:
AcciÃ³n
whereas it should look like this:
Acción
According to the answer to this question, "How to know string encoding in C#", the data I receive should already be UTF-8, but it is being read with Encoding.Default (probably ANSI?).
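For reference, this kind of garbling is what typically results when UTF-8 bytes are decoded with a single-byte encoding. The following is a minimal sketch, assuming the wrong encoding behaves like ISO-8859-1 for these characters; the class and method names are hypothetical and not part of the original question.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes the text as UTF-8, then decodes those bytes with the wrong
    // (single-byte) encoding, reproducing the garbled form.
    // Assumption: ISO-8859-1 stands in for the "ANSI" code page here.
    public static string Garble(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return latin1.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // "ó" (U+00F3) is C3 B3 in UTF-8; read byte by byte it becomes "Ã" and "³".
        string garbled = Garble("Acci\u00F3n");
        Console.WriteLine(garbled == "Acci\u00C3\u00B3n"); // True
    }
}
```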
I am trying to transform this string into real UTF-8, but one problem is that I can only see a subset of the Encoding class (the UTF8 and Unicode properties only), probably because I'm limited to the Windows Surface API.
I have tried some snippets I found on the internet, but so far none of them has worked for Eastern languages (e.g. Korean). One example is as follows:
var utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(myString);
myString= utf8.GetString(utfBytes, 0, utfBytes.Length);
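A likely reason this snippet has no effect: encoding a string to UTF-8 and decoding the bytes with the same encoding returns the original string, garbled characters included. A small sketch of that round trip (the class and method names are hypothetical):

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Same steps as the snippet above: encode as UTF-8, decode as UTF-8.
    public static string RoundTripUtf8(string s)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(s);
        return Encoding.UTF8.GetString(bytes, 0, bytes.Length);
    }

    public static void Main()
    {
        string garbled = "Acci\u00C3\u00B3n"; // the already-garbled input
        Console.WriteLine(RoundTripUtf8(garbled) == garbled); // True: nothing changed
    }
}
```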
I also tried extracting the string into a byte array and then using UTF8.GetString:
byte[] myByteArray = new byte[myString.Length];
for (int ix = 0; ix < myString.Length; ++ix)
{
    char ch = myString[ix];
    myByteArray[ix] = (byte) ch;
}
myString = Encoding.UTF8.GetString(myByteArray, 0, myString.Length);
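One possible explanation for why this loop can work for Spanish yet fail for Korean: casting a char to byte keeps only the low 8 bits, so it recovers the original byte only when the garbled character's code point is below 256. The sketch below assumes the text was mis-decoded as Windows-1252, which the question does not confirm; the class and method names are hypothetical.

```csharp
using System;

public static class ByteCastDemo
{
    // Same technique as the loop above: truncate each char to its low 8 bits.
    public static byte[] CastCharsToBytes(string s)
    {
        byte[] result = new byte[s.Length];
        for (int i = 0; i < s.Length; ++i)
        {
            result[i] = unchecked((byte)s[i]);
        }
        return result;
    }

    public static void Main()
    {
        // The Korean syllable U+D55C is ED 95 9C in UTF-8. Under Windows-1252
        // (an assumption), those bytes decode to U+00ED, U+2022 and U+0153.
        string garbled = "\u00ED\u2022\u0153";
        byte[] bytes = CastCharsToBytes(garbled);

        // The last two bytes are not the original 95 and 9C, so decoding
        // the array as UTF-8 cannot restore the syllable.
        Console.WriteLine(BitConverter.ToString(bytes)); // ED-22-53
    }
}
```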
Do you have any other ideas I could try?
Recommended answer
Since you know the string is coming in as Encoding.Default, you could simply use:
byte[] bytes = Encoding.Default.GetBytes(myString);
myString = Encoding.UTF8.GetString(bytes);
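Note that on .NET Core and .NET 5+, Encoding.Default is always UTF-8, so on those runtimes the two lines above would leave the string unchanged. A variant that names the wrong encoding explicitly is sketched below. It assumes the text was mis-decoded as ISO-8859-1 and that Encoding.GetEncoding("iso-8859-1") is available on the target platform; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class EncodingRepair
{
    // Recovers the original bytes by re-encoding with the encoding that was
    // wrongly used to decode them, then decodes those bytes as UTF-8.
    public static string Redecode(string garbled, Encoding wrongEncoding)
    {
        byte[] originalBytes = wrongEncoding.GetBytes(garbled);
        return Encoding.UTF8.GetString(originalBytes, 0, originalBytes.Length);
    }

    public static void Main()
    {
        // Assumption: the input was decoded as ISO-8859-1 instead of UTF-8.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string repaired = Redecode("Acci\u00C3\u00B3n", latin1);
        Console.WriteLine(repaired == "Acci\u00F3n"); // True
    }
}
```

If the wrong encoding was actually Windows-1252, the same method applies with that encoding instead; on .NET Core and later, obtaining it requires registering CodePagesEncodingProvider from the System.Text.Encoding.CodePages package first. Where possible, decoding the incoming bytes as UTF-8 at the point they are received avoids the repair step entirely.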
Another thing to remember: if you use Console.WriteLine to output strings, you should also set Console.OutputEncoding = System.Text.Encoding.UTF8; otherwise the text is written using the console's default code page (GBK in this case) and non-ASCII characters come out garbled.