How to convert a UTF-8 string into Unicode?

Views: 142 | Posted: 2016/8/26 20:51:13 | Tags: c#, string, unicode, utf-8

Question

I have a string that displays UTF-8 encoded characters, and I want to convert it back to Unicode.

For now, my implementation is the following:

public static string DecodeFromUtf8(this string utf8String)
{
    // read the string as UTF-8 bytes.
    byte[] encodedBytes = Encoding.UTF8.GetBytes(utf8String);

    // convert them into unicode bytes.
    byte[] unicodeBytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, encodedBytes);

    // builds the converted string.
    return Encoding.Unicode.GetString(encodedBytes);
}

I am playing with the word "déjà". I converted it into UTF-8 through this online tool, and then started to test my method with the string "dÃ©jÃ".

Unfortunately, with this implementation the string just remains the same.

Where am I wrong?

Recommended answer

The issue is that the UTF-8 code unit values have been stored as a sequence of 16-bit code units in a C# string. You simply need to verify that each code unit is within the range of a byte, copy those values into a byte array, and then decode that UTF-8 byte sequence into a regular UTF-16 string.

public static string DecodeFromUtf8(this string utf8String)
{
    // copy the string as UTF-8 bytes.
    byte[] utf8Bytes = new byte[utf8String.Length];
    for (int i=0;i<utf8String.Length;++i) {
        //Debug.Assert( 0 <= utf8String[i] && utf8String[i] <= 255, "the char must be in byte's range");
        utf8Bytes[i] = (byte)utf8String[i];
    }

    return Encoding.UTF8.GetString(utf8Bytes,0,utf8Bytes.Length);
}

DecodeFromUtf8("d\u00C3\u00A9j\u00C3\u00A0"); // déjà
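
For anyone who wants to compile and try the method above, here is a minimal self-contained sketch. The class names Utf8Repair and Program are made up for illustration, and the range check throws an ArgumentException rather than using the commented-out Debug.Assert; that is one possible choice, not something the answer prescribes.

```csharp
using System;
using System.Text;

public static class Utf8Repair
{
    // Hypothetical wrapper class; the method body follows the answer's approach,
    // but rejects chars above 0xFF instead of silently truncating them.
    public static string DecodeFromUtf8(this string utf8String)
    {
        byte[] utf8Bytes = new byte[utf8String.Length];
        for (int i = 0; i < utf8String.Length; ++i)
        {
            if (utf8String[i] > 0xFF)
            {
                throw new ArgumentException(
                    "char at index " + i + " is outside the byte range",
                    nameof(utf8String));
            }
            utf8Bytes[i] = (byte)utf8String[i];
        }

        // Decode the recovered bytes as UTF-8 into a normal UTF-16 string.
        return Encoding.UTF8.GetString(utf8Bytes);
    }
}

public static class Program
{
    public static void Main()
    {
        // Input chars: d, U+00C3, U+00A9, j, U+00C3, U+00A0 (the UTF-8 bytes of "déjà").
        string repaired = "d\u00C3\u00A9j\u00C3\u00A0".DecodeFromUtf8();
        Console.WriteLine(repaired == "d\u00E9j\u00E0"); // True
    }
}
```

Throwing on out-of-range input makes it obvious when the string was not produced by the byte-per-char mistake this method is meant to undo.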

This is easy, but it would be best to find the root cause: the place where someone is copying UTF-8 code units into 16-bit code units. The likely culprit is code that converts bytes into a C# string using the wrong encoding, e.g. Encoding.Default.GetString(utf8Bytes, 0, utf8Bytes.Length).
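
To illustrate that root cause, the sketch below reproduces the garbled string by decoding UTF-8 bytes with a single-byte encoding. It uses ISO-8859-1 as a stand-in for the wrong encoding, which is an assumption: what Encoding.Default resolves to depends on the runtime and system settings, so the actual culprit in a given program may differ. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Assumed stand-in for "the wrong encoding": ISO-8859-1 maps each byte
    // 0x00..0xFF to the char with the same numeric value.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Reproduces the bug: UTF-8 bytes decoded one byte per char.
    public static string DecodeWithWrongEncoding(byte[] utf8Bytes)
    {
        return Latin1.GetString(utf8Bytes);
    }

    // Undoes the bug: recover the original bytes, then decode them as UTF-8.
    public static string Repair(string garbled)
    {
        return Encoding.UTF8.GetString(Latin1.GetBytes(garbled));
    }

    public static void Main()
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("d\u00E9j\u00E0"); // 64 C3 A9 6A C3 A0

        string garbled = DecodeWithWrongEncoding(utf8Bytes);
        string correct = Encoding.UTF8.GetString(utf8Bytes);

        Console.WriteLine(garbled == "d\u00C3\u00A9j\u00C3\u00A0"); // True
        Console.WriteLine(correct == "d\u00E9j\u00E0");             // True
        Console.WriteLine(Repair(garbled) == correct);              // True
    }
}
```

Fixing the decode call at the source (using Encoding.UTF8 where the bytes are first read) removes the need for any repair step afterwards.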
