Encoding.UTF8.GetString doesn't take into account the Preamble/BOM

Views: 22 Posted: 2021/12/26 13:29:56 Tags: .net unicode character-encoding byte-order-mark


Problem description

In .NET, I'm trying to use the Encoding.UTF8.GetString method, which takes a byte array and converts it to a string.

It looks like this method does not strip the BOM (byte order mark), which may be part of a legitimate binary representation of a UTF-8 string, and instead decodes it as a character.

I know I can use a TextReader to consume the BOM as needed, but I thought the GetString method was meant to be a kind of shortcut that keeps our code shorter.
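
For reference, the TextReader route I have in mind looks roughly like the sketch below. The class and method names (BomAwareDecoding, DecodeWithReader) are made up for illustration and are not part of .NET; the sketch assumes the bytes are UTF-8 and relies on StreamReader skipping a leading UTF-8 preamble when it reads.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomAwareDecoding
{
    // Hypothetical helper (not a .NET API): decodes UTF-8 bytes through a
    // StreamReader, which consumes a leading EF BB BF preamble if present.
    public static string DecodeWithReader(byte[] bytes)
    {
        using (var ms = new MemoryStream(bytes))
        using (var reader = new StreamReader(ms, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With this helper, the bytes ef, bb, bf, 61, 62, 63 should decode to a 3-character "abc", at the cost of the extra stream plumbing I was hoping to avoid.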

Am I missing something? Is this behavior intentional?

Here is code that reproduces the issue:

using System;
using System.IO;
using System.Linq;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        string s1 = "abc";

        // Encode "abc" with a UTF-8 BOM (preamble) in front.
        byte[] abcWithBom;
        using (var ms = new MemoryStream())
        using (var sw = new StreamWriter(ms, new UTF8Encoding(true)))
        {
            sw.Write(s1);
            sw.Flush();
            abcWithBom = ms.ToArray();
            Console.WriteLine(FormatArray(abcWithBom)); // ef, bb, bf, 61, 62, 63
        }

        // Encode "abc" without a BOM.
        byte[] abcWithoutBom;
        using (var ms = new MemoryStream())
        using (var sw = new StreamWriter(ms, new UTF8Encoding(false)))
        {
            sw.Write(s1);
            sw.Flush();
            abcWithoutBom = ms.ToArray();
            Console.WriteLine(FormatArray(abcWithoutBom)); // 61, 62, 63
        }

        var restore1 = Encoding.UTF8.GetString(abcWithoutBom);
        Console.WriteLine(restore1.Length); // 3
        Console.WriteLine(restore1); // abc

        var restore2 = Encoding.UTF8.GetString(abcWithBom);
        Console.WriteLine(restore2.Length); // 4 (!): the BOM was decoded as a character
        Console.WriteLine(restore2); // ?abc (leading U+FEFF, shown here as '?')
    }

    // Formats a byte array as comma-separated lowercase hex values.
    private static string FormatArray(byte[] bytes1)
    {
        return string.Join(", ", from b in bytes1 select b.ToString("x"));
    }
}
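
The kind of shortcut I expected could be sketched as below. This is only an illustration of the behavior I was hoping for, not something .NET provides: Utf8BomHelper and GetStringSkippingBom are hypothetical names, and the sketch assumes the input is UTF-8. It compares the start of the array with Encoding.UTF8.GetPreamble() and decodes from after the preamble when it matches.

```csharp
using System;
using System.Text;

public static class Utf8BomHelper
{
    // Hypothetical helper (not a .NET API): decodes UTF-8 bytes, skipping a
    // leading preamble (EF BB BF) when the array starts with one.
    public static string GetStringSkippingBom(byte[] bytes)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        int offset = 0;

        if (bytes.Length >= preamble.Length)
        {
            bool hasBom = true;
            for (int i = 0; i < preamble.Length; i++)
            {
                if (bytes[i] != preamble[i])
                {
                    hasBom = false;
                    break;
                }
            }

            if (hasBom)
            {
                offset = preamble.Length;
            }
        }

        // Decode only the bytes after the preamble (or all of them if none).
        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }
}
```

Applied to abcWithBom from the repro, this should give a 3-character "abc", and it leaves abcWithoutBom unchanged.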
