Serialize and deserialize char(s)


Problem description

I have a list of chars on my class. Serialization and deserialization generally work as expected. The problem appears when the list contains a char that, as far as I can tell, needs a byte order mark to be described. An example is char code 56256. I created the simple test below to demonstrate the issue.

[Test]
public void Utf8CharSerializeAndDeserializeShouldEqual()
{
    UInt16 charCode = 56256;
    char utfChar = (char)charCode;
    using (MemoryStream ms = new MemoryStream())
    {
        using (StreamWriter writer = new StreamWriter(ms, Encoding.UTF8, 1024, true))
        {
            var serializer = new JsonSerializer();
            serializer.Serialize(writer, utfChar);
        }

        ms.Position = 0;
        using (StreamReader reader = new StreamReader(ms, true))
        {
            using (JsonTextReader jsonReader = new JsonTextReader(reader))
            { 
                var serializer = new JsonSerializer();
                char deserializedChar = serializer.Deserialize<char>(jsonReader);

                Console.WriteLine($"{(int)utfChar}, {(int)deserializedChar}");
                Assert.AreEqual(utfChar, deserializedChar);
                Assert.AreEqual((int)utfChar, (int)deserializedChar);
            }
        }
    }
}

The test works fine when the char code does not need a BOM. For example, 65 ('A') passes this test.

Recommended answer

Your problem is unrelated to Json.NET. Your problem is that U+DBC0 (decimal 56256) is not a valid Unicode character on its own: it is a high surrogate, which is only meaningful when followed by a low surrogate. As explained in the documentation, the Encoding.UTF8 used by your StreamWriter will not encode such a character:

Encoding.UTF8 returns a UTF8Encoding object that uses replacement fallback to replace each string that it can't encode and each byte that it can't decode with a question mark ("?") character.
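
The replacement behaviour can be observed without Json.NET by pushing the same char through Encoding.UTF8 directly. The following is a minimal sketch, not part of the original test: the class name SurrogateDemo and the helper RoundTrip are made up for illustration. It only checks that the char does not survive the round trip, without assuming which replacement character the runtime substitutes.

```csharp
using System;
using System.Text;

public static class SurrogateDemo
{
    // Hypothetical helper: encodes one char with Encoding.UTF8 and decodes it again.
    public static char RoundTrip(char c)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(new[] { c });
        string decoded = Encoding.UTF8.GetString(bytes);
        return decoded[0];
    }

    public static void Main()
    {
        char lone = (char)56256; // 0xDBC0, a high surrogate with no low surrogate after it

        Console.WriteLine(char.IsHighSurrogate(lone)); // True
        Console.WriteLine(RoundTrip('A') == 'A');      // True: 'A' encodes normally
        Console.WriteLine(RoundTrip(lone) == lone);    // False: the char was replaced
    }
}
```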

To confirm this, replace Encoding.UTF8 with new UTF8Encoding(true, true) in your test example. You will then get the following exception:

EncoderFallbackException: Unable to translate Unicode character \uDBC0 at index 1 to specified code page. 
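
The same exception can be reproduced in isolation. This is a small sketch under the assumption that only the encoding step matters here; StrictUtf8Demo and ThrowsOnEncode are illustrative names, not part of the original answer.

```csharp
using System;
using System.Text;

public static class StrictUtf8Demo
{
    // Hypothetical helper: returns true if a strict UTF-8 encoding refuses to encode the char.
    public static bool ThrowsOnEncode(char c)
    {
        // First argument: emit a BOM. Second argument: throw instead of replacing invalid input.
        var strict = new UTF8Encoding(true, true);
        try
        {
            strict.GetBytes(new[] { c });
            return false;
        }
        catch (EncoderFallbackException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsOnEncode('A'));         // False
        Console.WriteLine(ThrowsOnEncode((char)56256)); // True
    }
}
```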

If you are going to serialize invalid Unicode char values, you will need to encode them manually, for example as a byte array, using the following:

public static partial class TextExtensions
{
    static void ToBytesWithoutEncoding(char c, out byte lower, out byte upper)
    {
        var u = (uint)c;
        lower = unchecked((byte)u);
        upper = unchecked((byte)(u >> 8));
    }

    public static byte[] ToByteArrayWithoutEncoding(this char c)
    {
        byte lower, upper;
        ToBytesWithoutEncoding(c, out lower, out upper);
        return new byte[] { lower, upper };
    }

    public static byte[] ToByteArrayWithoutEncoding(this ICollection<char> list)
    {
        if (list == null)
            return null;
        var bytes = new byte[checked(list.Count * 2)];
        int to = 0;
        foreach (var c in list)
        {
            ToBytesWithoutEncoding(c, out bytes[to], out bytes[to + 1]);
            to += 2;
        }
        return bytes;
    }

    public static char ToCharWithoutEncoding(this byte[] bytes)
    {
        return bytes.ToCharWithoutEncoding(0);
    }

    public static char ToCharWithoutEncoding(this byte[] bytes, int position)
    {
        if (bytes == null)
            return default(char);
        char c = default(char);
        if (position < bytes.Length)
            c += (char)bytes[position];
        if (position + 1 < bytes.Length)
            c += (char)((uint)bytes[position + 1] << 8);
        return c;
    }

    public static List<char> ToCharListWithoutEncoding(this byte[] bytes)
    {
        if (bytes == null)
            return null;
        var chars = new List<char>(bytes.Length / 2 + bytes.Length % 2);
        for (int from = 0; from < bytes.Length; from += 2)
        {
            chars.Add(bytes.ToCharWithoutEncoding(from));
        }
        return chars;
    }
}

Then modify your test method as follows:

    public void Utf8JsonCharSerializeAndDeserializeShouldEqualFixed()
    {
        Utf8JsonCharSerializeAndDeserializeShouldEqualFixed((char)56256);
    }

    public void Utf8JsonCharSerializeAndDeserializeShouldEqualFixed(char utfChar)
    {
        byte[] data;

        using (MemoryStream ms = new MemoryStream())
        {
            using (StreamWriter writer = new StreamWriter(ms, new UTF8Encoding(true, true), 1024))
            {
                var serializer = new JsonSerializer();
                serializer.Serialize(writer, utfChar.ToByteArrayWithoutEncoding());
            }
            data = ms.ToArray();
        }

        using (MemoryStream ms = new MemoryStream(data))
        {
            using (StreamReader reader = new StreamReader(ms, true))
            {
                using (JsonTextReader jsonReader = new JsonTextReader(reader))
                {
                    var serializer = new JsonSerializer();
                    char deserializedChar = serializer.Deserialize<byte[]>(jsonReader).ToCharWithoutEncoding();

                    //Console.WriteLine(string.Format("{0}, {1}", utfChar, deserializedChar));
                    Assert.AreEqual(utfChar, deserializedChar);
                    Assert.AreEqual((int)utfChar, (int)deserializedChar);
                }
            }
        }
    }

Or, if you have a List<char> property in some container class, you can create the following converter:

public class CharListConverter : JsonConverter
{
    public override bool CanConvert(Type objectType)
    {
        return objectType == typeof(List<char>);
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        if (reader.TokenType == JsonToken.Null)
            return null;
        var bytes = serializer.Deserialize<byte[]>(reader);
        return bytes.ToCharListWithoutEncoding();
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        var list = (ICollection<char>)value;
        var bytes = list.ToByteArrayWithoutEncoding();
        serializer.Serialize(writer, bytes);
    }
}

And apply it as follows:

public class RootObject
{
    [JsonConverter(typeof(CharListConverter))]
    public List<char> Characters { get; set; }
}

In both cases Json.NET will encode the byte array as Base64.
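
To get a feel for what ends up in the JSON, the sketch below packs a single char into two bytes, low byte first (the same order ToBytesWithoutEncoding uses), and Base64-encodes the result with the standard Convert class. Base64Demo, Encode and Decode are hypothetical names introduced only for this illustration; the snippet does not call Json.NET itself.

```csharp
using System;

public static class Base64Demo
{
    // Packs the char into two bytes (low byte first) and Base64-encodes them.
    public static string Encode(char c)
    {
        var bytes = new byte[] { unchecked((byte)c), unchecked((byte)(c >> 8)) };
        return Convert.ToBase64String(bytes);
    }

    // Reverses Encode: Base64 text back to two bytes, then to a char.
    public static char Decode(string base64)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        return (char)(bytes[0] | (bytes[1] << 8));
    }

    public static void Main()
    {
        string payload = Encode((char)56256);    // bytes 0xC0, 0xDB
        Console.WriteLine(payload);              // wNs=
        Console.WriteLine((int)Decode(payload)); // 56256
    }
}
```

Because the payload is plain ASCII, it passes through any UTF-8 writer unchanged, which is why the lone surrogate survives the round trip in this form.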
