C# UTF8 output: keep encoded characters intact

Question

I have a very simple question that I can't seem to get my head around.

I have a properly encoded UTF-8 string that I parse into a JObject with Json.NET. I fiddle around with some values and write the result to the command line, and I want to keep the encoded characters intact.

Everything works great except for keeping the encoded characters intact.

Code:

var json = "{roster: [[\"Tulg\u00f4r\", 990, 1055]]}";
var j = JObject.Parse(json);
for (int i = 0; i < j["roster"].Count(); i++)
{
    j["roster"][i][1] = ((int)j["roster"][i][1]) * 3;
    j["roster"][i][2] = ((int)j["roster"][i][2]) * 3;
}
Console.WriteLine(JsonConvert.SerializeObject(j, Formatting.None));

Actual output:

{"roster":[["Tulgôr",2970,3165]]}

Desired output:

{"roster":[["Tulg\u00f4r",2970,3165]]}

My search phrasing on Google must be off, since nothing useful came up. I'm sure it's something really easy and I will feel pretty stupid afterwards. :)

Recommended answer

Take the output from JsonConvert.SerializeObject and run it through a helper method that converts every non-ASCII character to its escaped ("\uHHHH") equivalent. A sample implementation is given below.

// Requires: using System.Globalization; using System.Text;
// Replaces non-ASCII characters with escape sequences;
// i.e., converts "Tulgôr" to "Tulg\u00f4r".
private static string EscapeUnicode(string input)
{
    StringBuilder sb = new StringBuilder(input.Length);
    foreach (char ch in input)
    {
        if (ch <= 0x7f)
            sb.Append(ch);
        else
            sb.AppendFormat(CultureInfo.InvariantCulture, "\\u{0:x4}", (int) ch);
    }
    return sb.ToString();
}

You can call it like this:

Console.WriteLine(EscapeUnicode(JsonConvert.SerializeObject(j, Formatting.None)));
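As an alternative to post-processing the string, Json.NET itself exposes a StringEscapeHandling setting with an EscapeNonAscii value. This is not what the answer above does; it is a minimal sketch of a different route, assuming the installed Json.NET version provides that setting. The class name EscapeNonAsciiDemo and the method name Serialize are made up for illustration.

```csharp
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical demo class; only the Json.NET types and members are real.
public static class EscapeNonAsciiDemo
{
    // Serializes a token and asks the writer to escape non-ASCII characters.
    public static string Serialize(JToken token)
    {
        var settings = new JsonSerializerSettings
        {
            StringEscapeHandling = StringEscapeHandling.EscapeNonAscii
        };
        return JsonConvert.SerializeObject(token, Formatting.None, settings);
    }

    public static void Main()
    {
        // Same input as in the question; the C# compiler already turns
        // \u00f4 in this literal into the character ô before parsing.
        var j = JObject.Parse("{roster: [[\"Tulg\u00f4r\", 990, 1055]]}");
        Console.WriteLine(Serialize(j));
    }
}
```

If this works in your setup, the separate EscapeUnicode pass is unnecessary, and the escaping is applied only inside JSON string values rather than to the whole serialized text.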

(Note that I don't handle non-BMP characters specially, because I don't know whether your third-party application wants "\U00010000" or "\uD800\uDC00" (or something else!) when representing U+10000.)
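To see what the helper does with a non-BMP character as written, the sketch below feeds it U+10000. Because a .NET string is a sequence of UTF-16 code units, the loop visits the two surrogates separately and emits one escape for each, which is the surrogate-pair form that the JSON specification (RFC 8259) describes. Whether that form suits the third-party application is still an assumption to check; the class name EscapeDemo is made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical demo class wrapping the helper from the answer.
public static class EscapeDemo
{
    // Same logic as EscapeUnicode above: ASCII passes through,
    // every other UTF-16 code unit becomes \uHHHH.
    public static string EscapeUnicode(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char ch in input)
        {
            if (ch <= 0x7f)
                sb.Append(ch);
            else
                sb.AppendFormat(CultureInfo.InvariantCulture, "\\u{0:x4}", (int)ch);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // U+10000 is stored as two UTF-16 code units: 0xD800 and 0xDC00.
        string nonBmp = char.ConvertFromUtf32(0x10000);
        Console.WriteLine(nonBmp.Length);         // prints 2
        Console.WriteLine(EscapeUnicode(nonBmp)); // prints \ud800\udc00
    }
}
```

A consumer that expects the "\U00010000" form instead would need the loop to detect surrogate pairs (for example with char.IsHighSurrogate) and combine them before formatting.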
