boost::property_tree::json_parser and two-byte wide characters


Question

std::string text = "á";

"á" is a two-byte character (assuming UTF-8 encoding), so the following line prints 2.

std::cout << text.size() << "\n";

std::cout still prints the text correctly.

std::cout << text << "\n";
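
To see why size() reports 2, it can help to dump the individual bytes. The following is a minimal sketch; `to_hex_bytes` is a hypothetical helper introduced here for illustration, and the string is assumed to hold UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: renders each byte of s as two uppercase hex digits,
// separated by single spaces.
std::string to_hex_bytes(const std::string& s)
{
    static const char* hexdigits = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        // Convert through unsigned char so bytes >= 0x80 are not negative.
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if (i != 0) out += ' ';
        out += hexdigits[byte >> 4];
        out += hexdigits[byte & 0x0F];
    }
    return out;
}
```

Calling `to_hex_bytes("\xC3\xA1")` gives "C3 A1", the two bytes that UTF-8 uses for U+00E1 (á). std::string::size() counts those bytes, not characters.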

My problem

I pass text to a boost::property_tree::ptree and then to write_json:

boost::property_tree::ptree root;
root.put<std::string>("text", text);

std::stringstream ss;
boost::property_tree::json_parser::write_json(ss, root);
std::cout << ss.str() << "\n";

The result is:

{
    "text": "\u00C3\u00A1"
}

The escaped text decodes to "Ã¡", which is different from "á".
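
The two escapes appear because each UTF-8 byte is escaped on its own, as if it were a complete code point. The sketch below imitates that effect and is not Boost's actual code; `escape_bytewise` is a hypothetical function written only for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical illustration (not Boost's implementation): every byte >= 0x80
// is written as its own \u00XX escape, as if it were a whole code point.
// Quotes, backslashes and control characters are ignored here for brevity.
std::string escape_bytewise(const std::string& s)
{
    static const char* hexdigits = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if (byte < 0x80)
        {
            out += s[i];                 // plain ASCII passes through
        }
        else
        {
            out += "\\u00";              // the byte is treated as U+00XX
            out += hexdigits[byte >> 4];
            out += hexdigits[byte & 0x0F];
        }
    }
    return out;
}
```

With the input "\xC3\xA1" this produces `\u00C3\u00A1`. A JSON reader decodes that as the two code points U+00C3 and U+00A1, so the original single character is lost.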

Is it possible to fix this problem without switching to std::wstring? Is it possible that changing the library (boost::property_tree::ptree) can solve this problem?

Recommended answer

I found a solution. In general, you need to specialize the boost::property_tree::json_parser::create_escapes template for Ch = char and provide your own escaping that avoids this bug.

The JSON standard's "\uXXXX" escapes are based on UTF-16, but some libraries support UTF-8 encoding with "\xXX" escaping. If the JSON file can be encoded in UTF-8, you can pass through every character above 0x7F unescaped, which is what the original function intended.
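
If the output has to stay pure ASCII, another option is to decode each UTF-8 sequence to its code point and escape that, rather than passing the bytes through. This is not the approach taken in this answer. It is a rough sketch under the assumption that the input is valid UTF-8, and both `append_u_escape` and `escape_code_points` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Appends \uXXXX for one UTF-16 code unit.
void append_u_escape(std::string& out, unsigned unit)
{
    static const char* hexdigits = "0123456789ABCDEF";
    out += "\\u";
    out += hexdigits[(unit >> 12) & 0xF];
    out += hexdigits[(unit >> 8) & 0xF];
    out += hexdigits[(unit >> 4) & 0xF];
    out += hexdigits[unit & 0xF];
}

// Hypothetical helper: assumes s is valid UTF-8 (no validation is done).
// ASCII is copied unchanged; other code points become \uXXXX escapes.
std::string escape_code_points(const std::string& s)
{
    std::string out;
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        if (lead < 0x80) { out += s[i]; ++i; continue; }

        // Sequence length and the payload bits carried by the lead byte.
        std::size_t len = 2;
        unsigned long cp = lead & 0x1F;
        if (lead >= 0xF0)      { len = 4; cp = lead & 0x07; }
        else if (lead >= 0xE0) { len = 3; cp = lead & 0x0F; }

        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        i += len;

        if (cp < 0x10000)
        {
            append_u_escape(out, static_cast<unsigned>(cp));
        }
        else
        {
            // Code points above U+FFFF need a UTF-16 surrogate pair.
            cp -= 0x10000;
            append_u_escape(out, static_cast<unsigned>(0xD800 + (cp >> 10)));
            append_u_escape(out, static_cast<unsigned>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For the input "\xC3\xA1" this yields `\u00E1`, a single escape for the single character á. The trade-off is larger output and the need to trust or validate the input encoding.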

I put this code before any use of boost::property_tree::json_parser::write_json. It is adapted from boost_1_49_0/boost/property_tree/detail/json_parser_write.hpp:

namespace boost { namespace property_tree { namespace json_parser
{
    // Create necessary escape sequences from illegal characters
    template<>
    std::basic_string<char> create_escapes(const std::basic_string<char> &s)
    {
        std::basic_string<char> result;
        std::basic_string<char>::const_iterator b = s.begin();
        std::basic_string<char>::const_iterator e = s.end();
        while (b != e)
        {
            // This assumes an ASCII superset. But so does everything in PTree.
            // The original escaped everything outside ASCII; this version passes
            // bytes >= 0x80 through unchanged so UTF-8 sequences stay intact.
            if (*b == 0x20 || *b == 0x21 || (*b >= 0x23 && *b <= 0x2E) ||
                (*b >= 0x30 && *b <= 0x5B) || (*b >= 0x5D && *b <= 0xFF)  // never matches bytes >= 0x80 when char is signed
                || (*b >= -0x80 && *b < 0 ) ) // passes UTF-8 bytes when char is signed
                result += *b;
            else if (*b == char('\b')) result += char('\\'), result += char('b');
            else if (*b == char('\f')) result += char('\\'), result += char('f');
            else if (*b == char('\n')) result += char('\\'), result += char('n');
            else if (*b == char('\r')) result += char('\\'), result += char('r');
            else if (*b == char('/')) result += char('\\'), result += char('/');
            else if (*b == char('"'))  result += char('\\'), result += char('"');
            else if (*b == char('\\')) result += char('\\'), result += char('\\');
            else
            {
                const char *hexdigits = "0123456789ABCDEF";
                typedef make_unsigned<char>::type UCh;
                unsigned long u = (std::min)(static_cast<unsigned long>(
                                                 static_cast<UCh>(*b)),
                                             0xFFFFul);
                int d1 = u / 4096; u -= d1 * 4096;
                int d2 = u / 256; u -= d2 * 256;
                int d3 = u / 16; u -= d3 * 16;
                int d4 = u;
                result += char('\\'); result += char('u');
                result += char(hexdigits[d1]); result += char(hexdigits[d2]);
                result += char(hexdigits[d3]); result += char(hexdigits[d4]);
            }
            ++b;
        }
        return result;
    }
} } }

And the output I get is:

{
    "text": "aáb"
}

Also, the function boost::property_tree::json_parser::a_unicode has a similar problem when reading escaped Unicode characters into signed chars.
