New Unicode characters in C++0x


Question

    I'm building an API that allows me to fetch strings in various encodings, including utf8, utf16, utf32 and wchar_t (which may be utf32 or utf16 depending on the OS).

    1. The new C++ standard introduces the types char16_t and char32_t, which do not have this sizeof ambiguity and should be used in the future, so I would like to support them as well. The question is: would they interfere with the existing uint16_t, uint32_t and wchar_t overloads, making overloading impossible because they may refer to the same type?

      class some_class {
      public:
          void set(std::string); // utf8 string
          void set(std::wstring); // wchar string utf16 or utf32 according
                                   // to sizeof(wchar_t)
          void set(std::basic_string<uint16_t>);
                               // wchar independent utf16 string
          void set(std::basic_string<uint32_t>);
                               // wchar independent utf32 string
      
      
      #ifdef HAVE_NEW_UNICODE_CHARRECTERS
          void set(std::basic_string<char16_t>);
                               // new standard utf16 string
          void set(std::basic_string<char32_t>);
                               // new standard utf32 string
      #endif
      };
      

      So I can just write:

      foo.set(U"Some utf32 String");
      foo.set(u"Some utf16 string");
      

    2. What are the typedefs for std::basic_string<char16_t> and std::basic_string<char32_t>, analogous to the one that exists today for wchar_t:

      typedef basic_string<wchar_t> wstring;
      

      I can't find any reference.

      Edit: according to the headers of gcc-4.4, which introduced these new types:

      typedef basic_string<char16_t> u16string;
      typedef basic_string<char32_t> u32string;
      

      I just want to make sure that this is an actual standard requirement and not a gcc-ism.

    Answer

    1) char16_t and char32_t will be distinct new types, so overloading on them will be possible.

    Quote from ISO/IEC JTC1 SC22 WG21 N2018:

    Define char16_t to be a typedef to a distinct new type, with the name _Char16_t that has the same size and representation as uint_least16_t. Likewise, define char32_t to be a typedef to a distinct new type, with the name _Char32_t that has the same size and representation as uint_least32_t.
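    The distinctness described in the quote can be checked at compile time. The sketch below assumes a C++11 (or later) compiler; in the published standard char16_t and char32_t ended up as keywords naming distinct fundamental types rather than typedefs to _Char16_t and _Char32_t, but the size guarantee is the same. The helper new_char_types_are_distinct is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// The new character types are distinct from the integer typedefs and from
// wchar_t, even though they share size and representation with the
// uint_leastN_t types.
static_assert(!std::is_same<char16_t, std::uint_least16_t>::value,
              "char16_t must not alias uint_least16_t");
static_assert(!std::is_same<char32_t, std::uint_least32_t>::value,
              "char32_t must not alias uint_least32_t");
static_assert(!std::is_same<char16_t, wchar_t>::value,
              "char16_t must not alias wchar_t");
static_assert(!std::is_same<char32_t, wchar_t>::value,
              "char32_t must not alias wchar_t");
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t),
              "same size as uint_least16_t");
static_assert(sizeof(char32_t) == sizeof(std::uint_least32_t),
              "same size as uint_least32_t");

// Hypothetical helper exposing the same facts at run time.
constexpr bool new_char_types_are_distinct() {
    return !std::is_same<char16_t, std::uint_least16_t>::value &&
           !std::is_same<char32_t, std::uint_least32_t>::value &&
           !std::is_same<char16_t, wchar_t>::value &&
           !std::is_same<char32_t, wchar_t>::value;
}
```

    If any of the static_assert lines failed, the translation unit would not compile, so a successful build is itself the check.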

    Further explanation (from a devx.com article "Prepare Yourself for the Unicode Revolution"):

    You're probably wondering why the _Char16_t and _Char32_t types and keywords are needed in the first place when the typedefs uint_least16_t and uint_least32_t are already available. The main problem that the new types solve is overloading. It's now possible to overload functions that take _Char16_t and _Char32_t arguments, and create specializations such as std::basic_string<_Char16_t> that are distinct from std::basic_string <wchar_t>.
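    To illustrate the overloading point, here is a minimal sketch of the class from the question, assuming a C++11 compiler. The Encoding enum and the return values are hypothetical additions that make the selected overload observable; they are not part of the original API. The basic_string<uint16_t> and basic_string<uint32_t> overloads are left out because the standard only requires char_traits specializations for the built-in character types, so those instantiations are not portable.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag so that callers can see which overload was chosen.
enum class Encoding { Utf8, Wide, Utf16, Utf32 };

class some_class {
public:
    // utf8 string
    Encoding set(const std::string&) { return Encoding::Utf8; }
    // wchar string, utf16 or utf32 depending on sizeof(wchar_t)
    Encoding set(const std::wstring&) { return Encoding::Wide; }
    // new standard utf16 string
    Encoding set(const std::basic_string<char16_t>&) { return Encoding::Utf16; }
    // new standard utf32 string
    Encoding set(const std::basic_string<char32_t>&) { return Encoding::Utf32; }
};
```

    With this in place, foo.set(u"Some utf16 string") resolves to the char16_t overload and foo.set(U"Some utf32 String") to the char32_t overload, because each literal converts to exactly one of the string types.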

    2) u16string and u32string are indeed part of C++0x and not just GCC-isms, as they are mentioned in various standard draft papers. They will be included in the new <string> header. Quote from the same article:

    The Standard Library will also provide _Char16_t and _Char32_t typedefs, in analogy to the typedefs wstring, wcout, etc., for the following standard classes:

    filebuf, streambuf, streampos, streamoff, ios, istream, ostream, fstream, ifstream, ofstream, stringstream, istringstream, ostringstream, string
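    The string typedefs can be verified directly, assuming a C++11 standard library. As far as I know, the published standard kept the string typedefs but did not add char16_t or char32_t counterparts of stream objects such as wcout, so the sketch below sticks to <string>. The helpers utf16_units and utf32_units are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// u16string and u32string are plain typedefs of basic_string.
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value,
              "u16string is basic_string<char16_t>");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value,
              "u32string is basic_string<char32_t>");

// Hypothetical helpers: size() counts code units, not code points.
std::size_t utf16_units(const std::u16string& s) { return s.size(); }
std::size_t utf32_units(const std::u32string& s) { return s.size(); }
```

    A character outside the Basic Multilingual Plane, for example U+1F600, takes two code units in a u16string (a surrogate pair) and one in a u32string.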
