Encode/Decode std::string to UTF-16


Problem description


I have to handle a file format (both reading from and writing to it) in which strings are encoded in UTF-16 (2 bytes per code unit). Since characters outside the ASCII table are rarely used in the application domain, all of the strings in my C++ model classes are stored in instances of std::string (UTF-8 encoded).

I'm looking for a library (I searched the STL and Boost with no luck) or a set of C/C++ functions to handle this std::string <-> UTF-16 conversion when loading from or saving to the file format (actually modeled as a bytestream), including the generation/recognition of surrogate pairs and all that Unicode stuff (which I'm admittedly no expert in)...

Any suggestions? Thanks!

EDIT: forgot to mention it should be cross-platform (Win / Mac) and cannot use C++11.

Solution

C++11 has this functionality:

std::string s = u8"Hello, World!";

// #include <locale>  (declares std::wstring_convert and std::codecvt)
std::wstring_convert<std::codecvt<char16_t,char,std::mbstate_t>,char16_t> convert;

std::u16string u16 = convert.from_bytes(s);
std::string u8 = convert.to_bytes(u16);

However, to my knowledge the only implementation that has this so far is libc++. C++11 also has std::codecvt_utf8_utf16<char16_t>, which some other implementations provide. Specifically, codecvt_utf8_utf16 works in VS 2010 and above, and since Windows uses wchar_t to represent UTF-16, you can use it to convert between UTF-8 and Windows' native encoding.
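As a rough illustration of the codecvt_utf8_utf16 route, here is a minimal sketch. The helper names utf8_to_utf16 and utf16_to_utf8 are made up for this example and are not part of any library. Note that it needs C++11, and that both std::wstring_convert and std::codecvt_utf8_utf16 were deprecated in C++17, so some compilers may warn about them.

```cpp
#include <codecvt>   // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>    // std::wstring_convert
#include <string>

// Hypothetical helper (name chosen for this sketch): UTF-8 bytes -> UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);  // throws std::range_error if the conversion fails
}

// Hypothetical helper (name chosen for this sketch): UTF-16 code units -> UTF-8 bytes.
std::string utf16_to_utf8(const std::u16string& utf16) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(utf16);   // throws std::range_error if the conversion fails
}
```

A character outside the Basic Multilingual Plane, such as U+1F600 (UTF-8 bytes F0 9F 98 80), should come back from utf8_to_utf16 as the surrogate pair 0xD83D 0xDE00, and utf16_to_utf8 should turn that pair back into the same four bytes.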


From the standard, [locale.codecvt] 22.4.1.4/3:

"The specialization codecvt<char16_t, char, mbstate_t> converts between the UTF-16 and UTF-8 encoding schemes, and the specialization codecvt<char32_t, char, mbstate_t> converts between the UTF-32 and UTF-8 encoding schemes."


Oh, and std::codecvt specializations have protected destructors, and wstring_convert requires access to the destructor so you really need an adapter:

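#include <locale>   // std::codecvt, std::wstring_convert
#include <utility>  // std::forward (only needed for the workaround below)

template <class Facet>
class usable_facet : public Facet {
public:
    using Facet::Facet; // inherit constructors
    ~usable_facet() {}  // public destructor, so wstring_convert can destroy the facet

    // workaround for compilers without inheriting constructors:
    // template <class ...Args> usable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
};

template<typename internT, typename externT, typename stateT>
using codecvt = usable_facet<std::codecvt<internT, externT, stateT>>;

// The second template argument must match internT; it defaults to wchar_t.
std::wstring_convert<codecvt<char16_t,char,std::mbstate_t>, char16_t> convert;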
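Since the question rules out C++11, a hand-written conversion is another option. The following is only a sketch that compiles as C++98, under a few stated assumptions: the names Utf16Unit, Utf16String, utf8_to_utf16_units and utf16_units_to_utf8 are invented for this example, unsigned short is assumed to hold a 16-bit code unit, and validation is minimal (overlong UTF-8 encodings, for instance, are not rejected).

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical C++98-compatible types; the names are illustrative only.
typedef unsigned short Utf16Unit;            // assumed to hold one 16-bit code unit
typedef std::vector<Utf16Unit> Utf16String;

// Decode UTF-8 and emit UTF-16 code units, using surrogate pairs above U+FFFF.
Utf16String utf8_to_utf16_units(const std::string& in) {
    Utf16String out;
    std::string::size_type i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i++]);
        unsigned long cp;
        int extra;  // number of continuation bytes that must follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        for (; extra > 0; --extra) {
            if (i >= in.size()) throw std::runtime_error("truncated UTF-8 sequence");
            unsigned char c = static_cast<unsigned char>(in[i++]);
            if ((c & 0xC0) != 0x80) throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("code point cannot be encoded in UTF-16");
        if (cp < 0x10000) {
            out.push_back(static_cast<Utf16Unit>(cp));
        } else {
            cp -= 0x10000;  // 20 bits remain, split across the two surrogates
            out.push_back(static_cast<Utf16Unit>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<Utf16Unit>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Encode UTF-16 code units as UTF-8, recombining surrogate pairs first.
std::string utf16_units_to_utf8(const Utf16String& in) {
    std::string out;
    for (Utf16String::size_type i = 0; i < in.size(); ++i) {
        unsigned long cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: a low one must follow
            if (i + 1 >= in.size()) throw std::runtime_error("unpaired high surrogate");
            unsigned long low = in[++i];
            if (low < 0xDC00 || low > 0xDFFF) throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The code units still have to be written to the byte stream in whatever byte order the file format specifies; that step is left out here because the question does not state the format's endianness.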
