C++ & Boost: encode/decode UTF-8

Question

I'm trying to do a very simple task: take a unicode-aware wstring and convert it to a string, encoded as UTF8 bytes, and then the opposite way around: take a string containing UTF8 bytes and convert it to unicode-aware wstring.

The problem is, I need it to be cross-platform and I need it to work with Boost... and I just can't seem to figure out a way to make it work. I've been toying with

  • http://www.edobashira.com/2010/03/using-boost-code-facet-for-reading-utf8.html and
  • http://www.boost.org/doc/libs/1_46_0/libs/serialization/doc/codecvt.html

Trying to convert the code to use stringstream/wstringstream instead of files or whatever, but nothing seems to work.

For instance, in Python it would look like so:

>>> u"שלום"
u'\u05e9\u05dc\u05d5\u05dd'
>>> u"שלום".encode("utf8")
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> '\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'.decode("utf8")
u'\u05e9\u05dc\u05d5\u05dd'

What I'm ultimately after is this:

wchar_t uchars[] = {0x5e9, 0x5dc, 0x5d5, 0x5dd, 0};
wstring ws(uchars);
string s = encode_utf8(ws); 
// s now holds "\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d"
wstring ws2 = decode_utf8(s);
// ws2 now holds {0x5e9, 0x5dc, 0x5d5, 0x5dd}

I really don't want to add another dependency such as ICU or something in that spirit... but to my understanding, it should be possible with Boost.

Some sample code would be greatly appreciated! Thanks

Recommended answer

Thanks everyone, but ultimately I resorted to http://utfcpp.sourceforge.net/ -- it's a header-only library that's very lightweight and easy to use. I'm sharing a demo code here, should anyone find it useful:

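#include <iterator>  // std::back_inserter
#include <string>
#include "utf8.h"    // utfcpp

// Note: utf8to32/utf32to8 treat each wchar_t as one full code point, which
// holds where wchar_t is 32 bits wide (typical on Linux and macOS). Where
// wchar_t is 16 bits (Windows), use utf8::utf8to16 / utf8::utf16to8 instead.
inline void decode_utf8(const std::string& bytes, std::wstring& wstr)
{
    utf8::utf8to32(bytes.begin(), bytes.end(), std::back_inserter(wstr));
}
inline void encode_utf8(const std::wstring& wstr, std::string& bytes)
{
    utf8::utf32to8(wstr.begin(), wstr.end(), std::back_inserter(bytes));
}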

Usage:

wstring ws(L"\u05e9\u05dc\u05d5\u05dd");
string s;
encode_utf8(ws, s);
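
For anyone who wants to see what the conversion involves without pulling in any library, here is a minimal dependency-free sketch. It is not how utfcpp or Boost implement it, and the names `encode_utf8_sketch` and `decode_utf8_sketch` are made up for illustration. It assumes C++11 and works on `std::u32string` rather than `std::wstring`, so that one element is always one code point regardless of how wide `wchar_t` is on the platform.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode Unicode code points as UTF-8 bytes.
std::string encode_utf8_sketch(const std::u32string& code_points)
{
    std::string bytes;
    for (char32_t cp : code_points) {
        // Surrogates and values above U+10FFFF are not encodable.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            bytes += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            bytes += static_cast<char>(0xC0 | (cp >> 6));
            bytes += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            bytes += static_cast<char>(0xE0 | (cp >> 12));
            bytes += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            bytes += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            bytes += static_cast<char>(0xF0 | (cp >> 18));
            bytes += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            bytes += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            bytes += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return bytes;
}

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Throws std::invalid_argument on malformed input.
std::u32string decode_utf8_sketch(const std::string& bytes)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const char32_t min_for_len[] = {0, 0x80, 0x800, 0x10000};

    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (bytes.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        assert(extra <= 3);  // guards the table lookup below
        // Reject overlong forms, surrogates, and values above U+10FFFF.
        if (cp < min_for_len[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or out-of-range sequence");

        out += cp;
        i += extra + 1;
    }
    return out;
}
```

With these, `encode_utf8_sketch(U"\u05e9\u05dc\u05d5\u05dd")` yields the same eight bytes as the Python session in the question, and `decode_utf8_sketch` turns them back into the four code points. Converting between `std::u32string` and `std::wstring` is a plain element copy where `wchar_t` is 32 bits wide, but needs surrogate-pair handling where it is 16 bits, which this sketch deliberately leaves out.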
