Storing a Unicode UTF-8 string in std::string

Question

Related to:

Cross-platform strings (and Unicode) in C++

How to deal with Unicode strings in C/C++ in a cross-platform friendly way?

I'm trying to assign a UTF-8 string to a std::string variable in a Visual Studio 2010 environment:

std::string msg = "महसुस";

However, when I inspect the string in the debugger's string view, I see only "?????". The file is saved as Unicode (UTF-8 with signature), and the project's Character Set option is set to "Use Unicode Character Set".

"महसुस" is a Nepali word; it contains 5 characters and should occupy 15 bytes in UTF-8. But the Visual Studio debugger shows the size of msg as 5.

My question is:

How can I use std::string just to store UTF-8, without needing to manipulate it?

Recommended answer

If you were using C++11, this would be easy (note that from C++20 onward a u8 literal has type const char8_t[] and no longer initializes a std::string directly):

std::string msg = u8"महसुस";

But since you are not, you can use escape sequences instead of relying on the source file's charset to manage the encoding for you. This makes the code more portable, for instance if the file is accidentally saved in a non-UTF-8 format:

std::string msg = "\xE0\xA4\xAE\xE0\xA4\xB9\xE0\xA4\xB8\xE0\xA5\x81\xE0\xA4\xB8"; // "महसुस"
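The escape bytes above follow directly from the UTF-8 encoding rules, so they can be derived from the code points rather than looked up by hand. The following is a minimal sketch of that derivation; `appendUtf8` is a hypothetical helper name (not part of any library), and it assumes the input is a valid Unicode scalar value (at most 0x10FFFF and not a surrogate), with no validation performed.

```cpp
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one Unicode code
// point to `out`. Assumes `cp` is a valid scalar value; nothing is checked.
void appendUtf8(std::string &out, unsigned long cp)
{
    if (cp < 0x80)
    {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (the Devanagari block lands here)
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

Calling it in turn with the five code points of the word (U+092E, U+0939, U+0938, U+0941, U+0938) should build the same 15-byte sequence as the escaped literal.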

Otherwise, you might consider doing a conversion at runtime instead:

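#include <windows.h>
#include <string>

// Converts a UTF-16 std::wstring to a UTF-8 encoded std::string (Windows only).
std::string toUtf8(const std::wstring &str)
{
    std::string ret;
    // First call: ask how many bytes the UTF-8 output needs.
    int len = WideCharToMultiByte(CP_UTF8, 0, str.c_str(), static_cast<int>(str.length()), NULL, 0, NULL, NULL);
    if (len > 0)
    {
        ret.resize(len);
        // Second call: write the converted bytes into the buffer.
        WideCharToMultiByte(CP_UTF8, 0, str.c_str(), static_cast<int>(str.length()), &ret[0], len, NULL, NULL);
    }
    return ret;
}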

std::string msg = toUtf8(L"महसुस");
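Whichever approach is used, it may help to confirm that the string really holds UTF-8 rather than five "?" placeholder bytes. std::string::size() reports bytes, not characters, so a correctly stored "महसुस" should report 15. Below is a rough, portable sketch for counting code points; `countCodePoints` is a hypothetical helper name, and it assumes the input is already well-formed UTF-8 (it does not validate anything).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 encoded
// std::string. Assumes well-formed UTF-8; no validation is performed.
std::size_t countCodePoints(const std::string &utf8)
{
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < utf8.size(); ++i)
    {
        // Continuation bytes look like 10xxxxxx; any other byte
        // starts a new code point.
        unsigned char byte = static_cast<unsigned char>(utf8[i]);
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the escaped literal shown earlier, size() should be 15 and countCodePoints should return 5. If size() is 5 instead, the compiler most likely replaced the characters with "?" while converting the literal to the local code page.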
