Converting "normal" std::string to utf-8

Question

Let's see if I can explain this without too many factual errors...

I'm writing a string class and I want it to use UTF-8 (stored in a std::string) as its internal storage. I want it to be able to take both "normal" std::string and std::wstring as input and output.

Working with std::wstring is not a problem: I can use std::codecvt_utf8<wchar_t> to convert both from and to std::wstring.
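
For reference, the std::wstring round trip mentioned above could look roughly like the following. This is a minimal sketch, not the asker's actual code: the helper names wide_to_utf8 and utf8_to_wide are made up for illustration, and it assumes std::wstring_convert is used to drive the facet. Both std::wstring_convert and std::codecvt_utf8 are deprecated since C++17, so a compiler may warn about them.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper (name is an assumption): encode a wide string as UTF-8.
std::string wide_to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);
}

// Hypothetical helper (name is an assumption): decode UTF-8 into a wide string.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::wstring utf8_to_wide(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(utf8);
}
```

One caveat: where wchar_t is 16 bits wide, as on Windows, std::codecvt_utf8<wchar_t> treats the wide side as UCS-2 and does not handle surrogate pairs; std::codecvt_utf8_utf16<wchar_t> is the facet that converts between UTF-8 and UTF-16.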

However, after extensive googling and searching on SO, I have yet to find a way to convert between a "normal/default" C++ std::string (which I assume on Windows uses the local system localization?) and a UTF-8 std::string.

I guess one option would be to first convert the std::string to a std::wstring using std::codecvt<wchar_t, char> and then convert it to UTF-8 as above, but this seems quite inefficient, given that at least the first 128 values of a char should translate straight over to UTF-8 without conversion regardless of localization, if I understand correctly.
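
The observation about the first 128 values suggests a cheap fast path: if every byte is below 0x80, the string is already valid UTF-8 and can be copied through unchanged. The sketch below assumes the local encoding is ASCII-compatible in that range, which is generally the case for Windows ANSI code pages but is still an assumption here. The names is_ascii and to_utf8_with_ascii_fast_path are hypothetical, and the convert parameter stands in for whatever full conversion routine is used otherwise.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true if every byte is in the 7-bit ASCII range.
bool is_ascii(const std::string& s) {
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return c < 0x80; });
}

// Hypothetical wrapper: return pure-ASCII input unchanged, otherwise
// delegate to a caller-supplied conversion routine.
// Assumes the source encoding agrees with ASCII for bytes 0x00-0x7F.
template <typename Fallback>
std::string to_utf8_with_ascii_fast_path(const std::string& input,
                                         Fallback convert) {
    return is_ascii(input) ? input : convert(input);
}
```

Whether the extra scan pays off depends on how often the input is pure ASCII; for mostly non-ASCII text it only adds a pass over the data.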

I found this similar question: "C++: how to convert ASCII or ANSI to UTF8 and stores in std::string". However, I'm a bit skeptical of that answer, as it's hard-coded to Latin-1, and I want this to work with all types of localization to be on the safe side.
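
To make the Latin-1 concern concrete, here is a minimal sketch of what a conversion hard-coded to Latin-1 typically looks like. It is not the code from the linked answer; latin1_to_utf8 is a hypothetical name. It works only because ISO-8859-1 byte values coincide with Unicode code points U+0000 to U+00FF, so each byte needs at most two UTF-8 bytes.

```cpp
#include <string>

// Hypothetical helper: treats every input byte as an ISO-8859-1 (Latin-1)
// code point and encodes it as UTF-8.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size());
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            // ASCII range: one byte, copied as-is.
            utf8.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx.
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return utf8;
}
```

This mapping gives wrong results for any other code page. Even Windows-1252 differs from Latin-1 in the 0x80 to 0x9F range, which is why a locale-aware conversion is needed for the general case.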

No answers involving Boost, thanks; I don't want the headache of getting my codebase to work with it.

Recommended answer

If your "normal string" is encoded using the system's code page and you want to convert it to UTF-8, then this should work:

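#include <windows.h>
#include <string>

// Step 1: system ANSI code page (CP_ACP) -> UTF-16.
std::string codepage_str;  // input, encoded in the system code page
// Calling with a null output buffer returns the required length in wchar_t units.
int size = MultiByteToWideChar(CP_ACP, MB_COMPOSITE, codepage_str.c_str(),
                               static_cast<int>(codepage_str.length()), nullptr, 0);
std::wstring utf16_str(size, L'\0');
MultiByteToWideChar(CP_ACP, MB_COMPOSITE, codepage_str.c_str(),
                    static_cast<int>(codepage_str.length()), &utf16_str[0], size);

// Step 2: UTF-16 -> UTF-8, using the same size-then-convert pattern.
int utf8_size = WideCharToMultiByte(CP_UTF8, 0, utf16_str.c_str(),
                                    static_cast<int>(utf16_str.length()), nullptr, 0,
                                    nullptr, nullptr);
std::string utf8_str(utf8_size, '\0');
WideCharToMultiByte(CP_UTF8, 0, utf16_str.c_str(),
                    static_cast<int>(utf16_str.length()), &utf8_str[0], utf8_size,
                    nullptr, nullptr);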
