How to convert a UTF-8 char array to a Windows-1252 char array
Question
I am a beginner in C++, so I apologize if this is a silly question.
I have a piece of text: Павло
I get it from the console output of a piece of code I am working on. I know that a Cyrillic word is hidden behind it; its real value is "Петро".
Using an online encoding detector, I found that to read this text properly I have to convert it from UTF-8 to Windows-1252.
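For context: in UTF-8 every basic Cyrillic letter takes two bytes (the first is 0xD0 or 0xD1), so a console that shows each byte as a Windows-1252 character prints ten Latin-looking characters, mostly Ð and Ñ each followed by another symbol, for a five-letter word. Any fix starts by decoding those bytes back into code points, which on Windows is what MultiByteToWideChar with code page 65001 (CP_UTF8) does. As a portable illustration only, here is a minimal hand-written decoder; `decode_utf8` is a hypothetical name, not a library function, and it skips the overlong and surrogate checks a production decoder needs.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 bytes into Unicode code points (UTF-32).
// Minimal sketch: rejects bad lead bytes, bad continuation bytes and
// truncated sequences, but does NOT reject overlong forms or surrogates.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;  // number of continuation bytes after the lead byte
        char32_t cp;
        if (lead < 0x80)              { cp = lead;        extra = 0; }  // 0xxxxxxx
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }  // 110xxxxx
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }  // 1110xxxx
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }  // 11110xxx
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (bytes.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)  // continuation bytes look like 10xxxxxx
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Fed the ten bytes D0 9F D0 B5 D1 82 D1 80 D0 BE, it returns the five code points U+041F U+0435 U+0442 U+0440 U+043E, i.e. "Петро".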
How can I do it with code?
I have tried the following. It produces output, but only five question marks (at least the length is what I expected):
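#include <windows.h>
#include <conio.h>   // _getch
#include <cstring>   // strlen
#include <cwchar>    // wcslen
#include <iostream>

// Decodes a byte string in the given code page into a new[]-allocated,
// NUL-terminated UTF-16 string. Returns nullptr on failure.
wchar_t *CodePageToUnicode(int codePage, const char *src)
{
    if (!src) return nullptr;
    int srcLen = static_cast<int>(strlen(src));
    if (!srcLen)
    {
        wchar_t *w = new wchar_t[1];
        w[0] = 0;
        return w;
    }
    // First call: ask how many UTF-16 code units are needed.
    int requiredSize = MultiByteToWideChar(codePage, 0, src, srcLen, nullptr, 0);
    if (!requiredSize)
    {
        return nullptr;
    }
    wchar_t *w = new wchar_t[requiredSize + 1];
    w[requiredSize] = 0;
    // Second call: perform the conversion into the sized buffer.
    int retval = MultiByteToWideChar(codePage, 0, src, srcLen, w, requiredSize);
    if (!retval)
    {
        delete[] w;
        return nullptr;
    }
    return w;
}

// Encodes a UTF-16 string in the given code page into a new[]-allocated,
// NUL-terminated byte string. Characters the code page cannot represent are
// replaced by its default character (usually '?'). Returns nullptr on failure.
char *UnicodeToCodePage(int codePage, const wchar_t *src)
{
    if (!src) return nullptr;
    int srcLen = static_cast<int>(wcslen(src));
    if (!srcLen)
    {
        char *x = new char[1];
        x[0] = '\0';
        return x;
    }
    // First call: ask how many bytes are needed.
    int requiredSize = WideCharToMultiByte(codePage, 0, src, srcLen, nullptr, 0, nullptr, nullptr);
    if (!requiredSize)
    {
        return nullptr;
    }
    char *x = new char[requiredSize + 1];
    x[requiredSize] = 0;
    // Second call: perform the conversion into the sized buffer.
    int retval = WideCharToMultiByte(codePage, 0, src, srcLen, x, requiredSize, nullptr, nullptr);
    if (!retval)
    {
        delete[] x;
        return nullptr;
    }
    return x;
}

int main()
{
    // Assumes the source file is saved and compiled as UTF-8 (e.g. MSVC /utf-8),
    // so this literal holds UTF-8 bytes.
    const char *text = "Павло";
    // Decode UTF-8 (65001 == CP_UTF8) to UTF-16, then encode as code page 1252.
    wchar_t *wText2 = CodePageToUnicode(65001, text);
    // Windows-1252 has no Cyrillic letters, so every letter becomes '?'.
    char *ansiText = UnicodeToCodePage(1252, wText2);
    if (ansiText)
        std::cout << ansiText;
    delete[] wText2;
    delete[] ansiText;
    _getch();
}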
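The five question marks are expected: Windows-1252 contains no Cyrillic letters at all, so WideCharToMultiByte substitutes the code page's default character (normally '?') for each one. The portable sketch below imitates that substitution with a deliberately partial table. `to_cp1252` is a hypothetical helper, not a Windows API, and it leaves out the 0x80..0x9F extras (euro sign, curly quotes and so on) that real code page 1252 defines; it relies only on the fact that 1252 stores ASCII and U+00A0..U+00FF at the same byte values.

```cpp
#include <string>

// Encodes code points as Windows-1252, substituting '?' for anything outside
// this partial table (mimicking the default character of WideCharToMultiByte).
// Covered: ASCII and U+00A0..U+00FF, which 1252 stores at the same byte values.
// Omitted: the 0x80..0x9F extras such as the euro sign and curly quotes.
std::string to_cp1252(const std::u32string& text) {
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
            out.push_back(static_cast<char>(cp));
        else
            out.push_back('?');  // e.g. every Cyrillic letter U+0400..U+04FF
    }
    return out;
}
```

With the code points of "Петро" as input, the result is "?????": one question mark per letter, which matches the length the question observed.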
I also tried this, but it does not work properly:
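#include <windows.h>
#include <conio.h>    // _getch
#include <cstdlib>    // mbstowcs_s
#include <cstring>    // strlen
#include <cwchar>     // wcscat_s
#include <iostream>
#include <string>

int main()
{
    const char *orig = "Павло";
    size_t origsize = strlen(orig) + 1;
    const size_t newsize = 100;
    size_t convertedChars = 0;
    wchar_t wcstring[newsize];
    // mbstowcs_s decodes using the current C locale, which is not UTF-8 by default.
    mbstowcs_s(&convertedChars, wcstring, origsize, orig, _TRUNCATE);
    wcscat_s(wcstring, L" (wchar_t *)");
    std::wstring strUTF(wcstring);
    const wchar_t* szWCHAR = strUTF.c_str();
    // Wide strings must go to std::wcout; std::cout would print the pointer value.
    std::wcout << szWCHAR << L'\n';
    // Ask for the required size first instead of guessing it.
    int needed = WideCharToMultiByte(CP_ACP, 0, szWCHAR, -1, NULL, 0, NULL, NULL);
    if (needed > 0)
    {
        char *buffer = new char[needed];
        WideCharToMultiByte(CP_ACP, 0, szWCHAR, -1, buffer, needed, NULL, NULL);
        std::cout << buffer;
        delete[] buffer;
    }
    _getch();
}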
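Two things make this second attempt unreliable. First, mbstowcs_s decodes according to the current C locale, and at program start that is the "C" locale, not UTF-8, so the Cyrillic bytes are not decoded as UTF-8. Second, estimating the size of an output buffer is fragile; both families of conversion functions let you ask for the exact size first. Below is a sketch of that size-first pattern using the standard std::mbsrtowcs (`to_wide` is a hypothetical name). Note that it still follows the current C locale, so it only decodes UTF-8 if a UTF-8 locale has been selected, for example with std::setlocale.

```cpp
#include <cwchar>
#include <stdexcept>
#include <string>

// Converts a multibyte string to std::wstring using the *current C locale*
// (the same rule mbstowcs_s follows). Pass 1 only counts the wide characters,
// pass 2 converts into a buffer of exactly that size.
std::wstring to_wide(const char* src) {
    std::mbstate_t state{};
    const char* p = src;
    // With dst == nullptr, mbsrtowcs stores nothing and just returns the count.
    std::size_t len = std::mbsrtowcs(nullptr, &p, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("input is not valid in the current locale");
    std::wstring out(len, L'\0');
    state = std::mbstate_t{};  // restart from the initial shift state
    p = src;
    std::mbsrtowcs(&out[0], &p, len, &state);
    return out;
}
```

The same two-call idea applies to WideCharToMultiByte: call it once with a zero-sized output buffer to get the byte count, allocate that many bytes, then call it again.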
Answer
There are several options:
Using the Windows API
Convert your UTF-8 text to the system's UTF-16LE with MultiByteToWideChar, and then from UTF-16LE to CP1251 (Cyrillic is 1251, not 1252) with WideCharToMultiByte.
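To see what that second step produces, here is a portable sketch that stands in for WideCharToMultiByte(1251, ...) using a hand-written, partial table. `to_cp1251` is a hypothetical helper, not a Windows API; it covers only ASCII, the basic Russian alphabet and Ё/ё, while the real code page 1251 also maps Ukrainian letters such as і and ї, punctuation and more. In the question's first program, the real-API equivalent is passing 1251 instead of 1252 to UnicodeToCodePage.

```cpp
#include <string>

// Encodes UTF-16 text as Windows-1251 using a partial table; everything else
// becomes '?'. Covered: ASCII, U+0410..U+044F (bytes 0xC0..0xFF, in order),
// U+0401 'Ё' (0xA8) and U+0451 'ё' (0xB8). Each half of a surrogate pair is
// replaced separately, so a non-BMP character yields "??" here.
std::string to_cp1251(const std::u16string& text) {
    std::string out;
    out.reserve(text.size());
    for (char16_t u : text) {
        unsigned char b;
        if (u < 0x80)
            b = static_cast<unsigned char>(u);                 // ASCII unchanged
        else if (u >= 0x0410 && u <= 0x044F)
            b = static_cast<unsigned char>(0xC0 + (u - 0x0410));  // А..я
        else if (u == 0x0401)
            b = 0xA8;                                          // Ё
        else if (u == 0x0451)
            b = 0xB8;                                          // ё
        else
            b = '?';                                           // not in table
        out.push_back(static_cast<char>(b));
    }
    return out;
}
```

For "Петро" this yields the five bytes CF E5 F2 F0 EE, one byte per letter. Whether they appear as Cyrillic still depends on the console's output code page being 1251.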
Using the MS MLang API
Using IBM ICU
If you simply need to output your Unicode text to the console, check this