wchar_t* with UTF-8 chars in MSVC
Question
I am trying to format a wchar_t* string into a UTF-8 buffer using vsnprintf and then print the buffer using printf.
Given the following code:
/*
  This code is modified version of KB sample:
  https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_73/rtref/vsnprintf.htm
  The usage of `setlocale` is required by my real-world scenario,
  but can be modified if that fixes the issue.
*/
#include <wchar.h>
#include <stdarg.h>
#include <stdio.h>
#include <locale.h>
#ifdef MSVC
#include <windows.h>
#endif

void vout(char *string, char *fmt, ...)
{
    setlocale(LC_CTYPE, "en_US.UTF-8");
    va_list arg_ptr;
    va_start(arg_ptr, fmt);
    vsnprintf(string, 100, fmt, arg_ptr);
    va_end(arg_ptr);
}

int main(void)
{
    setlocale(LC_ALL, "");
#ifdef MSVC
    SetConsoleOutputCP(65001); // with or without; no dice
#endif
    char string[100];
    wchar_t arr[] = { 0x0119 };
    vout(string, "%ls", arr);
    printf("This string should have 'ę' (e with ogonek / tail) after colon: %s\n", string);
    return 0;
}
I compiled it with gcc v5.4 on Ubuntu 16 and got the desired output in Bash:
gcc test.c -o test_vsn
./test_vsn
This string should have 'ę' (e with ogonek / tail) after colon: ę
However, on Windows 10 with CL v19.10.25019 (VS 2017), I get weird output in CMD:
cl test.c /Fetest_vsn /utf-8
.\test_vsn
This string should have 'T' (e with ogonek / tail) after colon: e
(the ę before the colon becomes T, and the character after the colon is e without the ogonek)
Note that I used CL's new /utf-8 switch (introduced in VS 2015); the output is the same with or without it. According to their blog post:
There is also a /utf-8 option that is a synonym for setting "/source-charset:utf-8" and "/execution-charset:utf-8".
(my source file is already saved as UTF-8 with a BOM, and setting the execution charset apparently does not help)
What is the minimal set of changes to the code or compiler switches that makes the output identical to that of gcc?
Recommended answer
Based on @RemyLebeau's comment, I modified the code to use the w variants of the printf APIs, so that the output with MSVC on Windows matches that of gcc on Unix.
Additionally, instead of changing the code page, I now use _setmode (the FILE translation mode).
/*
  This code is modified version of KB sample:
  https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_73/rtref/vsnprintf.htm
  The usage of `setlocale` is required by my real-world scenario,
  but can be modified if that fixes the issue.
*/
#include <wchar.h>
#include <stdarg.h>
#include <stdio.h>
#include <locale.h>
#ifdef _WIN32
#include <io.h>    // for _setmode
#include <fcntl.h> // for _O_U16TEXT
#endif

// Formats into a wide buffer; the caller must provide room for 100 wchar_t.
void vout(wchar_t *string, const wchar_t *fmt, ...)
{
    setlocale(LC_CTYPE, "en_US.UTF-8");
    va_list arg_ptr;
    va_start(arg_ptr, fmt);
    vswprintf(string, 100, fmt, arg_ptr);
    va_end(arg_ptr);
}

int main(void)
{
    setlocale(LC_ALL, "");
#ifdef _WIN32
    // Switch stdout to UTF-16 text mode and remember the previous mode.
    int oldmode = _setmode(_fileno(stdout), _O_U16TEXT);
#endif
    wchar_t string[100];
    wchar_t arr[] = { 0x0119, L'\0' }; // the array must be null-terminated
    vout(string, L"%ls", arr);
    wprintf(L"This string should have 'ę' (e with ogonek / tail) after colon: %ls\r\n", string);
#ifdef _WIN32
    // Restore the original translation mode.
    _setmode(_fileno(stdout), oldmode);
#endif
    return 0;
}
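As a side note on the vout helper above: it hard-codes the buffer size and ignores the return value of vswprintf, which is negative when formatting fails or the result does not fit. A minimal sketch of a checked variant follows; the name vout_checked and its cap parameter are hypothetical and not part of the original answer.

```c
#include <stdarg.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not from the original answer).
   Formats into buf, which can hold cap wide characters including the
   terminator. Returns the number of wide characters written, or a
   negative value if formatting failed or the result did not fit. */
int vout_checked(wchar_t *buf, size_t cap, const wchar_t *fmt, ...)
{
    va_list args;
    int written;

    if (buf == NULL || cap == 0 || fmt == NULL)
        return -1;

    va_start(args, fmt);
    written = vswprintf(buf, cap, fmt, args);
    va_end(args);

    if (written < 0)
        buf[0] = L'\0'; /* leave a valid empty string on failure */
    return written;
}
```

A caller would pass the real capacity, for example vout_checked(string, sizeof string / sizeof string[0], L"%ls", arr), and treat a negative result as an error instead of printing the buffer.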
Alternatively, fwprintf can be used with stdout as the first argument instead of wprintf. To write to stderr in the same way, for example with fwprintf(stderr, format, args) or perror, _setmode has to be called on stderr as well.
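The stderr case could be sketched roughly as below. The helper names use_wide_mode and report_error are hypothetical; the sketch assumes the same _setmode / _O_U16TEXT approach as the code above and does nothing on non-Windows platforms.

```c
#include <stdio.h>
#include <wchar.h>
#ifdef _WIN32
#include <fcntl.h> /* _O_U16TEXT */
#include <io.h>    /* _setmode, _fileno */
#endif

/* Hypothetical helper: switches a stream to UTF-16 text mode on Windows.
   Returns the previous mode, or -1 on failure.
   On other platforms it is a no-op that returns 0. */
int use_wide_mode(FILE *stream)
{
#ifdef _WIN32
    fflush(stream); /* flush pending output before changing the mode */
    return _setmode(_fileno(stream), _O_U16TEXT);
#else
    (void)stream;
    return 0;
#endif
}

/* Hypothetical helper: writes a wide-character message to stderr.
   Returns the fwprintf result (negative on error). */
int report_error(const wchar_t *message)
{
    return fwprintf(stderr, L"error: %ls\n", message);
}
```

In main, one would call use_wide_mode(stdout) and use_wide_mode(stderr) once at startup, before any output is written, and then use only the wide functions on those streams. On Unix, setlocale(LC_ALL, "") still has to be called first so that non-ASCII wide characters can be converted for output.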