How to print UTF-8 strings to std::cout on Windows?
Question
I'm writing a cross-platform application in C++. All strings are UTF-8-encoded internally. Consider the following simplified code:
#include <string>
#include <iostream>

int main() {
    std::string test = u8"Greek: αβγδ; German: Übergrößenträger";
    std::cout << test;
    return 0;
}
On Unix systems, std::cout expects 8-bit strings to be UTF-8-encoded, so this code works fine.
On Windows, however, std::cout expects 8-bit strings to be in Latin-1 or a similar non-Unicode format (depending on the codepage). This leads to the following output:
Greek: ╬▒╬▓╬│╬┤; German: ├£bergr├Â├ƒentr├ñger
What can I do to make std::cout interpret 8-bit strings as UTF-8 on Windows?
Here is what I tried:
#include <string>
#include <iostream>
#include <io.h>
#include <fcntl.h>

int main() {
    _setmode(_fileno(stdout), _O_U8TEXT);
    std::string test = u8"Greek: αβγδ; German: Übergrößenträger";
    std::cout << test;
    return 0;
}
I was hoping that _setmode would do the trick. However, this results in the following assertion error in the line that calls operator<<:
Microsoft Visual C++ Runtime Library
Debug Assertion Failed!
Program: d:\visual studio 2015\Projects\utf8test\Debug\utf8test.exe
File: minkernel\crts\ucrt\src\appcrt\stdio\fputc.cpp
Line: 47
Expression: ( (_Stream.is_string_backed()) || (fn = _fileno(_Stream.public_stream()), ((_textmode_safe(fn) == __crt_lowio_text_mode::ansi) && !_tm_unicode_safe(fn))))
For information on how your program can cause an assertion failure, see the Visual C++ documentation on asserts.
Recommended answer
The problem is not std::cout but the Windows console. Using C stdio, fputs( "\xc3\xbc", stdout ); will print the ü once you have set the UTF-8 code page (using either SetConsoleOutputCP or chcp) and selected a Unicode-capable font in cmd's settings. (Consolas should support over 2000 characters, and there are registry hacks to add more capable fonts to cmd.)
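As a minimal sketch of the C-stdio route described above, the helpers below switch the console to the UTF-8 code page and write a string with a single fputs call. The names enable_utf8_console and print_utf8 are made up for this illustration. The font still has to be chosen in the console's settings, and the non-Windows branch simply assumes that the terminal already expects UTF-8.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switch the console's output code page to UTF-8
// (CP_UTF8, i.e. 65001) on Windows. Elsewhere this is a no-op, on the
// assumption that the terminal already expects UTF-8.
bool enable_utf8_console() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}

// Hypothetical helper: write a whole UTF-8 string with one fputs call,
// so the bytes of a multi-byte character are passed on together.
// fputs returns a non-negative value on success and EOF on failure.
bool print_utf8(const char* text) {
    return std::fputs(text, stdout) >= 0;
}
```

Calling enable_utf8_console() once at startup and then print_utf8("\xc3\xbc\n") should show ü on a console that uses a suitable font.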
If you output one byte after the other with putc('\xc3', stdout); putc('\xbc', stdout); you will get double tofu, because the console interprets each byte separately as an illegal character. This is probably what the C++ streams do.
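One possible way to avoid handing the console half a character is to hold back an incomplete trailing sequence until the rest of its bytes arrive. The function below is only a sketch of that idea, not something taken from the answer: complete_utf8_prefix is a hypothetical name, and it looks only at lead bytes without validating the encoding as a whole.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns how many leading bytes of `s` can be
// written without cutting the last UTF-8 sequence in half.
// Assumption: `s` is otherwise well-formed UTF-8; no full validation.
std::size_t complete_utf8_prefix(const std::string& s) {
    const std::size_t n = s.size();
    // The start of the last sequence is at most 4 bytes from the end.
    for (std::size_t back = 1; back <= 4 && back <= n; ++back) {
        const unsigned char c = static_cast<unsigned char>(s[n - back]);
        if ((c & 0xC0) == 0x80) continue;       // continuation byte: keep looking
        std::size_t need = 1;                   // ASCII (or an invalid lead byte)
        if ((c & 0xE0) == 0xC0) need = 2;       // 110xxxxx
        else if ((c & 0xF0) == 0xE0) need = 3;  // 1110xxxx
        else if ((c & 0xF8) == 0xF0) need = 4;  // 11110xxx
        // Hold the sequence back if fewer bytes are present than it needs.
        return need > back ? n - back : n;
    }
    return n;  // empty input, or no lead byte found: nothing to hold back
}
```

A buffer that flushes only the first complete_utf8_prefix(data) bytes and keeps the remainder for the next write would never split ü between two writes.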
See "UTF-8 output on Windows console" for a lengthy discussion.
For my own project, I finally implemented a std::stringbuf that does the conversion to Windows-1252. If you really need full Unicode output, however, this will not really help you.
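The answer does not show that conversion code, so the following is only a rough sketch of what a lossy UTF-8 to Windows-1252 conversion could look like. The name utf8_to_cp1252_lossy is hypothetical, and the sketch covers only ASCII and U+00A0 to U+00FF, where Windows-1252 and Latin-1 use the same byte values. Everything else, including characters such as € that Windows-1252 stores in the 0x80 to 0x9F range, becomes '?'.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-8 to Windows-1252, replacing every
// character outside ASCII and U+00A0..U+00FF with '?'.
std::string utf8_to_cp1252_lossy(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {  // ASCII is identical in both encodings
            out += static_cast<char>(c);
            ++i;
            continue;
        }
        // Two-byte sequence: 110xxxxx 10xxxxxx
        if ((c & 0xE0) == 0xC0 && i + 1 < in.size()) {
            const unsigned char d = static_cast<unsigned char>(in[i + 1]);
            if ((d & 0xC0) == 0x80) {
                const unsigned cp = ((c & 0x1Fu) << 6) | (d & 0x3Fu);
                out += (cp >= 0xA0 && cp <= 0xFF) ? static_cast<char>(cp) : '?';
                i += 2;
                continue;
            }
        }
        // Longer or malformed sequence: emit one '?' and skip the lead
        // byte together with any continuation bytes that follow it.
        out += '?';
        ++i;
        while (i < in.size() &&
               (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80) {
            ++i;
        }
    }
    return out;
}
```

With this sketch, the German sample text would survive intact, while each Greek letter would be replaced by a question mark.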
An alternative approach would be to replace cout's streambuf with one that uses fputs for the actual output:
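#include <cstdio>
#include <iostream>
#include <sstream>
#include <Windows.h>

// Collects output in a string and hands it to fputs in one piece on flush.
class MBuf: public std::stringbuf {
public:
    int sync() override {
        fputs( str().c_str(), stdout );
        str( "" );
        return 0;
    }
};

int main() {
    SetConsoleOutputCP( CP_UTF8 );
    setvbuf( stdout, nullptr, _IONBF, 0 );
    MBuf buf;
    std::streambuf* old = std::cout.rdbuf( &buf );
    // Note: since C++20, u8 literals are char8_t and cannot be streamed to cout.
    std::cout << u8"Greek: αβγδ\n" << std::flush;
    std::cout.rdbuf( old ); // restore the original buffer before buf is destroyed
}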
I turned off output buffering here to prevent it from interfering with unfinished UTF-8 byte sequences.