How to print UTF-8 strings to std::cout on Windows?


Question

I'm writing a cross-platform application in C++. All strings are UTF-8-encoded internally. Consider the following simplified code:

#include <string>
#include <iostream>

int main() {
    std::string test = u8"Greek: αβγδ; German: Übergrößenträger";
    std::cout << test;

    return 0;
}

On Unix systems, std::cout expects 8-bit strings to be UTF-8-encoded, so this code works fine.

On Windows, however, std::cout expects 8-bit strings to be in Latin-1 or a similar non-Unicode format (depending on the code page). This leads to the following output:

Greek: ╬▒╬▓╬│╬┤; German: ├£bergr├Â├ƒentr├ñger

What can I do to make std::cout interpret 8-bit strings as UTF-8 on Windows?

This is what I tried:

#include <string>
#include <iostream>
#include <io.h>
#include <fcntl.h>

int main() {
    _setmode(_fileno(stdout), _O_U8TEXT);
    std::string test = u8"Greek: αβγδ; German: Übergrößenträger";
    std::cout << test;

    return 0;
}

I was hoping that _setmode would do the trick. However, this results in the following assertion failure on the line that calls operator<<:

Microsoft Visual C++ Runtime Library

Debug Assertion Failed!

Program: d:\visual studio 2015\Projects\utf8test\Debug\utf8test.exe
File: minkernel\crts\ucrt\src\appcrt\stdio\fputc.cpp
Line: 47

Expression: ( (_Stream.is_string_backed()) || (fn = _fileno(_Stream.public_stream()), ((_textmode_safe(fn) == __crt_lowio_text_mode::ansi) && !_tm_unicode_safe(fn))))

For information on how your program can cause an assertion failure, see the Visual C++ documentation on asserts.

Recommended answer

The problem is not std::cout but the Windows console. Using C stdio, you will get the ü with fputs( "\xc3\xbc", stdout ); after setting the UTF-8 code page (using either SetConsoleOutputCP or chcp) and selecting a Unicode-capable font in cmd's settings. Consolas should support over 2000 characters, and there are registry hacks to add more capable fonts to cmd.
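As a rough sketch of the code-page step described above, the following wraps SetConsoleOutputCP in a small guard and writes the string with a single fputs call. The names ConsoleUtf8Guard and print_utf8 are hypothetical and not part of the answer. The Windows calls are wrapped in #ifdef _WIN32 so the snippet also compiles elsewhere, where it simply writes the bytes unchanged. Restoring the previous code page is an added assumption about what a caller might want; the answer itself only sets it.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <Windows.h>
#endif

// Hypothetical RAII helper (not from the answer): switches the console output
// code page to UTF-8 and restores the previous one on destruction.
class ConsoleUtf8Guard {
public:
    ConsoleUtf8Guard() {
#ifdef _WIN32
        previous_ = GetConsoleOutputCP();   // returns 0 on failure
        SetConsoleOutputCP(CP_UTF8);
#endif
    }
    ~ConsoleUtf8Guard() {
#ifdef _WIN32
        if (previous_ != 0) SetConsoleOutputCP(previous_);
#endif
    }
    ConsoleUtf8Guard(const ConsoleUtf8Guard&) = delete;
    ConsoleUtf8Guard& operator=(const ConsoleUtf8Guard&) = delete;

private:
#ifdef _WIN32
    UINT previous_ = 0;
#endif
};

// Writes the whole UTF-8 string with one fputs call and flushes stdout
// before the guard restores the old code page.
// Returns true if both fputs and fflush reported success.
inline bool print_utf8(const char* text) {
    ConsoleUtf8Guard guard;
    bool ok = std::fputs(text, stdout) >= 0;
    ok = (std::fflush(stdout) == 0) && ok;
    return ok;
}
```

A call such as print_utf8("\xc3\xbc\n") would then hand both bytes of the ü to the console together. Whether the glyph is actually displayed still depends on the console font, as noted above.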

If you output one byte after the other with putc('\xc3', stdout); putc('\xbc', stdout); you will get two tofu characters (placeholder boxes), because the console interprets each byte separately as an illegal character. This is probably what the C++ streams do.
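To illustrate why splitting a sequence matters, here is a minimal sketch of a helper that reports how many leading bytes of a buffer can be written without cutting a multi-byte UTF-8 sequence in half. The name complete_utf8_prefix is hypothetical. It only inspects lead and continuation bit patterns and does not validate the input.

```cpp
#include <cstddef>
#include <string>

// Returns the length of the longest prefix of `bytes` that does not end in
// the middle of a multi-byte UTF-8 sequence.
// Hypothetical helper: checks only lead/continuation structure, not validity.
inline std::size_t complete_utf8_prefix(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = n;
    std::size_t back = 0;
    // Walk back over at most 3 trailing continuation bytes (10xxxxxx).
    while (i > 0 && back < 3 &&
           (static_cast<unsigned char>(bytes[i - 1]) & 0xC0) == 0x80) {
        --i;
        ++back;
    }
    if (i == 0) return n;  // no lead byte found; nothing to hold back

    // Determine how many bytes the last sequence is supposed to have.
    const unsigned char lead = static_cast<unsigned char>(bytes[i - 1]);
    std::size_t expected = 1;
    if ((lead & 0xE0) == 0xC0) expected = 2;
    else if ((lead & 0xF0) == 0xE0) expected = 3;
    else if ((lead & 0xF8) == 0xF0) expected = 4;

    // The last sequence starts at i - 1 and currently has back + 1 bytes.
    return (back + 1 < expected) ? i - 1 : n;
}
```

A buffer that flushes only the first complete_utf8_prefix(buffer) bytes and keeps the remainder for the next write would never hand the console half of a character.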

See "UTF-8 output on Windows console" for a lengthy discussion.

For my own project, I finally implemented a std::stringbuf that does the conversion to Windows-1252. If you really need full Unicode output, however, this will not help you.
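The answer does not show its conversion code, so the following is only a sketch of what such a conversion might look like, under simplifying assumptions. The function name utf8_to_cp1252_subset is hypothetical. It handles ASCII and the range U+00A0 to U+00FF, where Windows-1252 and Latin-1 agree, and replaces everything else with '?'. This includes the Windows-1252 specials in bytes 0x80 to 0x9F, such as €.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-8 to single bytes for ASCII and for
// U+00A0..U+00FF (identical in Windows-1252 and Latin-1).
// Every other code point, and every malformed byte, becomes '?'.
inline std::string utf8_to_cp1252_subset(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        if (b0 < 0x80) {
            // Plain ASCII is copied unchanged.
            out += static_cast<char>(b0);
            i += 1;
        } else if ((b0 & 0xE0) == 0xC0 && i + 1 < in.size() &&
                   (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80) {
            // Two-byte sequence: 110xxxxx 10xxxxxx.
            const unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            const unsigned cp = ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
            out += (cp >= 0xA0 && cp <= 0xFF) ? static_cast<char>(cp) : '?';
            i += 2;
        } else {
            // Longer sequence or malformed byte: emit one '?' and skip the
            // lead byte together with any continuation bytes that follow.
            out += '?';
            i += 1;
            while (i < in.size() &&
                   (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80) {
                ++i;
            }
        }
    }
    return out;
}
```

With this sketch, the German part of the test string would survive, while each Greek letter would come out as a question mark. This matches the answer's caveat that the approach does not give full Unicode output.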

An alternative approach would be to replace cout's streambuf with one that uses fputs for the actual output:

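#include <cstdio>
#include <iostream>
#include <sstream>

#include <Windows.h>

// Collects output in a std::stringbuf and passes each flushed chunk to
// fputs in a single call.
class MBuf: public std::stringbuf {
public:
    int sync() override {
        fputs( str().c_str(), stdout );
        str( "" );
        return 0;
    }
};

int main() {
    SetConsoleOutputCP( CP_UTF8 );
    setvbuf( stdout, nullptr, _IONBF, 0 );
    MBuf buf;
    // Keep the original buffer so it can be restored before buf is destroyed.
    std::streambuf* old = std::cout.rdbuf( &buf );
    // Note: since C++20, u8 literals have type char8_t and cannot be
    // streamed to std::cout; this line needs C++11 to C++17.
    std::cout << u8"Greek: αβγδ\n" << std::flush;
    std::cout.rdbuf( old );
}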

I turned off output buffering here to prevent it from interfering with unfinished UTF-8 byte sequences.
