How to print UTF-8 strings to std::cout on Windows?

Question

I'm writing a cross-platform application in C++. All strings are UTF-8-encoded internally. Consider the following simplified code:

#include <string>
#include <iostream>

int main() {
    std::string test = u8"Greek: αβγδ; German: Übergrößenträger";
    std::cout << test;

    return 0;
}

On Unix systems, std::cout expects 8-bit strings to be UTF-8-encoded, so this code works fine.

On Windows, however, std::cout expects 8-bit strings to be in Latin-1 or a similar non-Unicode format (depending on the codepage). This leads to the following output:

Greek: ╬▒╬▓╬│╬┤; German: ├£bergr├Â├ƒentr├ñger
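
The garbled output is easier to read once the raw bytes are visible. The sketch below is an illustration only; `hex_dump` is a hypothetical helper name, not something from the original code. It returns the bytes of a string as hex pairs. In UTF-8, α is the two bytes CE B1, and a console that assumes an OEM code page such as 850 (which this output suggests, though that is an assumption) draws them as the two separate glyphs ╬ and ▒.

```cpp
#include <string>

// Hypothetical helper: returns the bytes of s as space-separated hex pairs.
std::string hex_dump(const std::string& s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char byte : s) {
        if (!out.empty()) out += ' ';   // separator between bytes
        out += digits[byte >> 4];       // high nibble
        out += digits[byte & 0x0F];     // low nibble
    }
    return out;
}
```

For example, `hex_dump("\xCE\xB1")` yields `ce b1`, the two bytes the console displayed as ╬▒.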

What can I do to make std::cout interpret 8-bit strings as UTF-8 on Windows?

Here's what I tried:

#include <string>
#include <iostream>
#include <io.h>
#include <fcntl.h>

int main() {
    _setmode(_fileno(stdout), _O_U8TEXT);
    std::string test = u8"Greek: αβγδ; German: Übergrößenträger";
    std::cout << test;

    return 0;
}

I was hoping that _setmode would do the trick. However, this results in the following assertion error in the line that calls operator<<:

Microsoft Visual C++ Runtime Library

Debug Assertion Failed!

Program: d:\visual studio 2015\Projects\utf8test\Debug\utf8test.exe
File: minkernel\crts\ucrt\src\appcrt\stdio\fputc.cpp
Line: 47

Expression: ( (_Stream.is_string_backed()) || (fn = _fileno(_Stream.public_stream()), ((_textmode_safe(fn) == __crt_lowio_text_mode::ansi) && !_tm_unicode_safe(fn))))

For information on how your program can cause an assertion failure, see the Visual C++ documentation on asserts.

Recommended answer

The problem is not std::cout but the Windows console. Using C stdio you will get the ü with fputs( "\xc3\xbc", stdout ); after setting the UTF-8 code page (either with SetConsoleOutputCP or chcp) and selecting a Unicode-capable font in cmd's settings (Consolas should support over 2000 characters, and there are registry hacks to add more capable fonts to cmd).
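
A minimal sketch of that setup, written so that it also compiles on non-Windows platforms. The helper names `use_utf8_console` and `put_utf8` are hypothetical; only `SetConsoleOutputCP`, `CP_UTF8` and `fputs` come from the text above. The font still has to be chosen in the console settings.

```cpp
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switch the console output code page to UTF-8.
// On non-Windows platforms there is nothing to do, so it reports success.
bool use_utf8_console() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}

// Hypothetical helper: pass the whole string to the C runtime in one fputs
// call instead of writing it byte by byte.
bool put_utf8(const std::string& text) {
    return std::fputs(text.c_str(), stdout) >= 0;
}
```

A caller would run `use_utf8_console()` once at startup and then print with `put_utf8("\xc3\xbc\n")`.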

If you output one byte after the other with putc('\xc3', stdout); putc('\xbc', stdout); you will get the double tofu, because the console interprets each byte separately as an illegal character. This is probably what the C++ streams do.

See UTF-8 output on Windows console for a lengthy discussion.

For my own project, I finally implemented a std::stringbuf that does the conversion to Windows-1252. If you really need full Unicode output, however, this will not help you.
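
The conversion step of such a buffer could look roughly like the sketch below. It is not the answerer's actual code: `utf8_to_cp1252` is a hypothetical name, and the mapping is deliberately incomplete. It covers ASCII, the range U+00A0 to U+00FF (where Windows-1252 matches Unicode) and the euro sign, and replaces everything else with '?'.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-8 to Windows-1252 bytes.
// Assumptions of this sketch: validation is minimal (overlong forms are not
// rejected) and unmapped or malformed input becomes '?'.
std::string utf8_to_cp1252(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 1;
        unsigned long cp = lead;
        if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else if (lead >= 0x80)          { cp = 0xFFFD; }  // stray or invalid byte

        if (i + len > in.size()) {  // sequence truncated at end of input
            out += '?';
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);  // append six payload bits
        }
        if (!ok) { out += '?'; ++i; continue; }
        i += len;

        if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) {
            out += static_cast<char>(cp);    // same value in Windows-1252
        } else if (cp == 0x20AC) {
            out += static_cast<char>(0x80);  // euro sign
        } else {
            out += '?';                      // not representable in this sketch
        }
    }
    return out;
}
```

With this mapping the German part of the test string survives, while the Greek letters come out as question marks, which is why the approach does not help when full Unicode output is required.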

An alternative approach would be overwriting cout's streambuf, using fputs for the actual output:

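#include <cstdio>
#include <iostream>
#include <sstream>

#include <Windows.h>

// Collects output in a string and hands it to fputs in one piece on flush.
class MBuf: public std::stringbuf {
public:
    int sync() override {
        fputs( str().c_str(), stdout );
        str( "" );
        return 0;
    }
};

int main() {
    SetConsoleOutputCP( CP_UTF8 );
    setvbuf( stdout, nullptr, _IONBF, 0 );
    MBuf buf;
    std::streambuf* old = std::cout.rdbuf( &buf );
    // Note: from C++20 on, u8 literals are char8_t and cannot be streamed to std::cout.
    std::cout << u8"Greek: αβγδ\n" << std::flush;
    std::cout.rdbuf( old );  // restore the original buffer before buf is destroyed
}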

I turned off output buffering here to prevent it from interfering with unfinished UTF-8 byte sequences.
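
If buffering has to stay on, one possible alternative is to flush only up to the last complete character and hold back the rest. The sketch below shows that boundary check; `complete_utf8_prefix` is a hypothetical helper, not part of the answer's code, and it assumes the buffer is otherwise well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns how many leading bytes of buf can be written
// without cutting the final UTF-8 sequence in half. Earlier bytes are not
// validated.
std::size_t complete_utf8_prefix(const std::string& buf) {
    const std::size_t n = buf.size();
    // Walk back over at most three trailing continuation bytes (10xxxxxx).
    std::size_t back = 0;
    while (back < n && back < 3 &&
           (static_cast<unsigned char>(buf[n - 1 - back]) & 0xC0) == 0x80) {
        ++back;
    }
    if (back == n) return n;  // no lead byte found; nothing sensible to hold back

    const unsigned char lead = static_cast<unsigned char>(buf[n - 1 - back]);
    std::size_t need = 1;  // total bytes the last sequence requires
    if ((lead & 0xE0) == 0xC0)      need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;

    // The last sequence starts at n - 1 - back and has back + 1 bytes so far.
    return (back + 1 < need) ? n - 1 - back : n;
}
```

A custom stream buffer could write `buf.substr(0, complete_utf8_prefix(buf))` and keep the remaining bytes until the rest of the character arrives.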
