Printing Unicode characters in C++

Problem description

I'm trying to write a simple command line app to teach myself Japanese, but can't seem to get Unicode characters to print. What am I missing?

#include <cstdlib>
#include <iostream>
using namespace std;

int main()
{
        wcout << L"こんにちは世界\n";
        wcout << L"Hello World\n";
        system("pause");
}

In this example only "Press any key to continue" is displayed. Tested on Visual C++ 2013.

Recommended answer

This is not easy on Windows. Even when you manage to get the text to the Windows console you still need to configure cmd.exe to be able to display Japanese characters.

#include <iostream>

int main() {
  std::cout << "こんにちは世界\n";
}

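This works fine on any system where:

  • The compiler's source and execution encodings include the characters.

  • The output device (e.g., the console) expects text in the same encoding as the compiler's execution encoding.

  • A font with the appropriate glyphs is available (usually not a problem).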

Most platforms these days use UTF-8 by default for all these encodings and so can support the entire Unicode range with code similar to the above. Unfortunately Windows is not one of these platforms.
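On such platforms the narrow literal in the program above is just a sequence of UTF-8 bytes that the terminal decodes. A minimal sketch, assuming a compiler whose execution character set is UTF-8 (the default for GCC and Clang): each of the seven characters in こんにちは世界 takes three bytes, so the literal is 21 bytes long while holding only 7 code points. The helper count_code_points is a hypothetical name, not a standard function, and it assumes its input is valid UTF-8.

```cpp
#include <cstddef>
#include <string_view>

// Counts Unicode code points in a UTF-8 byte sequence by counting every
// byte that is NOT a continuation byte (continuation bytes look like 10xxxxxx).
// Assumes valid UTF-8; no validation is performed.
std::size_t count_code_points(std::string_view utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, with a UTF-8 execution encoding std::string_view("こんにちは世界").size() is 21, while count_code_points("こんにちは世界") returns 7.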

wcout << L"こんにちは世界\n";

In this line the string literal data is (at compile time) converted from the source encoding to the execution wide encoding and then (at run time) wcout uses the locale it is imbued with to convert the wchar_t data to char data for output. Where things go wrong is that the default locale is only required to support characters from the basic source character set, which doesn't even include all ASCII characters, let alone non-ASCII characters.

So the conversion results in an error, putting wcout into a bad state. The error has to be cleared before wcout will function again, which is why the second print statement does not output anything.
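The bad state is sticky: once a stream's error flags are set, every later insertion is skipped until clear() is called. The sketch below reproduces that behavior portably on a std::wostringstream, using setstate(std::ios::badbit) as a stand-in for the failed conversion rather than relying on any particular platform's default locale. The function name demo_sticky_error is hypothetical.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Mimics the failure mode described above: once a wide stream is in a bad
// state, later insertions are skipped until the error flags are cleared.
std::wstring demo_sticky_error() {
    std::wostringstream out;
    out << L"before ";
    out.setstate(std::ios::badbit);  // stand-in for the failed locale conversion
    out << L"lost ";                 // skipped: the stream is not good()
    out.clear();                     // reset the state to goodbit
    out << L"after";
    return out.str();
}
```

The function returns L"before after"; the middle insertion is silently lost. In the question's program this is why a std::wcout.clear() call between the two statements would be needed before "Hello World" could appear, and the Japanese text would still be lost.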

You can work around this for a limited range of characters by imbuing wcout with a locale that will successfully convert the characters. Unfortunately, the encoding needed to support the entire Unicode range this way is UTF-8, and although Microsoft's implementation of streams supports other multibyte encodings, it very specifically does not support UTF-8.

For example:

#include <codecvt>
#include <iostream>
#include <locale>
#include <windows.h>

std::wcout.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8_utf16<wchar_t>()));
SetConsoleOutputCP(CP_UTF8);

std::wcout << L"こんにちは世界\n";

Here wcout will correctly convert the string to UTF-8, and if the output were written to a file instead of the console then the file would contain the correct UTF-8 data. However the Windows console, even though configured here to accept UTF-8 data, simply will not accept UTF-8 data written in this way.
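The claim about file output can be checked on any platform. The sketch below imbues a std::wofstream with the same codecvt_utf8_utf16<wchar_t> facet (before opening the file, so the conversion applies from the first character), writes the text, and reads the raw bytes back. Binary mode avoids newline translation on Windows. The function name write_utf8_and_read_back and the file path are hypothetical, and std::codecvt_utf8_utf16 has been deprecated since C++17, so compilers may warn about it.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Writes `text` through a std::wofstream whose locale converts wchar_t to
// UTF-8, then returns the raw bytes that ended up in the file at `path`.
std::string write_utf8_and_read_back(const std::string& path, const std::wstring& text) {
    {
        std::wofstream out;
        // Imbue before opening so every character goes through the UTF-8 facet.
        // The facet treats wchar_t as UTF-16 (as on Windows); with a 32-bit
        // wchar_t it gives the same bytes for characters in the BMP.
        out.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8_utf16<wchar_t>()));
        out.open(path, std::ios::binary);  // binary: no "\n" -> "\r\n" translation
        out << text;
    }  // the destructor flushes and closes the file
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```

Called with L"こんにちは世界\n", the returned string should be the 22 bytes E3 81 93 E3 82 93 E3 81 AB E3 81 A1 E3 81 AF E4 B8 96 E7 95 8C 0A, i.e. correct UTF-8. The console problem is therefore not in the conversion itself.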

There are a few options:

  • Avoid the standard library entirely:

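#include <windows.h>
DWORD n;  // receives the number of characters actually written
// Length 8 = the seven characters of the string plus '\n'.
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), L"こんにちは世界\n", 8, &n, nullptr);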


  • Use a non-standard magical incantation that will break standard code:

    #include <cstdio>
    #include <fcntl.h>
    #include <io.h>
    #include <iostream>
    
    _setmode(_fileno(stdout), _O_U8TEXT);  // stdout now accepts only wide-character output
    std::wcout << L"こんにちは世界\n";
    

    After setting this mode, std::cout << "Hello, World"; will crash.

  • Use a low-level I/O API along with manual conversion:

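    #include <codecvt>
    #include <cstdio>
    #include <locale>
    #include <string>
    #include <windows.h>
    
    SetConsoleOutputCP(CP_UTF8);
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    // to_bytes returns std::string; fputs needs a const char* and adds no extra newline.
    std::fputs(convert.to_bytes(L"こんにちは世界\n").c_str(), stdout);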
    


Using any of these methods, cmd.exe will display the correct text to the best of its ability, by which I mean it will display unreadable boxes: seven little boxes for the given string.

You can copy the text out of cmd.exe and into notepad.exe or whatever to see the correct glyphs.
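The third option relies on std::wstring_convert and the <codecvt> facets, both deprecated since C++17 (on Windows, WideCharToMultiByte with CP_UTF8 is the usual platform alternative). If you want to avoid them, the UTF-16 to UTF-8 conversion is small enough to write by hand. This is a minimal sketch, not a library API: utf16_to_utf8 and append_utf8 are hypothetical names, the input is taken as char16_t code units, surrogate pairs are combined, and unpaired surrogates are replaced with U+FFFD.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Appends the UTF-8 encoding of one code point (assumed to be <= 0x10FFFF
// and not a surrogate) to `out`.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts UTF-16 code units to UTF-8. Surrogate pairs are combined into a
// single code point; unpaired surrogates become U+FFFD.
std::string utf16_to_utf8(std::u16string_view in) {
    std::string out;
    out.reserve(in.size() * 3);  // upper bound: at most 3 bytes per code unit
    for (std::size_t i = 0; i < in.size(); ++i) {
        const std::uint32_t cu = in[i];
        const bool is_high = cu >= 0xD800 && cu <= 0xDBFF;
        if (is_high && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            const std::uint32_t low = in[i + 1];
            append_utf8(out, 0x10000 + ((cu - 0xD800) << 10) + (low - 0xDC00));
            ++i;  // the low surrogate has been consumed as well
        } else if (cu >= 0xD800 && cu <= 0xDFFF) {
            append_utf8(out, 0xFFFD);
        } else {
            append_utf8(out, cu);
        }
    }
    return out;
}
```

On Windows, where wchar_t is a 16-bit UTF-16 code unit, the same loop works if the parameter type is switched to std::wstring_view; the resulting std::string can then be written with std::fputs(result.c_str(), stdout) after SetConsoleOutputCP(CP_UTF8), as in the third option.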
