Properly print UTF-8 characters in the Windows console


Question

This is the way I try to do it:

#include <stdio.h>
#include <windows.h>
using namespace std;

int main() {
  SetConsoleOutputCP(CP_UTF8);
   //german chars won't appear
  char const* text = "aäbcdefghijklmnoöpqrsßtuüvwxyz";
  int len = MultiByteToWideChar(CP_UTF8, 0, text, -1, 0, 0);
  wchar_t *unicode_text = new wchar_t[len];
  MultiByteToWideChar(CP_UTF8, 0, text, -1, unicode_text, len);
  wprintf(L"%s", unicode_text);
}

The effect is that only US-ASCII characters are displayed. No errors are shown. The source file is encoded in UTF-8.

So, what am I doing wrong here?

To WouterH:

int main() {
  SetConsoleOutputCP(CP_UTF8);
  const wchar_t *unicode_text = L"aäbcdefghijklmnoöpqrsßtuüvwxyz";
  wprintf(L"%s", unicode_text);
}




  • This doesn't work either. The effect is the same. My font is, of course, Lucida Console.

  • Third try:

    #include <stdio.h>
    #define _WIN32_WINNT 0x05010300
    #include <windows.h>
    #define _O_U16TEXT  0x20000
    #include <fcntl.h>
    
    using namespace std;
    
    int main() {
        _setmode(_fileno(stdout), _O_U16TEXT);
        const wchar_t *u_text = L"aäbcdefghijklmnoöpqrsßtuüvwxyz";
        wprintf(L"%s", u_text);
    }
    

    OK, something begins to work, but the output is: ańbcdefghijklmno÷pqrs▀tuŘvwxyz.

    Recommended answer

    By default, the wide print functions on Windows do not handle characters outside the ASCII range.

    There are a few ways to get Unicode data to the Windows console.


    • Use the console API directly, WriteConsoleW. You'll have to ensure you're actually writing to a console and use other means when the output goes to something else.

    • Set the mode of the standard output file descriptors to one of the 'Unicode' modes, _O_U16TEXT or _O_U8TEXT. This causes the wide character output functions to correctly output Unicode data to the Windows console. If they're used on file descriptors that don't represent a console, they cause the output stream of bytes to be UTF-16 and UTF-8 respectively. N.B. after setting these modes the non-wide character functions on the corresponding stream are unusable and result in a crash. You must use only the wide character functions.

    • Print UTF-8 text directly to the console by setting the console output codepage to CP_UTF8, provided you use the right functions. Most of the higher level functions such as basic_ostream<char>::operator<<(char*) don't work this way, but you can either use lower level functions or implement your own ostream that works around the problem the standard functions have.
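The first approach in the list above might be sketched as follows, assuming C++11 and the Win32 API. The helper name `write_utf8` is hypothetical and not part of the original answer. It uses GetConsoleMode to check whether stdout is a real console, converts the UTF-8 text to UTF-16 for WriteConsoleW if so, and otherwise writes the raw bytes with WriteFile. The non-Windows branch is an assumption added only so the sketch compiles elsewhere.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <vector>
#include <windows.h>
#endif

// Hypothetical helper (not from the original answer): writes UTF-8 text to
// standard output, using the console API only when stdout is a console.
bool write_utf8(const std::string& utf8) {
    if (utf8.empty()) return true;
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (GetConsoleMode(out, &mode)) {
        // stdout is a console: convert UTF-8 to UTF-16, then write it whole.
        int len = static_cast<int>(utf8.size());
        int wlen = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, nullptr, 0);
        if (wlen <= 0) return false;
        std::vector<wchar_t> wide(static_cast<std::size_t>(wlen));
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, wide.data(), wlen);
        DWORD written = 0;
        return WriteConsoleW(out, wide.data(), static_cast<DWORD>(wlen),
                             &written, nullptr) != 0;
    }
    // stdout is redirected (file or pipe): pass the UTF-8 bytes through as-is.
    DWORD written = 0;
    return WriteFile(out, utf8.data(), static_cast<DWORD>(utf8.size()),
                     &written, nullptr) != 0;
#else
    // Assumption: elsewhere, stdout accepts UTF-8 bytes directly.
    return std::fwrite(utf8.data(), 1, utf8.size(), stdout) == utf8.size();
#endif
}
```

A call such as `write_utf8("a\303\244b\n")` passes the whole string in one piece, so no multibyte character is split. Because this bypasses the C runtime's buffering, output written through printf or std::cout on the same stream may need to be flushed first to keep the ordering.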

    The problem with the third method is this:

    putc('\302', stdout); putc('\260', stdout); // doesn't work with CP_UTF8
    
    puts("\302\260"); // correctly writes UTF-8 data to Windows console with CP_UTF8 
    

    Unlike most operating systems, the console on Windows is not simply another file that accepts a stream of bytes. It's a special device created and owned by the program and accessed via its own unique Win32 API. The issue is that when the console is written to, the API sees only the data passed in that one call, and the conversion from narrow characters to wide characters occurs without considering that the data may be incomplete. When a multibyte character is passed using more than one call to the console API, each separately passed piece is seen as an illegal encoding, and is treated as such.
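To make the "incomplete data" case concrete, here is a small sketch that reports how much of a byte buffer consists of complete UTF-8 sequences. The function name `complete_utf8_prefix` is hypothetical, and the code assumes the input is otherwise well-formed UTF-8; it does not validate anything.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the number of leading bytes of `s` that form
// only complete UTF-8 sequences. A sequence cut off at the end is excluded.
// Assumes otherwise well-formed UTF-8; performs no validation.
std::size_t complete_utf8_prefix(const std::string& s) {
    const std::size_t n = s.size();
    // The lead byte of the last sequence is at most 3 bytes from the end
    // whenever that sequence is incomplete.
    for (std::size_t back = 1; back <= 3 && back <= n; ++back) {
        const unsigned char c = static_cast<unsigned char>(s[n - back]);
        if ((c & 0xC0) == 0x80) continue;  // continuation byte, keep looking
        // Lead byte found: how many bytes does its sequence need in total?
        const std::size_t need = (c >= 0xF0) ? 4 : (c >= 0xE0) ? 3
                               : (c >= 0xC0) ? 2 : 1;
        return (need > back) ? n - back : n;
    }
    return n;  // empty input, or the last 3 bytes end a complete 4-byte sequence
}
```

For the degree sign from the example above, the lone byte `"\302"` yields 0 because the character is incomplete, while `"\302\260"` yields 2. Writing each byte separately therefore hands the console two pieces that are each invalid on their own.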

    It ought to be easy enough to work around this, but the CRT team at Microsoft views it as not their problem whereas whatever team works on the console probably doesn't care.

    You might solve it by implementing your own streambuf subclass which handles doing the conversion to wchar_t correctly. I.e. accounting for the fact that bytes of multibyte characters may come separately, maintaining conversion state between writes (e.g., std::mbstate_t).
