Can std::cout work with UTF-8 on Windows?


Question

I want to make std::cout print a UTF-8 literal. This seems to be an easy task with gcc, but an extremely difficult one on Windows.

The code that I'm trying to get to work is:

std::cout << "Ελληνικά Русский 你好";

Environment:

  • Windows 10, Visual Studio 2015
  • Default encoding: 1251
  • Console encoding: 866
  • Source encoding: UTF-8 with BOM

Requirements:

  • No changes may be made to the line of code itself
  • Full Unicode range support
  • Some setup code may be added at the beginning of main()

Things I have tried:

  • #pragma execution_character_set("utf-8")
  • SetConsoleCP(CP_UTF8); SetConsoleOutputCP(CP_UTF8);
  • Setting the console font to Lucida Console system-wide
  • Use Unicode character set in project properties
  • Setup code from this blog

Nothing helped, and no StackOverflow answer solved the problem.

Edit

To get Unicode partially working, do the following:

  • Call initStreams() from the listing below at the start
  • Turn on Use Unicode Character Set in Project Settings
  • Add the /utf-8 option

What still doesn't work:

  • wprintf
  • cin/wcin
  • Chinese characters

initStreams() implementation:

#include <cassert>          // assert
#include <codecvt>          // std::codecvt_utf8_utf16 (C++11)
#include <stdexcept>        // std::exception
#include <streambuf>        // std::basic_streambuf
#include <iostream>         // std::cout, std::endl
#include <locale>           // std::locale
#include <memory>           // std::unique_ptr (C++11)
#include <string>           // std::wstring

#undef  UNICODE
#define UNICODE
#undef  STRICT
#define STRICT
#include <windows.h>    // MultiByteToWideChar

class OutputForwarderBuffer : public std::basic_streambuf<char>
{
public:
    using Base = std::basic_streambuf<char>;
    using Traits = Base::traits_type;
    using StreamBuffer = std::basic_streambuf<char>;
    using WideStreamBuffer = std::basic_streambuf<wchar_t>;
    using Base::int_type;
    using Base::char_type;

    OutputForwarderBuffer(
        StreamBuffer& existingBuffer,
        WideStreamBuffer* pWideStreamBuffer
    )
        : Base(existingBuffer)
        , pWideStreamBuffer_(pWideStreamBuffer)
    {
    }

    OutputForwarderBuffer(OutputForwarderBuffer const&) = delete;
    void operator=(OutputForwarderBuffer const&) = delete;

protected:
    std::streamsize xsputn(char const* s, std::streamsize n) override
    {
        if (n == 0) { return 0; }

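        int const sourceSize = static_cast<int>(n);
        // The first call only measures how many UTF-16 code units the conversion needs.
        int const destinationSize = MultiByteToWideChar(CP_UTF8, 0, s, sourceSize, nullptr, 0);
        wideCharBuffer_.resize(static_cast<size_t>(destinationSize));

        int const nWideCharacters = MultiByteToWideChar(CP_UTF8, 0, s, sourceSize, &wideCharBuffer_[0], destinationSize);
        assert(nWideCharacters > 0 && nWideCharacters == destinationSize);

        // xsputn must report how many input bytes were consumed, not how many wide characters were written.
        std::streamsize const nWritten = pWideStreamBuffer_->sputn(&wideCharBuffer_[0], destinationSize);
        return (nWritten == destinationSize ? n : 0);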
    }

    int_type overflow(int_type c) override
    {
        bool const cIsEOF = Traits::eq_int_type(c, Traits::eof());
        int_type const failureValue = Traits::eof();
        int_type const successValue = (cIsEOF ? Traits::not_eof(c) : c);

        if (!cIsEOF) {
            char_type const ch = Traits::to_char_type(c);
            std::streamsize const nCharactersWritten = xsputn(&ch, 1);

            return (nCharactersWritten == 1 ? successValue : failureValue);
        }
        return successValue;
    }

private:
    WideStreamBuffer* pWideStreamBuffer_;
    std::wstring wideCharBuffer_;
};

void setUtf8Conversion(std::basic_ios<wchar_t>& stream)
{
    stream.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8_utf16<wchar_t>()));
}

bool isConsole(HANDLE streamHandle)
{
    DWORD consoleMode;
    return !!GetConsoleMode(streamHandle, &consoleMode);
}

bool isConsole(DWORD stdStreamId)
{
    return isConsole(GetStdHandle(stdStreamId));
}

void initStreams()
{
    SetConsoleCP(CP_UTF8);
    SetConsoleOutputCP(CP_UTF8);

    setUtf8Conversion(std::wcout);
    setUtf8Conversion(std::wcerr);
    setUtf8Conversion(std::wclog);

    static OutputForwarderBuffer coutBuffer(*std::cout.rdbuf(), std::wcout.rdbuf());
    static OutputForwarderBuffer cerrBuffer(*std::cerr.rdbuf(), std::wcerr.rdbuf());
    static OutputForwarderBuffer clogBuffer(*std::clog.rdbuf(), std::wclog.rdbuf());

    std::cout.rdbuf(&coutBuffer);
    std::cerr.rdbuf(&cerrBuffer);
    std::clog.rdbuf(&clogBuffer);
}

Recommended answer

Here is what I would do:

  1. Make sure your source files are UTF-8 encoded and have the correct content (open them in another editor and check the glyphs and the file encoding).
  2. Remove the console from the equation: redirect output to a file and check its contents with a UTF-8-aware editor (just as with the source code).
  3. Use the /utf-8 command-line option with MSVC 2015+. This forces the compiler to treat all source files as UTF-8 encoded, and the string literals stored in the resulting binary will be UTF-8 encoded too.
  4. Remove iostreams from the equation (can't wait for this library to die, to be honest) and use cstdio.
  5. At this point output should work (it does for me).
  6. To get console output to work, call SetConsoleOutputCP(CP_UTF8) and make the console use a TrueType font that supports your Unicode plane. (I suspect that for Chinese characters to work in the console, you need a font installed on your system that supports the relevant Unicode plane, and the console must be configured to use it.)
  7. I'm not sure about console input (I've never had to deal with it), but I suspect that SetConsoleCP(CP_UTF8) should make it work with non-wide I/O.
  8. Discard the idea of using wide I/O (wcout etc.). Why would you do it anyway? Unicode works just fine with UTF-8 encoded char const*.
  9. Once you have reached this stage, it is time to deal with iostreams (if you insist on using it). I'd disregard wcin/wcout for now. If cin/cout don't already work, try imbuing them with a UTF-8 locale.
  10. The idea promoted by http://utf8everywhere.org/ is to convert to UCS-2 only when you make a Windows API call. This makes your OutputForwarderBuffer unnecessary.
  11. I guess (if you REALLY insist) you can now try getting wide iostreams to work. Good luck: I guess you'll have to reconfigure the console (which will break non-wide I/O) or somehow get your wcout/wcin to perform UCS2-to-UTF8 conversion on the fly (and only when it is connected to the console).
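The "convert only at the API boundary" idea from the list above could look roughly like the sketch below. It is not code from the answer: widenForApi is a hypothetical helper name, the output is UTF-16 (which is what the wide Windows APIs take), and the non-Windows branch is only a stand-in that assumes well-formed UTF-8 so that the sketch builds anywhere.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>   // MultiByteToWideChar, CP_UTF8
#endif

// Hypothetical helper: keep text as UTF-8 everywhere and convert it to UTF-16
// only right before handing it to a wide Windows API.
std::u16string widenForApi(std::string const& utf8)
{
    if (utf8.empty()) { return std::u16string(); }
#ifdef _WIN32
    int const sourceSize = static_cast<int>(utf8.size());
    // The first call only measures the required number of UTF-16 code units.
    int const wideSize = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), sourceSize, nullptr, 0);
    std::wstring wide(static_cast<std::size_t>(wideSize), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), sourceSize, &wide[0], wideSize);
    return std::u16string(wide.begin(), wide.end());
#else
    // Stand-in decoder for other platforms; assumes the input is well-formed UTF-8.
    std::u16string result;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char const lead = static_cast<unsigned char>(utf8[i]);
        std::size_t const length = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        // Keep only the payload bits of the lead byte.
        char32_t codePoint = (length == 1) ? lead : (lead & (0xFF >> (length + 1)));
        for (std::size_t k = 1; k < length && i + k < utf8.size(); ++k) {
            codePoint = (codePoint << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        if (codePoint >= 0x10000) {
            // Code points outside the BMP become a surrogate pair.
            codePoint -= 0x10000;
            result.push_back(static_cast<char16_t>(0xD800 + (codePoint >> 10)));
            result.push_back(static_cast<char16_t>(0xDC00 + (codePoint & 0x3FF)));
        } else {
            result.push_back(static_cast<char16_t>(codePoint));
        }
        i += length;
    }
    return result;
#endif
}
```

With a helper like this, the rest of the program can keep passing char const* around, and only the few call sites that talk to wide Windows functions need to know about UTF-16.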

Edit: Starting with Windows 10, you also need this:

setvbuf(stderr, NULL, _IOFBF, 1024);    // on Windows 10+ we need buffering or console will get 1 byte at a time (screwing up utf-8 encoding)
setvbuf(stdout, NULL, _IOFBF, 1024);

Unfortunately, this also means that your output can still get garbled if the buffer fills up completely before the next flush. The proper solution is to flush manually (endl or fflush()) after every string sent to the output (assuming each string is shorter than 1024 bytes). If only MS supported line buffering...
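Putting the cstdio route together, a minimal sketch might look like this. The helper names setUpUtf8Output and printUtf8 are made up for illustration, the 1024-byte buffer size is simply the value used above, and the Windows call is guarded so the sketch also builds on other platforms.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>   // SetConsoleOutputCP, CP_UTF8
#endif

// Hypothetical setup helper: call it at the very start of main(), before any output.
// Returns false if either setvbuf call fails.
bool setUpUtf8Output()
{
#ifdef _WIN32
    // Ask the console to interpret the bytes we write as UTF-8.
    SetConsoleOutputCP(CP_UTF8);
#endif
    // Full buffering, so multi-byte sequences are not handed over one byte at a time.
    bool const outOk = std::setvbuf(stdout, nullptr, _IOFBF, 1024) == 0;
    bool const errOk = std::setvbuf(stderr, nullptr, _IOFBF, 1024) == 0;
    return outOk && errOk;
}

// Hypothetical helper: write one UTF-8 string (assumed to be shorter than the
// buffer) and flush immediately, so the whole string goes out in one piece.
bool printUtf8(char const* text)
{
    bool const writeOk = std::fputs(text, stdout) >= 0;
    bool const flushOk = std::fflush(stdout) == 0;
    return writeOk && flushOk;
}
```

In main() this would be used as setUpUtf8Output(); followed by printUtf8("Ελληνικά Русский 你好\n");, assuming the file is compiled with /utf-8 so that the literal is stored as UTF-8.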
