c++, cout and UTF-8


Question

Hopefully a simple question: cout seems to die when handling strings that end with a multibyte UTF-8 char. Am I doing something wrong? This is with GCC (MinGW) on Win7 x64.

Edit: Sorry if I wasn't clear enough. I'm not concerned about the missing glyphs or how the bytes are interpreted, merely that nothing at all is shown right after the call to cout << s4 (the trailing BAR is missing). Any further couts after the first display no text whatsoever!

#include <cstdio>
#include <iostream>
#include <string>

int main() {
    std::string s1("abc");
    std::string s2("…");  // … = 0xE2 80 A6
    std::string s3("…abc");
    std::string s4("abc…");

    //In C
    fwrite(s1.c_str(), s1.size(), 1, stdout);
    printf(" FOO ");
    fwrite(s2.c_str(), s2.size(), 1, stdout);
    printf(" BAR ");
    fwrite(s3.c_str(), s3.size(), 1, stdout);
    printf(" FOO ");
    fwrite(s4.c_str(), s4.size(), 1, stdout);
    printf(" BAR\n\n"); 

    //C++
    std::cout << s1 << " FOO " << s2 << " BAR " << s3 << " FOO " << s4 << " BAR ";
}

// results:

// abc FOO ��� BAR ���abc FOO abc… BAR

// abc FOO ��� BAR ���abc FOO abc…

Recommended answer

This is really no surprise. Unless your terminal is set to UTF-8 encoding, how does it know that s2 isn't supposed to be "(Latin small letter a with circumflex)(Euro sign)(broken bar)"? That is how the bytes 0xE2 0x80 0xA6 read in a single-byte code page such as Windows-1252, which is the table shown at http://www.ascii-code.com/ (strict ISO-8859-1 has no Euro sign at 0x80).
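To see what the terminal actually receives, it can help to print the raw bytes of each string instead of the glyphs. The following is a minimal sketch; `to_hex` is a hypothetical helper name introduced here for illustration, not something from the question or answer.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte of `s` as two uppercase hex digits,
// separated by single spaces. std::string stores bytes, not characters, so a
// UTF-8 encoded "…" contributes three entries.
std::string to_hex(const std::string& s) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Go through unsigned char first so bytes >= 0x80 are not sign-extended.
        const unsigned value = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02X", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Assuming the source file is saved as UTF-8, `std::cout << to_hex(s4)` prints `61 62 63 E2 80 A6`: the stream passes six bytes through unchanged, and how they are drawn is decided entirely by the terminal's code page.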

By the way, cout is not "dying" as it clearly continues to produce output after your test string.
