C++ Visual Studio character encoding issues

Question

Not being able to wrap my head around this one is a real source of shame...

I'm working with a French version of Visual Studio (2008), in a French Windows (XP). French accents put in strings sent to the output window get corrupted. Ditto input from the output window. Typical character encoding issue, I enter ANSI, get UTF-8 in return, or something to that effect. What setting can ensure that the characters remain in ANSI when showing a "hardcoded" string to the output window?

Example:

#include <iostream>

int main()
{
    std::cout << "àéêù" << std::endl;

    return 0;
}

will display in the output:

óúÛ¨

(here encoded as HTML for your viewing pleasure)

I would really like it to show:

àéêù

Recommended answer

Before I go any further, I should mention that what you are doing is not C/C++ compliant. Section 2.2 of the specification states which character sets are valid in source code. There isn't much in there, and all the characters used are in ASCII. So everything below is about a specific implementation (as it happens, VC2008 on a US-locale machine).

To start with, you have 4 chars on your cout line and 4 glyphs in the output. So the issue is not one of UTF-8 encoding, as that would combine multiple source chars into fewer glyphs.
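The byte-count argument can be checked directly. What follows is a minimal sketch, assuming the four accented letters are written as explicit byte escapes so that the result does not depend on how the compiler reads the source file. The helper `count_utf8_code_points` and the two constants are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points by skipping
// continuation bytes (those of the form 10xxxxxx).
std::size_t count_utf8_code_points(const std::string& bytes)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// "àéêù" written out byte by byte in each encoding.
const std::string kUtf8   = "\xC3\xA0\xC3\xA9\xC3\xAA\xC3\xB9"; // 8 bytes
const std::string kCp1252 = "\xE0\xE9\xEA\xF9";                 // 4 bytes
```

In UTF-8 the string is 8 bytes for 4 characters, so a console that decoded UTF-8 would merge bytes, and a single-byte console fed UTF-8 would show 8 glyphs. Four bytes in and four glyphs out points to a single-byte code page on both ends.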

From your source string to the display on the console, all of these things play a part:

  1. What encoding your source file is in (i.e. how your C++ file will be seen by the compiler)
  2. What your compiler does with a string literal, and what source encoding it understands
  3. How your << interprets the encoded string you're passing in
  4. What encoding the console expects
  5. How the console translates that output to a font glyph

Now...

1 and 2 are fairly easy ones. It looks like the compiler guesses what format the source file is in and decodes it to its internal representation. It then emits the data for the string literal in the current codepage, no matter what the source encoding was. I have failed to find explicit details on, or control over, this.

3 is even easier. Except for control codes, << just passes the bytes of a char * straight through.
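A short sketch of that pass-through behaviour, assuming a `std::ostringstream` is an acceptable stand-in for `std::cout` (it uses the same `operator<<` for `const char *`, but skips the newline translation a text-mode console stream may apply). `through_stream` is a hypothetical helper name.

```cpp
#include <sstream>
#include <string>

// Sends a narrow string through operator<< and returns what the stream
// received, so the bytes can be compared with the input.
std::string through_stream(const char* text)
{
    std::ostringstream out;
    out << text;
    return out.str();
}
```

Passing the CP1252 bytes for "àéêù" (`"\xE0\xE9\xEA\xF9"`) returns the same four bytes: the stream does no re-encoding of its own, so any corruption happens before or after it.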

4 is controlled by SetConsoleOutputCP. It should default to your default system codepage. You can also find out which one you have with GetConsoleOutputCP (input is controlled separately, through SetConsoleCP).
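As a rough sketch of how those two calls might be wrapped: `SetConsoleOutputCP` and `GetConsoleOutputCP` are the real Win32 functions from `<windows.h>`, while the wrapper names below are hypothetical. The non-Windows branches are an assumption added only so the snippet compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper: asks the console to decode output bytes with the
// given code page (1252 = Windows Western European). Returns false if the
// call fails or if this is not a Windows build.
bool set_console_output_code_page(unsigned int code_page)
{
#ifdef _WIN32
    return SetConsoleOutputCP(code_page) != 0;
#else
    (void)code_page;
    return false;
#endif
}

// Returns the code page currently used for console output, or 0 when it
// cannot be determined (including non-Windows builds).
unsigned int console_output_code_page()
{
#ifdef _WIN32
    return GetConsoleOutputCP();
#else
    return 0;
#endif
}
```

Calling `set_console_output_code_page(1252)` before the first write to `std::cout` is the intended use. Whether the glyphs then render correctly still depends on the console font, as described in point 5.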

5 is a funny one. I banged my head trying to figure out why I could not get the é to show up properly using CP1252 (Western European, Windows). It turns out that my system font does not have the glyph for that character, and helpfully uses the glyph from my standard codepage instead (capital Theta, the same one I would get if I did not call SetConsoleOutputCP). To fix it, I had to change the font I use on consoles to Lucida Console (a TrueType font).

Some interesting things I learned looking at this:

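  • The encoding of the source does not matter, as long as the compiler can figure it out (notably, changing it to UTF-8 did not change the generated code; my "é" string was still encoded with CP1252 as 233 0)
  • VC is picking a codepage for the string literals that I do not seem to control.
  • Controlling what the console shows is more painful than I expected.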

So... what does this mean for you? Here are a few bits of advice:

  • Don't use non-ASCII characters in string literals. Use resources, where you control the encoding.
  • Make sure you know what encoding your console expects, and that your font has the glyphs to represent the chars you send.
  • If you want to figure out what encoding is being used in your case, I'd advise printing the actual value of the character as an integer. const char * a = "é"; std::cout << (unsigned int) (unsigned char) a[0] does show 233 for me, which happens to be the encoding in CP1252.
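The integer-printing trick from the last point extends naturally to whole strings. A minimal sketch, with `byte_values` as a hypothetical helper name:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: renders each byte of a narrow string as an
// unsigned decimal value, separated by single spaces.
std::string byte_values(const char* text)
{
    std::ostringstream out;
    for (const char* p = text; *p != '\0'; ++p) {
        if (p != text) {
            out << ' ';
        }
        // Go through unsigned char first so bytes >= 0x80 do not
        // sign-extend into large values where char is signed.
        out << static_cast<unsigned int>(static_cast<unsigned char>(*p));
    }
    return out.str();
}
```

Printing `byte_values("é")` then tells you how the compiler stored the literal: 233 suggests CP1252 (or Latin-1), 195 169 suggests UTF-8, and 130 suggests an OEM code page such as CP850.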

BTW, if what you got was "ÓÚÛ¨" rather than what you pasted, then it looks like your 4 bytes are interpreted somewhere as CP850.
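To make the CP850 reading concrete, here is a deliberately partial lookup covering only the four bytes in the question. It is a sketch for illustration, not a full code page table, and `cp850_glyph_for` is a hypothetical name.

```cpp
// Partial CP850 lookup for the four bytes that CP1252 uses for "àéêù".
// Returns the Unicode code point CP850 assigns to the byte, or 0 for
// any byte outside this four-entry sketch.
unsigned int cp850_glyph_for(unsigned char byte)
{
    switch (byte) {
    case 0xE0: return 0x00D3; // Ó  (CP1252 reads 0xE0 as à)
    case 0xE9: return 0x00DA; // Ú  (CP1252 reads 0xE9 as é)
    case 0xEA: return 0x00DB; // Û  (CP1252 reads 0xEA as ê)
    case 0xF9: return 0x00A8; // ¨  (CP1252 reads 0xF9 as ù)
    default:   return 0;
    }
}
```

So a literal compiled to CP1252 bytes and shown by a console set to CP850 (the usual OEM code page on Western European Windows) comes out as "ÓÚÛ¨", one glyph per byte.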
