Streaming Out Extended ASCII


Question

I know that only ASCII characters with positive values are guaranteed cross-platform support.

In Visual Studio 2015, I can do:

cout << '\xBA';

And it prints:

║

When I try that on http://ideone.com, nothing is printed.

If I try to print this directly using the literal character:

cout << '║';

Visual Studio gives the warning:

warning C4566: character represented by universal-character-name '\u2551' cannot be represented in the current code page (1252)

And then prints:

?

When this command is run on http://ideone.com, I get:

14849425

I've read that wchars may provide a cross-platform approach to this. Is that true? Or am I simply out of luck with extended ASCII?

Recommended answer

First, your source file has its own encoding. Your compiler needs to be able to read this encoding (possibly with the help of flags or settings).

With a plain string literal, the compiler is free to do what it wants, but it must yield a const char[]. Usually the compiler keeps the source encoding when it can, so the string stored in your program has the encoding of your input file. In some cases the compiler does convert, for example if your file is UTF-16 (UTF-16 code units don't fit in chars).

When you write '\xBA', you specify the raw byte value yourself, so you are the one choosing the encoding; the compiler does no conversion.

When you write '║', the type of the literal is not necessarily char. If the character cannot be represented as a single byte in the compiler's execution character set, the literal can end up with type int. That matches the ideone output: with a UTF-8 source file, '║' is three bytes (E2 95 91), the compiler treats it as a multi-character literal of type int, and cout << prints its value, 14849425 (0xE29591). Visual Studio with code page 1252 cannot represent '║' at all, so it emits warning C4566 and, judging by the output, substitutes '?'.
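To make the 14849425 figure less mysterious, here is a minimal sketch of the arithmetic. It assumes the compiler packs the bytes of a multi-character literal big-endian into an int, which is what the ideone output suggests; pack_multichar is a hypothetical helper name, not a standard function.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: packs up to four bytes, first byte most significant,
// into one integer. This mimics the value the question observed for a
// multi-character literal; it is an illustration, not compiler code.
std::uint32_t pack_multichar(const std::string& bytes) {
    assert(bytes.size() <= 4);  // a 32-bit value holds at most four bytes
    std::uint32_t value = 0;
    for (unsigned char b : bytes) {
        value = (value << 8) | b;  // shift previous bytes up, append the next
    }
    return value;
}
```

Calling pack_multichar("\xE2\x95\x91") with the three UTF-8 bytes of '║' gives 0xE29591, which is 14849425 in decimal.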

You can force an encoding with prefixes on string literals: u8"" forces UTF-8, u"" UTF-16 and U"" UTF-32. Note that the L"" prefix gives you a wide wchar_t string, but its encoding is still implementation dependent. Wide chars on Windows are 16 bits (UTF-16 code units, historically UCS-2), while on Linux they are 32 bits (UTF-32).
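A small sketch of what the prefixes mean in practice: it counts how many code units '║' (U+2551) occupies under each prefix. The code_units helper and the constant names are hypothetical, and the character is written as a universal character name so the result does not depend on the source file's encoding.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

constexpr std::size_t utf8_units  = code_units(u8"\u2551");  // UTF-8 bytes
constexpr std::size_t utf16_units = code_units(u"\u2551");   // UTF-16 units
constexpr std::size_t utf32_units = code_units(U"\u2551");   // UTF-32 units

// A character outside the Basic Multilingual Plane needs two UTF-16 units.
constexpr std::size_t utf16_units_non_bmp = code_units(u"\U0001F600");

// Width of wchar_t in bits; this is the implementation-dependent part.
constexpr std::size_t wchar_bits = sizeof(wchar_t) * CHAR_BIT;
```

U+2551 takes three UTF-8 code units but a single UTF-16 or UTF-32 unit, while wchar_bits varies by platform, which is why L"" alone does not pin down an encoding.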

What gets sent to the console depends only on the type of the value. cout << is overloaded for all the common types, so what it does depends on the type. cout << will usually feed char strings as-is to the console (actually to stdout), and wcout << will usually do the same for wchar_t strings. Other combinations may involve conversions or interpretations (an int, for example, is printed as a number). UTF-8 strings stored as char strings pass through cout << unchanged.
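The type-driven behaviour can be seen without a console by streaming into a std::ostringstream, which uses the same operator<< overloads as cout. This is only a sketch; streamed is a hypothetical helper name.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: streams one value with operator<< and returns
// exactly the characters that were written.
template <typename T>
std::string streamed(const T& value) {
    std::ostringstream out;
    out << value;  // overload is chosen by the static type of value
    return out.str();
}
```

streamed('A') yields the single character A, streamed(65) yields the two digits 65, and streamed("\xE2\x95\x91") yields the three bytes untouched. Whether those bytes show up as '║' is then up to whatever displays them.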

Next, there is the console itself. A console is a totally independent piece of software: you feed it some bytes and it displays them. It doesn't care one bit about your program. It uses its own encoding and tries to display the bytes you fed it using that encoding.

The default console encoding on Windows is the system's OEM code page, for example code page 850 on Western European systems (437 on US English ones). In your case, your file is CP 1252 and your console is CP 850, which is why you can't print '║' directly (CP 1252 doesn't contain '║'), but you can with the raw byte, since 0xBA is '║' in CP 850. You can change the console output encoding on Windows with SetConsoleOutputCP() (SetConsoleCP() sets the input code page).
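One possible way to apply this, as a sketch rather than a guaranteed fix: switch the console output code page to UTF-8 and send '║' as explicit UTF-8 bytes. SetConsoleOutputCP and CP_UTF8 are real Win32 names; use_utf8_console_output and double_vertical_utf8 are hypothetical helpers, and whether the glyph actually renders also depends on the console font.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Switches the Windows console output code page to UTF-8.
// On other platforms there is nothing to do, so it reports success.
bool use_utf8_console_output() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}

// U+2551 spelled as raw UTF-8 bytes, so the source file's own
// encoding does not influence what ends up in the program.
std::string double_vertical_utf8() {
    return "\xE2\x95\x91";
}
```

A caller would check use_utf8_console_output() and then write double_vertical_utf8() to std::cout.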

On Linux, the default encoding is usually UTF-8, which is more convenient because it supports the whole Unicode range. Ideone runs on Linux, so it uses UTF-8; a lone 0xBA byte is not valid UTF-8, which would explain why '\xBA' shows nothing there. Note that there are the added layers of HTTP and HTML, but they also use UTF-8.
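For reference, here is a minimal sketch of how a code point becomes UTF-8 bytes, and why a stray 0xBA cannot stand on its own. Both function names are hypothetical, and the encoder deliberately skips validation (surrogates and values above U+10FFFF are not rejected).

```cpp
#include <string>

// Encodes one Unicode code point as UTF-8. Sketch only: no validation.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx + three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// True for bytes of the form 10xxxxxx, which may only follow a lead byte.
bool is_continuation_byte(unsigned char b) {
    return (b & 0xC0) == 0x80;
}
```

encode_utf8(0x2551) produces the bytes E2 95 91, and is_continuation_byte(0xBA) is true, so 0xBA by itself is not a complete UTF-8 sequence.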
