Displaying extended ASCII characters


Problem description


In Visual Studio 2005 on 32-bit Windows, why doesn't my console display characters from 128 to 255?

For example:

cout << "¿" << endl;  //inverted question mark

Output:

┐
Press any key to continue . . .

Solution

A Windows console window is pure Unicode. Its buffer stores text as UCS-2 Unicode (16 bits per character, essentially like original Unicode, a restriction to the Basic Multilingual Plane of modern 21-bit Unicode). So a console window can present almost all kinds of text.

However, for I/O with a single byte per character (and possibly also for some variable-length encodings), Windows automatically translates to and from the console window's active codepage. If the console window is a cmd.exe instance, you can inspect the active codepage with the chcp command, short for "change codepage". Like this:

C:\test> chcp
Active code page: 850

C:\test> _
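
A program can ask for the same information that chcp prints. The following is a minimal sketch, not part of the original answer: it assumes the Win32 functions GetConsoleOutputCP, GetOEMCP and GetACP, and the names CodePages, query_code_pages and print_code_pages are made up for illustration. On a non-Windows build it simply reports 0 for everything.

```cpp
#include <iostream>

#ifdef _WIN32
#include <windows.h>
#endif

// Code pages relevant to console output. A value of 0 means "not available"
// (no console attached, or not built for Windows).
struct CodePages {
    unsigned console_output;  // what chcp reports for this console
    unsigned oem;             // system default OEM code page
    unsigned ansi;            // system default ANSI code page
};

// Hypothetical helper: collects the three code pages in one place.
CodePages query_code_pages() {
    CodePages cp;
    cp.console_output = 0;
    cp.oem = 0;
    cp.ansi = 0;
#ifdef _WIN32
    cp.console_output = GetConsoleOutputCP();
    cp.oem = GetOEMCP();
    cp.ansi = GetACP();
#endif
    return cp;
}

// Prints the values in a form comparable to the chcp output.
void print_code_pages(std::ostream& out) {
    const CodePages cp = query_code_pages();
    out << "Console output code page: " << cp.console_output << '\n'
        << "System OEM code page:     " << cp.oem << '\n'
        << "System ANSI code page:    " << cp.ansi << '\n';
}
```

Calling print_code_pages(std::cout) from main shows whether the console and the rest of Windows disagree, which is the situation described in the rest of this answer.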

Codepage 850 is an encoding based on the original IBM PC English codepage 437. 850 is the default for console windows on at least Norwegian PCs (although savvy Norwegians may change that to 865). None of these is a codepage that you should use, however.

The original IBM PC codepage (character encoding) is known as OEM, a meaningless acronym for Original Equipment Manufacturer. It had nice line drawing characters suitable for the original PC's text mode screen. More generally, OEM means the default code page for console windows, where codepage 437 is just the original one: it can be configured, e.g. per window via chcp.

When Microsoft created 16-bit Windows they chose another encoding known in Windows as ANSI. The original one was an extension of ISO Latin-1 which for a long while was the default on the Internet (however, it's unclear which came first: Microsoft participated in the standardization). This original ANSI is now known as Windows ANSI Western.

ANSI is the code page used for non-Unicode by almost all the rest of Windows. Console windows use OEM. Notepad, other editors, and so on, use ANSI.
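
To make the ANSI/OEM difference concrete, here is a small sketch (an illustration, not part of the original answer) that looks up what a few bytes mean under Windows-1252 and under codepage 850. The table covers only the bytes that matter for the examples in this question, and the names ByteMeaning, kSamples and decode_sample are hypothetical.

```cpp
#include <cstddef>

// One row per byte: how Windows-1252 (ANSI) and codepage 850 (OEM) read it.
struct ByteMeaning {
    unsigned char byte;
    unsigned ansi_1252;  // Unicode code point under Windows-1252
    unsigned oem_850;    // Unicode code point under codepage 850
};

// Only four sample bytes; a complete table would have 256 entries.
static const ByteMeaning kSamples[] = {
    { 0xBF, 0x00BF, 0x2510 },  // ¿ in ANSI, ┐ in OEM 850
    { 0xE5, 0x00E5, 0x00D5 },  // å in ANSI, Õ in OEM 850
    { 0xE6, 0x00E6, 0x00B5 },  // æ in ANSI, µ in OEM 850
    { 0xF8, 0x00F8, 0x00B0 },  // ø in ANSI, ° in OEM 850
};

// Returns the code point for `byte` under the chosen code page,
// or 0 if the byte is not in the sample table.
unsigned decode_sample(unsigned char byte, bool as_oem_850) {
    for (std::size_t i = 0; i < sizeof kSamples / sizeof kSamples[0]; ++i) {
        if (kSamples[i].byte == byte) {
            return as_oem_850 ? kSamples[i].oem_850 : kSamples[i].ansi_1252;
        }
    }
    return 0;
}
```

The first row is the case from the question: the source file stores ¿ as byte 0xBF (ANSI), and a console using codepage 850 shows that byte as the box-drawing corner ┐.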

Then, when Microsoft made Windows 32-bit, they adopted a 16-bit extension of Latin-1 known as Unicode. Microsoft was a founding member of the Unicode Consortium. The basic API, including console windows, the file system, etc., was rewritten to use Unicode. For backward compatibility there is a translation layer that translates between OEM and Unicode for console windows, and between ANSI and Unicode for other functionality. For example, MessageBoxA is an ANSI wrapper for the Unicode-based MessageBoxW.

The practical upshot of that is that in Windows your C++ source code is typically encoded with ANSI, while console windows assume OEM. Which e.g. makes

cout << "I like Norwegian blåbærsyltetøy!" << endl;

produce pure gobbledegook… You can use the Unicode-based console window APIs to output Unicode directly to a console window, avoiding the translation, but that's awkward.
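
For reference, direct Unicode output could look roughly like the sketch below. It is one possible approach, not the author's code: it assumes the Win32 calls GetStdHandle, GetConsoleMode and WriteConsoleW, and the function name write_console_wide is made up. It only works when standard output really is a console (not a file or pipe), and on a non-Windows build it does nothing and returns false.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Writes a UTF-16 string straight into the console buffer, bypassing the
// codepage translation. Returns false if standard output is not a console
// (for example when redirected) or when not built for Windows.
bool write_console_wide(const std::wstring& text) {
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (out == INVALID_HANDLE_VALUE || out == NULL ||
        !GetConsoleMode(out, &mode)) {
        return false;  // not a console handle
    }
    DWORD written = 0;
    const BOOL ok = WriteConsoleW(out, text.data(),
                                  static_cast<DWORD>(text.size()),
                                  &written, NULL);
    return ok != 0 && written == text.size();
#else
    (void)text;  // unused outside Windows
    return false;
#endif
}
```

A call such as write_console_wide(L"I like Norwegian bl\u00E5b\u00E6rsyltet\u00F8y!\n") avoids depending on the source file's encoding by spelling the national characters as universal character names. The awkward part is that this is a separate output path from cout, and whether the glyphs actually appear still depends on the console font.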

Note that using wcout instead of cout doesn't help: by design, wcout just translates wide character strings down to the program's narrow character set, discarding information on the way. It can be hard to believe that the C++ standard library offers a rather big chunk of very complex functionality that is meaningless (since those conversions could instead just have been supported by cout). But so it is: just meaningless. Possibly it was some kind of political compromise. Either way, wcout does not help, even though logically it "should" if it were meaningful in some way.

So how does a Norwegian novice programmer get e.g. "blåbærsyltetøy" presented?

Well, simply by changing the active code page to ANSI. Since ANSI is codepage 1252 on most PCs in Western countries, you can do that for a given command interpreter instance by

C:\test> chcp 1252
Active code page: 1252

C:\test> _
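
The same switch can also be made from inside the program instead of typing chcp first. The sketch below is an assumption-laden illustration rather than part of the original answer: it uses the Win32 functions GetConsoleOutputCP, SetConsoleOutputCP and GetACP, and the class name ScopedAnsiConsole is hypothetical. It selects the system ANSI code page (1252 on the PCs discussed here) and restores the previous one when the object goes out of scope. On a non-Windows build it does nothing.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Switches this console's output code page to the system ANSI code page
// for the lifetime of the object, then restores the previous one.
class ScopedAnsiConsole {
public:
    ScopedAnsiConsole() : previous_(0), changed_(false) {
#ifdef _WIN32
        previous_ = GetConsoleOutputCP();  // 0 if there is no console
        changed_ = previous_ != 0 && SetConsoleOutputCP(GetACP()) != 0;
#endif
    }

    ~ScopedAnsiConsole() {
#ifdef _WIN32
        if (changed_) {
            SetConsoleOutputCP(previous_);
        }
#endif
    }

    // True if the code page was actually switched.
    bool changed() const { return changed_; }

private:
    ScopedAnsiConsole(const ScopedAnsiConsole&);             // non-copyable
    ScopedAnsiConsole& operator=(const ScopedAnsiConsole&);  // non-copyable

    unsigned previous_;
    bool changed_;
};
```

Declaring a ScopedAnsiConsole at the top of main, before the first cout statement, has roughly the effect of running chcp 1252 for that one program. Restoring the old code page on exit keeps other programs in the same console window unaffected.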

Now old DOS programs such as edit.com (still present in Windows XP!) will produce some gobbledegook, because the original PC character set's line drawing characters are not there in ANSI, and because national characters have different codes in ANSI. But hey, who uses old DOS programs? Not me!

If you want this as a more permanent code page, you'll have to change the configuration of console windows via an undocumented registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage

In this key, change the value of OEMCP to 1252, and reboot.

As with chcp or any other change of codepage to 1252, this makes old DOS programs present gobbledegook, but makes C++ programs and other modern console programs work OK, since you then have the same character encoding in console windows as in the rest of Windows.
