Why don't scripting languages output Unicode to the Windows console?

Problem Description

The Windows console has been Unicode aware for at least a decade and perhaps as far back as Windows NT. However for some reason the major cross-platform scripting languages including Perl and Python only ever output various 8-bit encodings, requiring much trouble to work around. Perl gives a "wide character in print" warning, Python gives a charmap error and quits. Why on earth after all these years do they not just simply call the Win32 -W APIs that output UTF-16 Unicode instead of forcing everything through the ANSI/codepage bottleneck?

Is it just that cross-platform performance is low priority? Is it that the languages use UTF-8 internally and find it too much bother to output UTF-16? Or are the -W APIs inherently broken to such a degree that they can't be used as-is?

Update

It seems that the blame may need to be shared by all parties. I imagined that the scripting languages could just call wprintf on Windows and let the OS/runtime worry about things such as redirection. But it turns out that even wprintf on Windows converts wide characters to ANSI and back before printing to the console!

Please let me know if this has been fixed, since the bug report link seems broken, but my Visual C test code still fails for wprintf and succeeds for WriteConsoleW.
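
For reference, a minimal sketch of that kind of test, assuming MSVC and a plain Windows console (the sample string and layout are mine, not the asker's actual code):

```c
/* A sketch of a wprintf vs WriteConsoleW comparison (illustrative only).
 * Build with MSVC and run in a Windows console. */
#include <stdio.h>
#include <wchar.h>
#include <windows.h>

int main(void)
{
    const wchar_t *text = L"caf\x00E9 \x65E5\x672C\x8A9E\n";  /* "café 日本語" */

    /* The CRT narrows this through the current codepage, so characters
       outside that codepage are typically mangled or replaced. */
    wprintf(L"wprintf:       %ls", text);

    /* WriteConsoleW hands UTF-16 straight to the console, so it displays
       correctly (given a font that has the glyphs). */
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written = 0;
    WriteConsoleW(out, L"WriteConsoleW: ", 15, &written, NULL);
    WriteConsoleW(out, text, (DWORD)wcslen(text), &written, NULL);
    return 0;
}
```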

Update 2

Actually you can print UTF-16 to the console from C using wprintf but only if you first do _setmode(_fileno(stdout), _O_U16TEXT).
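
A minimal sketch of that workaround, assuming an MSVC-style toolchain (the sample string is illustrative):

```c
/* A sketch of the _setmode workaround (MSVC-specific). After the stream is
 * switched to _O_U16TEXT, byte-oriented calls such as printf on the same
 * stream are no longer valid. */
#include <stdio.h>
#include <io.h>      /* _setmode, _fileno */
#include <fcntl.h>   /* _O_U16TEXT */

int main(void)
{
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"caf\x00E9 \x65E5\x672C\x8A9E\n");  /* "café 日本語" now prints intact */
    return 0;
}
```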

From C you can print UTF-8 to a console whose codepage is set to 65001; however, Perl, Python, PHP and Ruby all have bugs which prevent this. Perl and PHP corrupt the output by adding additional blank lines following lines which contain at least one wide character. Ruby produces slightly different corrupt output. Python crashes.
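
For the C side of that claim, a hedged sketch assuming a Windows console; SetConsoleOutputCP(CP_UTF8) has the same effect as running chcp 65001 beforehand:

```c
/* A sketch of the codepage-65001 route: switch the console output codepage
 * to UTF-8 and write UTF-8 bytes through ordinary byte-oriented stdio. */
#include <stdio.h>
#include <windows.h>

int main(void)
{
    SetConsoleOutputCP(CP_UTF8);  /* same effect as running "chcp 65001" */

    /* UTF-8 bytes for "café 日本語", written as a plain byte string */
    const char *utf8 = "caf\xC3\xA9 \xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E\n";
    fputs(utf8, stdout);
    return 0;
}
```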

Update 3

Node.js is the first scripting language that shipped without this problem straight out of the box.

The Python dev team slowly came to realize this was a real problem after it was first reported back at the end of 2007, and 2016 saw a huge flurry of activity to fully understand and fully fix the bug.

Recommended Answer

The main problem seems to be that it is not possible to use Unicode on Windows using only the standard C library and no platform-dependent or third-party extensions. The languages you mentioned originate from Unix platforms, whose method of implementing Unicode blends well with C (they use normal char* strings, the C locale functions, and UTF-8). If you want to do Unicode in C, you more or less have to write everything twice: once using nonstandard Microsoft extensions, and once using the standard C API functions for all other operating systems. While this can be done, it usually doesn't have high priority because it's cumbersome and most scripting language developers either hate or ignore Windows anyway.
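
As a rough illustration of that "write everything twice" point, here is a sketch; the print_line helper is hypothetical and only meant to show the two parallel paths:

```c
/* A rough illustration of the "write everything twice" point. The helper
 * print_line is hypothetical: one Windows-only path through the
 * wide-character Win32 API, one standard C path for everything else. */
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

static void print_line(const char *utf8)
{
#ifdef _WIN32
    /* Nonstandard path: convert UTF-8 to UTF-16 and hand it to the console. */
    wchar_t wide[1024];
    int n = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, 1024);
    if (n > 0) {
        DWORD written = 0;
        WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), wide, n - 1, &written, NULL);
    }
#else
    /* Standard path: on Unix-like systems the locale is normally UTF-8,
       so the bytes can go straight out through the byte stream. */
    fputs(utf8, stdout);
#endif
}

int main(void)
{
    print_line("caf\xC3\xA9 \xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E\n");  /* "café 日本語" */
    return 0;
}
```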

At a more technical level, I think the basic assumption that most standard library designers make is that all I/O streams are inherently byte-based at the OS level. This is true for files on all operating systems, and for all streams on Unix-like systems, with the Windows console being the only exception. Thus, the architecture of many class libraries and programming language standards would have to be modified to a great extent to incorporate Windows console I/O.
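
A small sketch of the special-casing this forces on a runtime: on Windows it has to detect whether stdout is really a console (where the -W APIs apply) or has been redirected to a file or pipe (where only bytes make sense). The stdout_is_console helper below is mine, for illustration:

```c
/* A sketch of the check a runtime ends up doing on Windows: only when stdout
 * is a real console can it use the -W APIs; when stdout is redirected to a
 * file or pipe it must fall back to writing plain bytes. */
#include <stdio.h>
#include <windows.h>

static int stdout_is_console(void)  /* helper name is ours, for illustration */
{
    DWORD mode;
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    return GetConsoleMode(h, &mode) != 0;  /* GetConsoleMode fails for files and pipes */
}

int main(void)
{
    if (stdout_is_console())
        puts("stdout is a console: WriteConsoleW and friends are usable");
    else
        puts("stdout is redirected: output must be plain bytes in some encoding");
    return 0;
}
```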

Another, more subjective, point is that Microsoft just did not do enough to promote the use of Unicode. The first Windows OS with decent (for its time) Unicode support was Windows NT 3.1, released in 1993, long before Linux and OS X grew Unicode support. Still, the transition to Unicode in those OSes has been much more seamless and unproblematic. Microsoft once again listened to the sales people instead of the engineers, and kept the technically obsolete Windows 9x around until 2001; instead of forcing developers to use a clean Unicode interface, they still ship the broken and now-unnecessary 8-bit API interface and invite programmers to use it (look at a few of the recent Windows API questions on Stack Overflow; most newbies still use the horrible legacy API!).

When Unicode came out, many people realized it was useful. Unicode started as a pure 16-bit encoding, so it was natural to use 16-bit code units. Microsoft then apparently said "OK, we have this 16-bit encoding, so we have to create a 16-bit API", not realizing that nobody would use it. The Unix luminaries, however, thought "how can we integrate this into the current system in an efficient and backward-compatible way so that people will actually use it?" and subsequently invented UTF-8, which is a brilliant piece of engineering. Just as when Unix was created, the Unix people thought a bit more, needed a bit longer, and had less financial success, but eventually got it right.

I cannot comment on Perl (though I think there are more Windows haters in the Perl community than in the Python community), but regarding Python I know that the BDFL (who doesn't much like Windows either) has stated that adequate Unicode support on all platforms is a major goal.
