What limitations does the Windows console have regarding Unicode?


Question

It is possible to write Unicode characters to the Windows console using the WriteConsoleW function. On my Windows 7 machine, it looks like the console does not support characters outside the Basic Multilingual Plane. Also, combining characters are displayed after the base character, not actually combined.

Are these limitations also present in later versions of Windows? Are there other limitations on Unicode in the Windows console?
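For reference, here is a minimal sketch of the kind of call the question refers to, written in Python via `ctypes`. The helper names `utf16_length` and `write_console` are made up for this illustration, and the non-Windows fallback exists only so the snippet can be loaded anywhere; only `GetStdHandle` and `WriteConsoleW` themselves are real Win32 functions.

```python
import ctypes
import sys

# (DWORD)-11, the documented value of STD_OUTPUT_HANDLE.
STD_OUTPUT_HANDLE = 0xFFFFFFF5


def utf16_length(text: str) -> int:
    """Count UTF-16 code units; WriteConsoleW takes its length in these units."""
    return len(text.encode("utf-16-le")) // 2


def write_console(text: str) -> int:
    """Write text to the console and return the number of UTF-16 units written."""
    if sys.platform != "win32":
        # Not Windows: fall back to the ordinary stream (illustration only).
        sys.stdout.write(text)
        return utf16_length(text)

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE,                 # hConsoleOutput
        wintypes.LPCWSTR,                # lpBuffer (UTF-16 text)
        wintypes.DWORD,                  # nNumberOfCharsToWrite
        ctypes.POINTER(wintypes.DWORD),  # lpNumberOfCharsWritten
        wintypes.LPVOID,                 # lpReserved, must be NULL
    ]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    written = wintypes.DWORD(0)
    ok = kernel32.WriteConsoleW(
        handle, text, utf16_length(text), ctypes.byref(written), None
    )
    if not ok:
        # The call fails if the handle is not a console, e.g. redirected output.
        raise ctypes.WinError(ctypes.get_last_error())
    return written.value
```

Calling `write_console("e\u0301 \U0001F600\n")` in a console window is one way to try the two cases described above: a base letter followed by a combining accent, and a character outside the Basic Multilingual Plane.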

Answer

I wrote a partial answer in my answer to a different question; here is a good place for a full disclosure. My background: I maintain what is in all probability the most extensive console font which fully supports Windows (it is a very deep rewrite of Unifont with elements of DejaVu added).

I start with the limitations already mentioned in other answers:

  • Every cell contains 16 bits of character data. In other words: only UCS-2 codepoints are shown. (In particular, a character outside the BMP is shown as its "decomposition into UCS-2": the two surrogate code units of its UTF-16 encoding, one per cell.)

  • Only simple text rendering is supported. Even if one uses TTF fonts, no advanced "features" of the font are considered by the console. Neither advanced typography (ligatures etc.), nor even composing glyphs for combining characters, nor right-to-left scripts¹⁾ (in an LtR environment) would work as expected.

        ¹⁾ It is the application which should rearrange the characters for a correct bidi-rendering.
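The two limitations above can be illustrated without a console at all. The sketch below is only a model, assuming one 16-bit code unit per cell as described; `console_cells` and `describe` are hypothetical helper names, not part of any Windows API.

```python
def console_cells(text):
    """Split text into the 16-bit UTF-16 code units that would each fill one cell."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def describe(text):
    """Label each code unit as an ordinary BMP unit or a surrogate half."""
    labels = []
    for unit in console_cells(text):
        kind = "surrogate" if 0xD800 <= unit <= 0xDFFF else "BMP"
        labels.append("U+%04X (%s)" % (unit, kind))
    return labels


# A character outside the BMP becomes two surrogate cells.
print(describe("\U0001F600"))
# A base letter plus a combining accent stays as two separate cells.
print(describe("e\u0301"))
```

Under this model a combining sequence always occupies as many cells as it has code units, which matches the observation that the accent appears after the base character rather than on top of it.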

Font filtering

Other limitations are due to font filtering by the console. A font must be quite special to be accepted by the console (that is, to be shown in the font selection dialogue, and for this selection "to work"¹⁾).

    ¹⁾ I do not recall whether a font may be shown but not be selectable (I have a vague memory of this happening, but cannot trust this memory).

  • The font must be marked as monospaced. Due to expectations of applications,²⁾ such fonts must have all the glyphs of the same width.

        ²⁾ The latter condition is relevant only if one wants to use the font outside of the console. In principle, the console does not check the widths of the glyphs. However, every glyph is shown as if it had the "default width". In many (all?) situations only the part of the glyph inside the "default bounding box" is going to be shown. I could not find any trick to circumvent this limitation.

  • On non-East-Asian releases of Windows, the font cannot claim that it supports any one of 4 East Asian codepages.³⁾

        ³⁾ Note that this is only a limitation on what the font header claims: it is just 4 bits present in the header. The font may have glyphs for these languages present, and they would show fine, as long as the font does not claim the support. The codepages in question (in the OS/2⫽Charsets section of the header) are 932, 936, 949, 950 (JIS, Simplified Chinese, Korean Wansung, Traditional Chinese).
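The "4 bits" check can be sketched as plain bit arithmetic. This assumes that the bits in question are bits 17 to 20 of the `ulCodePageRange1` field of the OpenType OS/2 table, which is where the OpenType specification places these four codepages; the answer itself does not name the field, so treat that mapping as my reading. The function name is hypothetical, and extracting the field from a real font file is not shown.

```python
from typing import List

# Bit position in ulCodePageRange1 -> codepage, per the OpenType OS/2 table.
EAST_ASIAN_CODEPAGE_BITS = {
    17: 932,  # JIS / Japan
    18: 936,  # Simplified Chinese
    19: 949,  # Korean Wansung
    20: 950,  # Traditional Chinese
}


def claimed_east_asian_codepages(ul_code_page_range1: int) -> List[int]:
    """Return the East Asian codepages whose bits are set in the given value."""
    return [
        codepage
        for bit, codepage in sorted(EAST_ASIAN_CODEPAGE_BITS.items())
        if ul_code_page_range1 & (1 << bit)
    ]


# Example: a header claiming Latin 1 (bit 0) and JIS (bit 17).
print(claimed_east_asian_codepages((1 << 0) | (1 << 17)))
```

By the rule described above, a font for which this returns an empty list passes this particular filter on a non-East-Asian Windows release, regardless of which glyphs it actually contains.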

Bugs in font rendering

  • Although the Windows console does not support the Underline attribute (except for DBCS codepages), the "Underline position" field of the font header is taken into account when the size of the on-screen character bbox is calculated. This may lead to an unexpected aspect ratio of the font, and/or to interruptions between glyphs which are expected to "join together".

  • The console is very picky about the replacement glyph for "unsupported characters". I could not find how to make such a glyph coexist with glyphs for U+0000 and/or U+0001. (If the console finds one of the latter two glyphs in a font, it ignores the replacement glyph.)

  • (This is a very obscure bug; it requires a very technical discussion.) Another problem with the replacement glyph is the character U+30FB ・ (WHY?!). If this character is present in the font, the glyph for this character is used as a replacement glyph, but only for missing characters in the PUA!

Essentially, this is it! I did not find any other limitation.
