How many non-printing characters are in common use?

Problem description

When writing interpreters for PDF, HTML and other documents we need to deal with a variety of white-space characters and additional non-printing characters. The ANSI ones are well defined, but how many others are likely to be found in practice? A typical example is the cluster in ISO10646 (I think):

&ensp;    &#8194;   en space
&emsp;    &#8195;   em space
&thinsp;  &#8201;   thin space
&zwnj;    &#8204;   zero width non-joiner
&zwj;     &#8205;   zero width joiner
&lrm;     &#8206;   left-to-right mark
&rlm;     &#8207;   right-to-left mark

(For obvious reasons the characters do not appear above!).
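As a rough illustration of what an interpreter sees once these references are decoded, the sketch below resolves the seven entity names to their code points and Unicode names. It relies only on Python's standard `html` and `unicodedata` modules; the `describe` helper and the `ENTITIES` list are hypothetical names introduced here, not part of the original question.

```python
import html
import unicodedata

# Entity names for the seven characters listed in the question.
ENTITIES = ["ensp", "emsp", "thinsp", "zwnj", "zwj", "lrm", "rlm"]


def describe(entity: str) -> tuple[str, str]:
    """Return (code point, Unicode name) for a named HTML entity."""
    ch = html.unescape(f"&{entity};")
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


if __name__ == "__main__":
    for entity in ENTITIES:
        code_point, name = describe(entity)
        print(f"&{entity};  {code_point}  {name}")
```

Numeric references decode the same way, so `html.unescape("&#8204;")` yields the same character as `html.unescape("&zwnj;")`.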

Solution

Unicode will be with us, in increasing quantity, for a long time. If an HTML or XML document is written in UTF-8 encoded Unicode, then you should expect any and all of these to appear.

In Unicode (Unicode Character Database) the following codepoints are defined as whitespace:

U+0009–U+000D (control characters, containing Tab, CR and LF)
U+0020 SPACE
U+0085 NEL (control character next line)
U+00A0 NBSP (NO-BREAK SPACE)
U+1680 OGHAM SPACE MARK
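U+180E MONGOLIAN VOWEL SEPARATOR (whitespace up to Unicode 6.2; reclassified as a format character in Unicode 6.3 and no longer whitespace)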
U+2000–U+200A (different sorts of spaces)
U+2028 LS (LINE SEPARATOR)
U+2029 PS (PARAGRAPH SEPARATOR)
U+202F NNBSP (NARROW NO-BREAK SPACE)
U+205F MMSP (MEDIUM MATHEMATICAL SPACE)
U+3000 IDEOGRAPHIC SPACE
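
One possible way to use this list in an interpreter is sketched below. It hard-codes the code points exactly as listed above and falls back on the general category from Python's `unicodedata` module for everything else, so the joiners and directional marks from the question come out as format characters rather than whitespace. The names `WHITESPACE` and `classify` are assumptions made for this example, and the three-way split is only one reasonable design.

```python
import unicodedata

# Code points from the list above. U+180E is kept because the list includes it,
# although recent Unicode versions treat it as a format character instead.
WHITESPACE = frozenset(
    list(range(0x0009, 0x000E))      # U+0009 to U+000D
    + [0x0020, 0x0085, 0x00A0, 0x1680, 0x180E]
    + list(range(0x2000, 0x200B))    # U+2000 to U+200A
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def classify(ch: str) -> str:
    """Label a single character as whitespace, format, control, or other."""
    if ord(ch) in WHITESPACE:
        return "whitespace"
    category = unicodedata.category(ch)
    if category == "Cf":   # e.g. ZWNJ, ZWJ, LRM, RLM
        return "format"
    if category == "Cc":   # remaining C0/C1 control characters
        return "control"
    return "other"


if __name__ == "__main__":
    for ch in ["\u2003", "\u200d", "\u200e", "\x00", "a"]:
        print(f"U+{ord(ch):04X}", classify(ch))
```

Checking the explicit set first means U+0085, which has the control category, is still reported as whitespace, matching the list. Whether format characters should be dropped or preserved depends on the document type: removing joiners can change how some scripts render.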
