Why are "control" characters illegal in XML 1.0?
Question
There are a variety of characters that cannot legally be encoded in XML 1.0, e.g. U+0007 ('bell') and U+001B ('escape'). Most of the interesting ones are non-whitespace 'control' characters.
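For reference, the set of legal characters is defined by the `Char` production in the XML 1.0 specification. A minimal sketch of that check follows; the function name `is_xml10_char` is only an illustrative choice, not something taken from the spec or from any library.

```python
def is_xml10_char(code_point: int) -> bool:
    """Return True if the code point matches the XML 1.0 `Char` production.

    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


if __name__ == "__main__":
    # U+0007 (bell) and U+001B (escape) are rejected; tab, DEL and
    # U+200C (zero width non-joiner) are accepted.
    for cp in (0x07, 0x1B, 0x09, 0x7F, 0x200C):
        verdict = "legal" if is_xml10_char(cp) else "illegal"
        print(f"U+{cp:04X}: {verdict}")
```

Note that U+007F (DEL) and U+200C both pass this check, which is the inconsistency the question is getting at.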
It's clear from (e.g.) this question and others that the issue lies with the XML spec itself, but can anyone enlighten me as to why the XML spec forbids these characters?
It seems like it could have been required that they be encoded in escapes, e.g. as &#x7; and &#x1B; respectively, but perhaps there's a practical reason that the characters were forbidden rather than required to be escaped?
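To make the point concrete: in XML 1.0 a numeric character reference must also refer to a character matching `Char`, so escaping does not help. The sketch below uses Python's standard `xml.etree.ElementTree` parser to show this; the helper name `parses` is just an illustrative choice.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parses("<a>&#x9;</a>"))   # reference to tab
    print(parses("<a>&#x7;</a>"))   # reference to bell
    print(parses("<a>\x07</a>"))    # literal bell character
```

With this parser, only the tab reference is accepted; both the literal bell and the `&#x7;` reference are reported as parse errors.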
Answerers have suggested that there is some motivation towards avoiding transmission control characters, but Unicode includes many other control-like characters (consider U+200C, "zero width non joiner"). I recognize there may be no good reason for this behavior, but I would still like to understand it better.
It's particularly frustrating because when those character values appear in other encodings or data formats, I end up "double-escaping" new XML documents that need to encode them.
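As an illustration of what such "double-escaping" can look like, here is a minimal sketch under stated assumptions: the `\uXXXX` convention and the function names `encode_for_xml` and `decode_from_xml` are hypothetical, not part of XML or of any standard. The idea is to apply an application-level escape first, so that only legal characters reach the XML layer, and to undo it after parsing.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 Char production (all such code points are in the BMP).
_NOT_XML10_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)
# A doubled backslash, or a backslash followed by u and four uppercase hex digits.
_ESCAPED = re.compile(r"\\(\\|u[0-9A-F]{4})")


def encode_for_xml(text: str) -> str:
    """Hypothetical scheme: double backslashes, then write illegal chars as \\uXXXX."""
    text = text.replace("\\", "\\\\")
    return _NOT_XML10_CHAR.sub(lambda m: "\\u%04X" % ord(m.group()), text)


def decode_from_xml(text: str) -> str:
    """Reverse encode_for_xml."""
    def restore(match: re.Match) -> str:
        token = match.group(1)
        return "\\" if token == "\\" else chr(int(token[1:], 16))
    return _ESCAPED.sub(restore, text)


if __name__ == "__main__":
    original = "ding\x07 esc\x1b back\\slash"
    element = ET.Element("payload")
    element.text = encode_for_xml(original)   # application-level escape
    serialized = ET.tostring(element)         # XML-level escape happens here
    restored = decode_from_xml(ET.fromstring(serialized).text)
    print(restored == original)
```

Both sides of the exchange have to agree on the convention out of band, which is exactly the cost the question complains about. Base64-encoding the whole value would be another option, at the price of readability.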
Recommended answer
My understanding is that this range is barred on the grounds that a markup language should not need to support transmission and flow-control characters, and that including them would create problems for editors and parsers during binary conversion.
I'm struggling to find anything ex cathedra on this from Tim Bray et al though.
edit: some discussion of control chars and a vague admission it wasn't exactly over-engineered:
At 09:27 AM 17/06/00 -0500, Mark Volkmann wrote:
I've never seen a discussion of the reason why most ASCII control characters, such as a form feed, are not allowed in XML documents. Can anyone tell me the reason behind that decision or point me to a spec. that explains that?
I'm not sure we'd do it the same way if we were doing it again. I don't see that they do any real harm. Clearly, if you're optimizing for a highly interoperable content markup language (and XML is) it's legitimate to be suspicious of things like vertical-tab and backspace and so on... but then how can it be consistent to leave in and DEL and so on? -Tim