Why are "control" characters illegal in XML 1.0?

Question

There are a variety of characters that are not legally encodeable in XML 1.0, e.g. U+0007 ('bell') and U+001B ('escape'). Most of the interesting ones are non-whitespace 'control' characters.
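
For reference, the legal character set is fixed by production [2] `Char` of the XML 1.0 specification: `#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`. The Python sketch below transcribes that production directly into a predicate. The function name `is_xml10_char` is illustrative and not part of any library.

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point `cp` matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)            # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF          # BMP below the surrogate block
        or 0xE000 <= cp <= 0xFFFD        # BMP above the surrogates, minus U+FFFE/U+FFFF
        or 0x10000 <= cp <= 0x10FFFF     # supplementary planes
    )


if __name__ == "__main__":
    # C0 controls barred by the production: everything below U+0020
    # except tab, line feed, and carriage return.
    barred_c0 = [cp for cp in range(0x20) if not is_xml10_char(cp)]
    print(len(barred_c0))  # 29
    # BEL and ESC are barred; ZERO WIDTH NON-JOINER is not.
    print(is_xml10_char(0x07), is_xml10_char(0x1B), is_xml10_char(0x200C))  # False False True
```

Only the C0 block is carved out this way; everything from U+0020 up to the surrogates, including U+200C, is legal.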

It's clear from (e.g.) this question and others that it's the XML spec that's the issue -- but can anyone illuminate me as to why the XML spec forbids these characters?

It seems like it could have been required that they be encoded as escapes, e.g. as &#x7; and &#x1B; respectively, but perhaps there's a practical reason that the characters were forbidden rather than required to be escaped?
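
In XML 1.0 the escape route is closed as well: the "Legal Character" well-formedness constraint requires that characters referred to by character references also match `Char`, so `&#x7;` is just as ill-formed as a raw BEL. (XML 1.1 later relaxed this, permitting most of these characters, though not NUL, as character references.) A minimal check using Python's standard `xml.etree.ElementTree`, which is backed by expat; the helper name `is_well_formed` is illustrative:

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the expat-based ElementTree parser accepts `doc`."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for doc in (
        "<a>\x07</a>",     # raw BEL
        "<a>&#x7;</a>",    # BEL as a character reference
        "<a>&#x1B;</a>",   # ESC as a character reference
        "<a>&#x9;</a>",    # TAB is in Char, so its reference is fine
    ):
        print(repr(doc), is_well_formed(doc))
```

Only the last document should parse; the three BEL and ESC variants are rejected whether they are written raw or as references.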

Answerers have suggested that there is some motivation towards avoiding transmission control characters, but Unicode includes many other control-like characters (consider U+200C "zero width non-joiner"). I recognize there may be no good reason for this behavior, but I would still like to understand it better.

It's particularly frustrating because when those character values appear in other encodings or data formats, I end up "double-escaping" them in new XML documents that need to encode them.
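
The "double-escaping" usually means an application-level escape applied before the text reaches the XML serializer. The sketch below illustrates one such scheme under a hypothetical convention (not a standard): every code point outside XML 1.0 `Char` is written as `\uXXXX`, and a literal backslash is written as `\\`, so the transformation is reversible. The names `escape_for_xml10` and `unescape_from_xml10` are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Characters escaped under this hypothetical convention: the backslash (our
# escape character) plus every BMP code point outside the XML 1.0 Char
# production. All code points above U+FFFF are legal, so a BMP class suffices.
_NEEDS_ESCAPE = re.compile(r"[\\\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")

# Escape sequences produced by escape_for_xml10: "\\" or "\uXXXX".
_ESCAPE_SEQ = re.compile(r"\\(\\|u[0-9A-F]{4})")


def escape_for_xml10(text: str) -> str:
    """Rewrite `text` so it contains only XML 1.0-legal characters."""
    def repl(m):
        ch = m.group()
        return "\\\\" if ch == "\\" else f"\\u{ord(ch):04X}"
    return _NEEDS_ESCAPE.sub(repl, text)


def unescape_from_xml10(text: str) -> str:
    """Invert escape_for_xml10."""
    def repl(m):
        seq = m.group(1)
        return "\\" if seq == "\\" else chr(int(seq[1:], 16))
    return _ESCAPE_SEQ.sub(repl, text)


if __name__ == "__main__":
    original = "bell:\x07 esc:\x1b backslash:\\"
    elem = ET.Element("payload")
    elem.text = escape_for_xml10(original)  # application-level escape...
    xml_bytes = ET.tostring(elem)           # ...then ordinary XML escaping
    print(xml_bytes)
    restored = unescape_from_xml10(ET.fromstring(xml_bytes).text)
    print(restored == original)
```

Escaping the escape character itself is what makes the round trip unambiguous: after escaping, every backslash in the text begins exactly one escape sequence.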

Recommended answer

My understanding is that this range (the C0 controls other than tab, line feed, and carriage return) is barred on the grounds that a markup language should have no need to support transmission and flow-control characters, and that including them would create problems for editors and parsers in binary conversion.

I'm struggling to find anything ex cathedra on this from Tim Bray et al though.

Edit: here is some discussion of control characters, with a vague admission that the design wasn't exactly over-engineered:

At 09:27 AM 17/06/00 -0500, Mark Volkmann wrote:

I've never seen a discussion of the reason why most ASCII control characters, such as a form feed, are not allowed in XML documents. Can anyone tell me the reason behind that decision or point me to a spec. that explains that?

I'm not sure we'd do it the same way if we were doing it again. I don't see that they do any real harm. Clearly, if you're optimizing for a highly interoperable content markup language (and XML is) it's legitimate to be suspicious of things like vertical-tab and backspace and so on... but then how can it be consistent to leave in and DEL and so on? -Tim
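
The inconsistency Bray mentions is easy to observe with a parser. XML 1.0's `Char` range `[#x20-#xD7FF]` includes DEL (U+007F) and the C1 controls (U+0080 to U+009F), which the spec merely discourages, while vertical tab and backspace are excluded. A minimal check with Python's expat-backed `xml.etree.ElementTree`; the `SAMPLES` table and `accepted` helper are illustrative:

```python
import xml.etree.ElementTree as ET

SAMPLES = {
    "VT (U+000B)": "\x0b",
    "BS (U+0008)": "\x08",
    "DEL (U+007F)": "\x7f",
    "NEL (U+0085, a C1 control)": "\x85",
}


def accepted(ch: str) -> bool:
    """Return True if a document whose text content is `ch` parses."""
    try:
        ET.fromstring(f"<a>{ch}</a>")
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(f"{name}: {'accepted' if accepted(ch) else 'rejected'}")
```

VT and BS should be rejected, while DEL and NEL pass, even though all four are control characters.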
