Why is the XML character constraint asymmetric?
Question
The logic behind the asymmetry in XML character validation is not clear to me.
For example, the following XML documents are not valid (as I expect):
<xml>
<value attr="<">my value</value>
</xml>
<xml>
<value attr="attribute">my value is < than</value>
</xml>
But these XML documents are valid:
<xml>
<value attr=">">my value</value>
</xml>
<xml>
<value attr="attribute">my value is > than</value>
</xml>
I would expect any of the characters < > & to always be considered illegal.
So I would like to ask what the reason for that choice is (> is fine but < is not).
Recommended answer
The grammar rules were written so that parsers do not have to scan ahead to interpret characters properly.
The difference between < and > is that a parser, upon encountering <, cannot know whether it is the start of a tag or a LESS THAN character without scanning forward, whereas upon encountering >, it knows from its scan history (without having to scan ahead) whether the character should be interpreted as the end of a tag or as a GREATER THAN character.
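The asymmetry can be observed with any conforming parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the helper `is_well_formed` and the sample names are hypothetical and exist only for this illustration. It feeds the four documents from the question, plus an escaped variant, to the parser and records which ones are accepted. Note that it checks well-formedness only, which is what the question is really about.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The four documents from the question, plus one that escapes '<' as '&lt;'.
samples = {
    "lt_in_attribute": '<xml><value attr="<">my value</value></xml>',
    "lt_in_content": '<xml><value attr="attribute">my value is < than</value></xml>',
    "gt_in_attribute": '<xml><value attr=">">my value</value></xml>',
    "gt_in_content": '<xml><value attr="attribute">my value is > than</value></xml>',
    "escaped_lt": '<xml><value attr="&lt;">my value is &lt; than</value></xml>',
}

results = {name: is_well_formed(text) for name, text in samples.items()}

for name, ok in results.items():
    print(f"{name}: {'accepted' if ok else 'rejected'}")
```

Under these assumptions, the two documents containing a raw < should be rejected, while the two containing a raw > and the escaped variant should be accepted.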
See also:
- Simplified XML Escaping
- Michael Kay's helpful comment regarding SGML compatibility and uniformity of rules.