XML declaration encoding

Question

What does it actually do? At my very basic level of understanding, XML is just formatted text, so there is no binary<->text transformation involved.

I strongly suspect that the only difference between the UTF-8 and ASCII encodings is that ASCII makes the XML writer work harder: it has to convert every non-ASCII character into a character reference, not just escape the reserved XML characters. So ASCII-encoded XML can still carry non-ASCII (Unicode) characters; it is just going to be slightly longer and uglier.

Or does it serve some other function?

Update:

I perfectly understand how individual characters are converted into bytes by means of an encoding. However, XML is just text markup and at no point does that.

The real question is: why is the encoding value stored in the XML itself? In what case would an XML reader need to know which encoding was used for a particular XML document?

Recommended answer

See Appendix F of the XML specification, "Autodetection of Character Encodings".
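As a rough illustration of the kind of autodetection that appendix describes, a reader can inspect the first few bytes of the stream before it knows anything else about the document. The sketch below is a simplification, not a conforming implementation: the function name `sniff_encoding` and the returned labels are made up for this example, and cases such as EBCDIC and UTF-32 without a byte order mark are left out.

```python
def sniff_encoding(head: bytes) -> str:
    """Guess an encoding family from the first bytes of an XML document."""
    # Four-byte signatures are checked first: the UTF-32 LE byte order mark
    # begins with the same two bytes as the UTF-16 LE one.
    signatures = [
        (b"\x00\x00\xfe\xff", "UTF-32 BE (BOM)"),
        (b"\xff\xfe\x00\x00", "UTF-32 LE (BOM)"),
        (b"\xef\xbb\xbf", "UTF-8 (BOM)"),
        (b"\xfe\xff", "UTF-16 BE (BOM)"),
        (b"\xff\xfe", "UTF-16 LE (BOM)"),
        # No BOM: "<?" encoded as two 16-bit code units.
        (b"\x00<\x00?", "UTF-16 BE (no BOM)"),
        (b"<\x00?\x00", "UTF-16 LE (no BOM)"),
        # "<?xm" as plain bytes: some ASCII-compatible encoding, so the
        # declaration itself can be read to find out which one.
        (b"<?xm", "ASCII-compatible: read the encoding declaration"),
    ]
    for prefix, label in signatures:
        if head.startswith(prefix):
            return label
    # Nothing recognisable: fall back to the default.
    return "UTF-8 (assumed: no declaration, no BOM)"


print(sniff_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(sniff_encoding('<?xml version="1.0"?><a/>'.encode("utf-16-be")))
```

This is also why the declaration can be read at all before the encoding is known: its characters are restricted to ASCII, so any ASCII-compatible encoding represents it with the same bytes.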

In particular, the "XML encoding value is stored in the XML" because, absent external metadata from outside the XML document, XML processors must by default assume that the content is in UTF-8 or UTF-16. The XML declaration is designed for exactly those cases where no such metadata is present.
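To make this concrete: a reader receives bytes, not characters, so it does have to decode them. The following minimal sketch uses Python's standard `xml.etree.ElementTree` to parse the same ISO-8859-1 bytes with and without a declaration. The sample document and the variable names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# "é" is the single byte 0xE9 in ISO-8859-1, but two bytes in UTF-8.
body = "<word>café</word>"
declared = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode("iso-8859-1")
undeclared = body.encode("iso-8859-1")

# With the declaration, the parser knows how to turn 0xE9 back into "é".
declared_text = ET.fromstring(declared).text

# Without it, the parser assumes UTF-8, where a lone 0xE9 is not a valid
# byte sequence, so the document is rejected as not well-formed.
try:
    ET.fromstring(undeclared)
    undeclared_parsed = True
except ET.ParseError:
    undeclared_parsed = False

print(declared_text, undeclared_parsed)
```

The bytes of the element content are identical in both cases; only the declaration changes whether the reader can interpret them.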

Another advantage of how XML handles encodings is that an XML processor needs to support only two encodings, namely UTF-8 and UTF-16. If the processor discovers, either from external metadata or from the XML declaration, that the document is in an encoding it does not support, it can fail immediately. Otherwise it would have to guess the encoding with implementation-dependent heuristics, keep reading, and fail only much later (long after the declaration) when it hits a byte sequence that is invalid in the guessed encoding.
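A small sketch of that fail-early check, under stated assumptions: the helpers `declared_encoding` and `is_supported` are hypothetical names, the regular expression only handles ASCII-compatible streams (a UTF-16 document would need the byte-level detection from Appendix F first), and "supported" here simply means that Python's `codecs` module knows the name.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration located at
# the very start of an ASCII-compatible byte stream (illustrative only).
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(head: bytes):
    """Return the declared encoding name, or None if none is declared."""
    match = _DECL.match(head)
    return match.group(1).decode("ascii") if match else None


def is_supported(name) -> bool:
    """Decide, before reading any content, whether decoding is possible."""
    if name is None:
        return True  # no declaration: the UTF-8 / UTF-16 defaults apply
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


doc = b'<?xml version="1.0" encoding="x-no-such-encoding"?><a>...</a>'
name = declared_encoding(doc)
if not is_supported(name):
    print(f"rejecting document up front: unsupported encoding {name!r}")
```

The decision is made from the first line alone, without touching the rest of the document.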
