Meaning of <?xml version="1.0" encoding="utf-8"?>

Question

I am new to XML and I am trying to understand the basics. I read the line below in "Learning XML", but it is still not clear to me. Can someone point me to a book or website that explains these basics clearly?

From "Learning XML":

The XML declaration describes some of the most general properties of the document, telling the XML processor that it needs an XML parser to interpret this document.

What does this mean?

I understand the xml version part: both the document and its user should "talk" in the same version of XML. But what about the encoding part? Why is that necessary?

Recommended answer

To understand the "encoding" attribute, you have to understand the difference between bytes and characters.

Think of bytes as numbers between 0 and 255, whereas characters are things like "a", "1" and "Ä". The set of all characters that are available is called a character set.
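
To make the distinction concrete, here is a small sketch in Python (an illustrative choice; nothing in XML depends on it). Python keeps characters (str) and bytes (bytes) as separate types, and the example assumes UTF-8 as the encoding:

```python
# A str holds characters; encoding it yields bytes (integers from 0 to 255).
text = "Ä"
data = text.encode("utf-8")

print(len(text))   # 1 character
print(list(data))  # [195, 132]: the two bytes UTF-8 uses for this character
print(len(data))   # 2 bytes
```

The same single character turns into a different number of bytes depending on which encoding is applied, which is exactly why the encoding has to be known.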

Each character has a sequence of one or more bytes that are used to represent it; however, the exact number and value of the bytes depends on the encoding used and there are many different encodings.

Most encodings are based on an old character set and encoding called ASCII, which uses a single byte per character (actually only 7 bits) and contains 128 characters, including most of the common characters used in US English.

For example, here are 6 characters in the ASCII character set that are represented by the values 60 to 65.

Extract of ASCII Table 60-65
╔══════╦══════════════╗
║ Byte ║  Character   ║
╠══════╬══════════════╣
║  60  ║      <       ║
║  61  ║      =       ║
║  62  ║      >       ║
║  63  ║      ?       ║
║  64  ║      @       ║
║  65  ║      A       ║
╚══════╩══════════════╝

In the full ASCII set, the lowest value used is zero and the highest is 127 (both of these are non-printing control characters, NUL and DEL).
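
The table extract and the 0 to 127 range can be reproduced with Python's built-in "ascii" codec. This is a minimal illustration; the variable name extract is just a label for the example:

```python
import unicodedata

# Decode the byte values 60..65 as ASCII; this reproduces the table extract.
extract = {value: bytes([value]).decode("ascii") for value in range(60, 66)}
print(extract)  # {60: '<', 61: '=', 62: '>', 63: '?', 64: '@', 65: 'A'}

# 0 and 127 are the lowest and highest ASCII values; Unicode classifies
# both as control characters (general category "Cc").
print(unicodedata.category(chr(0)), unicodedata.category(chr(127)))  # Cc Cc

# 128 is outside ASCII's 7-bit range, so decoding it as ASCII fails.
try:
    bytes([128]).decode("ascii")
    decodes_128 = True
except UnicodeDecodeError:
    decodes_128 = False
print("byte 128 is valid ASCII:", decodes_128)  # False
```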

However, once you need more characters than basic ASCII provides (for example, letters with accents, currency symbols, graphic symbols, etc.), ASCII is not suitable and you need something more extensive: more characters (a different character set) and a different encoding, because 128 values are not enough to fit all those characters in. Some encodings use one byte per character (giving 256 possible values); others use several bytes per character (modern UTF-8, for example, uses between one and four).
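
A short Python sketch of both points: ASCII simply has no byte for a character like Ä, while UTF-8, a multi-byte encoding, spends between one and four bytes per character. The sample characters are arbitrary choices:

```python
# ASCII has only 128 values and no byte for "Ä", so encoding fails.
try:
    "Ä".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("ASCII can encode 'Ä':", ascii_ok)  # False

# UTF-8 uses a variable number of bytes per character.
utf8_sizes = {ch: len(ch.encode("utf-8")) for ch in ["a", "é", "€", "😀"]}
print(utf8_sizes)  # {'a': 1, 'é': 2, '€': 3, '😀': 4}
```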

Over time, many encodings have been created. In the Windows world there is CP1252, which is close to ISO-8859-1 but assigns printable characters such as € to most of the byte values 128 to 159, whereas Linux users tend to favour UTF-8. Java uses UTF-16 internally for its strings.
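
The difference between CP1252 and ISO-8859-1, and the two-byte minimum of UTF-16, can be checked with Python's standard codecs (Python stands in here for the Windows and Java environments; it is only an illustration):

```python
# Byte 0x80 (128) is the euro sign in CP1252 but an invisible C1 control
# character in ISO-8859-1, even though both agree on plain ASCII bytes.
cp1252_char = b"\x80".decode("cp1252")
latin1_char = b"\x80".decode("iso-8859-1")
print(repr(cp1252_char), repr(latin1_char))  # '€' '\x80'

# UTF-16 uses at least two bytes per character, even for ASCII letters.
utf16_bytes = list("A".encode("utf-16-be"))
print(utf16_bytes)  # [0, 65]
```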

One sequence of byte values for a character in one encoding might stand for a completely different character in another encoding, or might even be invalid.

For example, in ISO 8859-1, â is represented by one byte of value 226, whereas in UTF-8 it is two bytes: 195, 162. However, in ISO 8859-1, 195, 162 would be two characters, Ã, ¢.
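
The â example can be replayed in Python. This sketch encodes the character both ways and then decodes the UTF-8 bytes with the wrong encoding:

```python
ch = "â"
latin1 = ch.encode("iso-8859-1")
utf8 = ch.encode("utf-8")
print(list(latin1))  # [226]
print(list(utf8))    # [195, 162]

# Reading the UTF-8 bytes as if they were ISO 8859-1 yields two wrong characters.
misread = utf8.decode("iso-8859-1")
print(misread)  # Ã¢

# The single ISO 8859-1 byte 226 is not valid UTF-8 on its own.
try:
    latin1.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
print("byte 226 alone is valid UTF-8:", latin1_is_valid_utf8)  # False
```

The last case is the "might even be invalid" situation: byte 226 starts a multi-byte UTF-8 sequence, so on its own it cannot be decoded.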

Think of an XML document not as a sequence of characters but as a sequence of bytes.

Imagine the system receiving the XML sees the bytes 195, 162. How does it know what characters these are?

In order for the system to interpret those bytes as actual characters (and so display them or convert them to another encoding), it needs to know the encoding used in the XML.
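
As an illustration, Python's xml.etree.ElementTree (backed by the expat parser) reads the encoding declaration and decodes the content accordingly. The sketch feeds it the same two content bytes, 195 and 162, under two different declarations:

```python
import xml.etree.ElementTree as ET

# The same two content bytes, labelled with two different encodings.
payload = bytes([195, 162])
as_utf8 = b'<?xml version="1.0" encoding="utf-8"?><a>' + payload + b"</a>"
as_latin1 = b'<?xml version="1.0" encoding="iso-8859-1"?><a>' + payload + b"</a>"

utf8_text = ET.fromstring(as_utf8).text
latin1_text = ET.fromstring(as_latin1).text
print(utf8_text)    # â
print(latin1_text)  # Ã¢
```

The bytes never change; only the declared encoding does, and with it the characters the parser produces.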

Since most common encodings are compatible with ASCII as far as basic letters and symbols go, in those cases the declaration itself can get away with using only ASCII characters to say what the encoding is. In other cases (UTF-16, for example), the parser must try to figure out the encoding of the declaration itself. Since it knows the declaration begins with <?xml, this is a lot easier to do.
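
A simplified sketch of that detection, loosely following the non-normative autodetection appendix of the XML 1.0 specification. The function name sniff_xml_encoding is hypothetical, and the sketch deliberately ignores UTF-32, EBCDIC and other rare cases:

```python
import re


def sniff_xml_encoding(data: bytes) -> str:
    """Guess an XML document's encoding from its first bytes (simplified)."""
    # 1. A byte order mark settles the question outright.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    # 2. Without a BOM, the known start "<?" reveals UTF-16 byte layouts.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # 3. Otherwise assume an ASCII-compatible encoding and read the declaration.
    if data.startswith(b"<?xml"):
        match = re.match(
            rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']", data
        )
        if match:
            return match.group(1).decode("ascii").lower()
    # 4. No BOM and no encoding declaration: XML requires UTF-8.
    return "utf-8"


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(sniff_xml_encoding('<?xml version="1.0"?><a/>'.encode("utf-16-le")))
```

A real parser goes on to decode the rest of the declaration with the guessed encoding and check that it agrees; the sketch stops at the guess.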

Finally, the version attribute specifies the XML version, of which there are currently two, 1.0 and 1.1 (see Wikipedia XML versions). There are slight differences between the versions, so an XML parser needs to know which one it is dealing with. In most cases (for English speakers, anyway), version 1.0 is sufficient.
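
To see a parser picking up both pseudo-attributes, Python's built-in expat binding offers an XmlDeclHandler callback that receives the version and encoding from the declaration. The helper read_declaration below is hypothetical, written only for this illustration:

```python
import xml.parsers.expat


def read_declaration(data: bytes):
    """Return (version, encoding) from the XML declaration, or None if absent."""
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # Expat calls this once, when it parses the <?xml ...?> declaration.
        found["version"] = version
        found["encoding"] = encoding

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)
    if not found:
        return None
    return found["version"], found["encoding"]


print(read_declaration(b'<?xml version="1.0" encoding="utf-8"?><root/>'))  # ('1.0', 'utf-8')
print(read_declaration(b"<root/>"))  # None
```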
