ASCII vs Unicode + UTF-8


Question


I was reading Joel Spolsky's 'The Absolute Minimum' about character encoding. My understanding is that ASCII is both a code-point scheme and an encoding scheme, whereas in modern times we use Unicode as the code-point scheme and UTF-8 as the encoding scheme. Is this correct?

Recommended answer


Yes, except that UTF-8 is an encoding scheme, not the only one. Other encoding schemes include UTF-16 (with two different byte orders) and UTF-32. (Adding to the confusion, Microsoft software often refers to a UTF-16 scheme as "Unicode".)
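To make the code point versus encoding scheme distinction concrete, here is a minimal Python sketch. It takes one character and shows the bytes that several Unicode encoding schemes produce for the same code point. The helper name `encodings_of` and the sample character are assumptions chosen for illustration; only the standard `str.encode` codecs are used.

```python
# Illustrative sketch: one code point, several encoding schemes.
# SCHEMES and encodings_of are hypothetical names, not part of any standard API.
SCHEMES = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")


def encodings_of(char: str) -> dict[str, str]:
    """Map each scheme name to the hex bytes it produces for one character."""
    return {name: char.encode(name).hex(" ") for name in SCHEMES}


if __name__ == "__main__":
    char = "é"  # a single code point outside the ASCII range
    print(f"code point: U+{ord(char):04X}")
    for name, hex_bytes in encodings_of(char).items():
        print(f"{name:10} {hex_bytes}")
```

For "é" (U+00E9) the code point is the same in every row, but the bytes differ: `c3 a9` in UTF-8, `e9 00` or `00 e9` in UTF-16 depending on byte order, and four bytes in UTF-32.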


And, to be exact, the American National Standard that defines ASCII specifies a collection of characters and their coding as 7-bit quantities, without specifying a particular transfer encoding in terms of bytes. In the past, it was used in different ways, e.g. so that five ASCII characters were packed into one 36-bit storage unit, or so that 8-bit bytes used the extra bit for checking purposes (a parity bit) or for transfer control. But nowadays ASCII is used so that one ASCII character is encoded as one 8-bit byte with the most significant (first) bit set to zero. This is the de facto standard encoding scheme and is implied in a large number of specifications, but strictly speaking it is not part of the ASCII standard.
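The three ways of carrying 7-bit ASCII codes mentioned above can be sketched in Python as follows. All function names are hypothetical. The parity convention (even parity in the top bit) and the 36-bit layout (five codes left-justified, lowest bit unused) are assumptions made for illustration; real systems differed in both.

```python
def to_ascii_bytes(text: str) -> bytes:
    """Today's de facto scheme: one 7-bit code per 8-bit byte, top bit zero."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    assert all(byte < 0x80 for byte in data)  # top bit is always clear
    return data


def with_even_parity(code: int) -> int:
    """Use the eighth bit as an even-parity bit (assumed convention)."""
    if not 0 <= code < 0x80:
        raise ValueError("not a 7-bit code")
    parity = bin(code).count("1") % 2  # 1 if the count of set bits is odd
    return code | (parity << 7)


def pack_five(chars: str) -> int:
    """Pack five 7-bit codes into a 36-bit word (assumed layout, 1 bit spare)."""
    codes = chars.encode("ascii")
    if len(codes) != 5:
        raise ValueError("exactly five ASCII characters are required")
    word = 0
    for code in codes:
        word = (word << 7) | code  # append 7 bits at a time
    return word << 1  # left-justify: 35 data bits, lowest bit unused


if __name__ == "__main__":
    print(to_ascii_bytes("Hi").hex(" "))
    print(f"{with_even_parity(ord('C')):08b}")
    print(f"{pack_five('HELLO'):036b}")
```

In all three cases the character codes are the same 7-bit numbers; only the way they are laid out in storage or on the wire changes, which is why the byte-level encoding is not part of the ASCII standard itself.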
