Is ASCII code 7-bit or 8-bit?

Question

My teacher told me ASCII is an 8-bit character coding scheme. But it is defined only for codes 0-127, which means it fits into 7 bits. So can't it be argued that ASCII is actually a 7-bit code?

And what do we mean at all when we say ASCII is an 8-bit code?

Solution

ASCII was indeed originally conceived as a 7-bit code. This was done well before 8-bit bytes became ubiquitous, and even into the 1990s you could find software that assumed it could use the 8th bit of each byte of text for its own purposes ("not 8-bit clean"). Nowadays people think of it as an 8-bit coding in which bytes 0x80 through 0xFF have no defined meaning, but that's a retcon.
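
To make the "not 8-bit clean" point concrete, here is a minimal Python sketch. The `strip_high_bit` helper is hypothetical; it only imitates what such software effectively did to text passing through it, and is not taken from any real program.

```python
# Every ASCII code is at most 127, so it fits in 7 bits.
assert max(range(0x00, 0x80)).bit_length() == 7


def strip_high_bit(data: bytes) -> bytes:
    """Imitate software that is not 8-bit clean: clear the 8th bit of every byte."""
    return bytes(b & 0x7F for b in data)


plain = "cafe".encode("ascii")
accented = "café".encode("latin-1")  # é is the single byte 0xE9 here

print(strip_high_bit(plain))     # pure ASCII passes through unchanged
print(strip_high_bit(accented))  # 0xE9 & 0x7F == 0x69, so é silently becomes i
```

Pure ASCII survives the round trip, while anything that relies on the 8th bit is quietly corrupted.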

There are dozens of text encodings that make use of the 8th bit; they can be classified as ASCII-compatible or not, and fixed- or variable-width. ASCII-compatible means that regardless of context, single bytes with values from 0x00 through 0x7F encode the same characters that they would in ASCII. You don't want to have anything to do with a non-ASCII-compatible text encoding if you can possibly avoid it; naive programs expecting ASCII tend to misinterpret them in catastrophic, often security-breaking fashion. They are so deprecated nowadays that (for instance) HTML5 forbids their use on the public Web, with the unfortunate exception of UTF-16. I'm not going to talk about them any more.
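
A rough way to see why UTF-16 fails the ASCII-compatibility test is to look for bytes in the 0x00-0x7F range that show up while encoding non-ASCII characters. The helper name `stray_ascii_bytes` and the sample character are my own choices for illustration.

```python
def stray_ascii_bytes(text: str, encoding: str) -> bytes:
    """Collect ASCII-range bytes produced while encoding the non-ASCII characters of text."""
    stray = bytearray()
    for ch in text:
        if ord(ch) > 0x7F:
            stray.extend(b for b in ch.encode(encoding) if b < 0x80)
    return bytes(stray)


# U+2F2F is picked only because both of its UTF-16 bytes are 0x2F, the ASCII slash.
sample = "\u2f2f"

print(stray_ascii_bytes(sample, "utf-8"))      # b'': nothing a naive program could mistake for ASCII
print(stray_ascii_bytes(sample, "utf-16-le"))  # b'//': reads as two slashes to a naive program
print("A".encode("utf-16-le"))                 # b'A\x00': even plain ASCII text gains NUL bytes
```

A program that scans bytes for `/` to split a path would find two separators inside a single UTF-16 character, which is the kind of misinterpretation described above.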

A fixed-width encoding means what it sounds like: all characters are encoded using the same number of bytes. To be ASCII-compatible, a fixed-width encoding must encode all its characters using only one byte, so it can have no more than 256 characters. The most common such encoding nowadays is Windows-1252, an extension of ISO 8859-1.
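
As an illustration, the following sketch compares the two encodings using Python's built-in codecs. It assumes that Python's `cp1252` and `latin-1` codecs are acceptable stand-ins for Windows-1252 and ISO 8859-1.

```python
ascii_range = bytes(range(0x00, 0x80))
upper_range = bytes(range(0xA0, 0x100))

# Both encodings agree with ASCII on bytes 0x00-0x7F ...
assert ascii_range.decode("cp1252") == ascii_range.decode("latin-1") == ascii_range.decode("ascii")
# ... and with each other on bytes 0xA0-0xFF.
assert upper_range.decode("cp1252") == upper_range.decode("latin-1")

# They differ in 0x80-0x9F, where Windows-1252 places printable characters.
euro = b"\x80"
print(repr(euro.decode("cp1252")))   # '€'
print(repr(euro.decode("latin-1")))  # '\x80', a control character

# Fixed width: one byte per character, so byte length equals character count.
text = "naïve café"
assert len(text.encode("cp1252")) == len(text)
```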

There's only one variable-width ASCII-compatible encoding worth knowing about nowadays, but it's very important: UTF-8, which packs all of Unicode into an ASCII-compatible encoding. You really want to be using this if you can manage it.
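
A short sketch of what "variable-width and ASCII-compatible" looks like in practice. The sample characters and the `utf8_width` helper are arbitrary choices for illustration.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


for ch in ["A", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {utf8_width(ch)} byte(s): {encoded.hex(' ')}")

# ASCII text is byte-for-byte identical in UTF-8 ...
assert "hello".encode("utf-8") == "hello".encode("ascii")
# ... and every byte of a multi-byte character has the 8th bit set,
# so none of them can be mistaken for an ASCII character.
assert all(b >= 0x80 for b in "é€😀".encode("utf-8"))
```

The width grows from one to four bytes as the code point gets larger, while plain ASCII input stays exactly as it was.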

As a final note, "ASCII" nowadays takes its practical definition from Unicode, not its original standard (ANSI X3.4-1968), because historically there were several dozen variations on the ASCII 128-character repertoire -- for instance, some of the punctuation might be replaced with accented letters to facilitate the transmission of French text. Nowadays all of those variations are obsolescent, and when people say "ASCII" they mean that the bytes with value 0x00 through 0x7F encode Unicode codepoints U+0000 through U+007F. This will probably only matter to you if you ever find yourself writing a technical standard.
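
That modern definition can be checked directly. The sketch below assumes Python's `ascii` codec reflects the present-day meaning of "ASCII" rather than any historical national variant.

```python
# Each byte 0x00-0x7F decodes to the Unicode code point with the same number.
for value in range(0x80):
    assert bytes([value]).decode("ascii") == chr(value)

# Bytes with the 8th bit set are simply not ASCII.
print(b"plain text".isascii())  # True
print(b"caf\xe9".isascii())     # False

try:
    b"\x80".decode("ascii")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```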
