How exactly does binary code get converted into letters?

Question

Out of curiosity, how exactly does binary code get converted into letters? I know there are sites that automatically convert binary to words for you but I wanna understand the specific, intermediary steps that binary code goes through before being converted into letters.

Recommended answer

Assuming that by "binary code" you mean just plain old data (sequences of bits, or bytes), and that by "letters" you mean characters, the answer is in two steps. But first, some background.

  • A character is just a named symbol, like "LATIN CAPITAL LETTER A" or "GREEK SMALL LETTER PI" or "BLACK CHESS KNIGHT". Do not confuse a character (abstract symbol) with a glyph (a picture of a character).
  • A character set is a particular set of characters, each of which is associated with a special number, called its codepoint. To see the codepoint mappings in the Unicode character set, see http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
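The name-to-codepoint mapping described above can be explored directly. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `codepoint_of` is made up for illustration and is not part of the original answer.

```python
import unicodedata


def codepoint_of(name: str) -> int:
    """Return the codepoint (an integer) of the character with this Unicode name."""
    # unicodedata.lookup() maps an official character name to the character itself;
    # ord() then gives that character's codepoint.
    return ord(unicodedata.lookup(name))


for name in ("LATIN CAPITAL LETTER A", "GREEK SMALL LETTER PI", "BLACK CHESS KNIGHT"):
    # Codepoints are conventionally written as U+ followed by hex digits.
    print(f"U+{codepoint_of(name):04X}  {name}")
```

How the character is actually drawn (the glyph) depends on the font in use, which is why none of this code says anything about appearance.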

Okay now here are the two steps:

  1. The data, if it is textual, must be accompanied somehow by a character encoding, something like UTF-8, Latin-1, US-ASCII, etc. Each character encoding scheme specifies in great detail how byte sequences are interpreted as codepoints (and conversely how codepoints are encoded as byte sequences).

  2. Once the byte sequences are interpreted as codepoints, you have your characters, because each character has a specific codepoint.
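The two steps can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the input is assumed to be a string of `0` and `1` digits grouped into 8-bit bytes. The same four bytes are decoded under two different encodings to show that the encoding, not the bits alone, determines which characters come out.

```python
def bits_to_bytes(bit_string: str) -> bytes:
    """Turn a string such as '01001000 01101001' into raw bytes (8 bits per byte)."""
    bits = "".join(bit_string.split())  # drop spaces and newlines
    if len(bits) % 8 != 0:
        raise ValueError("expected a multiple of 8 bits")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def decode_to_codepoints(data: bytes, encoding: str) -> list:
    """Step 1: use the character encoding to interpret the bytes as codepoints."""
    return [ord(ch) for ch in data.decode(encoding)]


def codepoints_to_text(codepoints: list) -> str:
    """Step 2: each codepoint identifies exactly one character."""
    return "".join(chr(cp) for cp in codepoints)


data = bits_to_bytes("01001000 01101001 11000011 10101001")  # bytes 48 69 C3 A9
for encoding in ("utf-8", "latin-1"):
    cps = decode_to_codepoints(data, encoding)
    print(encoding, [f"U+{cp:04X}" for cp in cps], repr(codepoints_to_text(cps)))
```

Under UTF-8 the last two bytes form a single codepoint (U+00E9), while under Latin-1 every byte is its own codepoint, so the same data yields three characters in one case and four in the other.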

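A couple of notes:

  • In some encodings, certain byte sequences correspond to no codepoint at all, so decoding can fail with a character decoding error.
  • In some character sets, some codepoints are unused; that is, they correspond to no character at all.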

In other words, not every byte sequence means something as text.
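Both failure modes can be observed from Python. This is a rough sketch with made-up helper names (`try_decode`, `is_assigned`). Whether a codepoint counts as unassigned depends on the Unicode version bundled with the Python build in use, so the U+0378 result below is an assumption about current versions rather than a permanent fact.

```python
import unicodedata


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in this encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


def is_assigned(codepoint: int) -> bool:
    """False if this Python's Unicode database lists the codepoint as unassigned ('Cn')."""
    return unicodedata.category(chr(codepoint)) != "Cn"


print(try_decode(b"\xff\xfe", "utf-8"))    # None: the byte 0xFF never occurs in valid UTF-8
print(try_decode(b"\x80", "ascii"))        # None: US-ASCII only defines bytes 0x00 to 0x7F
print(try_decode(b"\xff\xfe", "latin-1"))  # decodes: Python's latin-1 codec accepts every byte
print(is_assigned(0x0041))                 # True: LATIN CAPITAL LETTER A
print(is_assigned(0x0378))                 # False in current Unicode versions
```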
