How exactly does binary code get converted into letters?


Question

Out of curiosity, how exactly does binary code get converted into letters? I know there are sites that automatically convert binary to words for you but I wanna understand the specific, intermediary steps that binary code goes through before being converted into letters.

Recommended answer

Assuming that by "binary code" you mean just plain old data (sequences of bits, or bytes), and that by "letters" you mean characters, the answer is in two steps. But first, some background.

  • A character is just a named symbol, like "LATIN CAPITAL LETTER A" or "GREEK SMALL LETTER PI" or "BLACK CHESS KNIGHT". Do not confuse a character (abstract symbol) with a glyph (a picture of a character).
  • A character set is a particular set of characters, each of which is associated with a special number, called its codepoint. To see the codepoint mappings in the Unicode character set, see http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
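
To make the character/codepoint distinction concrete, here is a minimal Python sketch. It assumes the standard `unicodedata` module, which exposes the same names and codepoints listed in UnicodeData.txt; the helper name `describe` is made up for this illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the codepoint (in U+XXXX form) and the Unicode name of one character."""
    codepoint = ord(ch)            # character -> codepoint (an integer)
    name = unicodedata.name(ch)    # character -> its official Unicode name
    return f"U+{codepoint:04X} {name}"


# The three characters named above; escapes are used so the source stays ASCII.
for ch in ["A", "\u03c0", "\u265e"]:
    print(describe(ch))
```

Note that nothing here says what the character looks like on screen: that is the job of a font, which supplies the glyph.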

Okay now here are the two steps:

  1. The data, if it is textual, must be accompanied somehow by a character encoding, something like UTF-8, Latin-1, US-ASCII, etc. Each character encoding scheme specifies in great detail how byte sequences are interpreted as codepoints (and conversely how codepoints are encoded as byte sequences).

  2. Once the byte sequences are interpreted as codepoints, you have your characters, because each character has a specific codepoint.
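
The two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the input is a string of `0`/`1` digits like the ones binary-to-text sites accept, the helper name `bits_to_bytes` is hypothetical, and the encodings are picked by hand because nothing in the bytes themselves says which one applies.

```python
def bits_to_bytes(bits: str) -> bytes:
    """Turn a string of '0'/'1' digits (whitespace ignored) into raw bytes."""
    digits = "".join(bits.split())
    if len(digits) % 8 != 0:
        raise ValueError("expected a multiple of 8 bits")
    return bytes(int(digits[i:i + 8], 2) for i in range(0, len(digits), 8))


data = bits_to_bytes("01001000 01101001")

# Step 1: the encoding (here US-ASCII) says how the bytes map to codepoints.
text = data.decode("ascii")
codepoints = [ord(ch) for ch in text]

# Step 2: each codepoint identifies exactly one character.
print(codepoints, text)

# The same two bytes give different characters under different encodings.
same_bytes = b"\xcf\x80"
as_utf8 = same_bytes.decode("utf-8")      # one character, GREEK SMALL LETTER PI
as_latin1 = same_bytes.decode("latin-1")  # two characters, U+00CF and U+0080
```

The last three lines are why the encoding has to accompany the data: the bytes alone are ambiguous.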

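A couple of notes:

  • In some encodings, certain byte sequences correspond to no codepoint at all, so you can get character decoding errors.
  • In some character sets, some codepoints are unused; that is, they correspond to no character at all.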

In other words, not every byte sequence means something as text.
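
Both caveats can be seen in a small Python sketch. The helper names `try_decode` and `is_assigned` are made up for this example, and `is_assigned` relies on one assumption: that Unicode general category `Cn` ("not assigned"), as reported by the standard `unicodedata` module, is a reasonable stand-in for "corresponds to no character".

```python
import unicodedata
from typing import Optional


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Decode data, returning None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


def is_assigned(ch: str) -> bool:
    """True if the codepoint has a character assigned (category is not 'Cn')."""
    return unicodedata.category(ch) != "Cn"


# The byte 0xFF never appears in valid UTF-8, and US-ASCII only covers 0x00-0x7F.
bad_utf8 = try_decode(b"\xff", "utf-8")
bad_ascii = try_decode(b"\x80", "ascii")

# Latin-1 maps every single byte to a codepoint, so the same byte decodes fine there.
ok_latin1 = try_decode(b"\xff", "latin-1")

# U+FFFF is a valid codepoint that is permanently reserved as a noncharacter.
unused = "\uffff"
print(bad_utf8, bad_ascii, ok_latin1, is_assigned("A"), is_assigned(unused))
```

Whether a given byte sequence "means something as text" therefore depends on both the encoding and the character set.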
