ASCII - code point vs. character encoding

Published: 2021/4/10 18:36:08 | Tags: encoding, character-encoding, ascii

Question

I found an interesting article "A tutorial on character code issues" (http://jkorpela.fi/chars.html#code) which explains the terms "character code"/"code point" and "character encoding".

The former is just an integer assigned to a character; for example, 65 is assigned to the character A. The character encoding defines how such a code point is represented as one or more bytes.
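A minimal sketch of the distinction in Python, which is used here only for illustration. `ord()` returns a character's code point, and `str.encode()` applies one particular encoding. The codecs below are simply examples from Python's standard library: the same code point 65 turns into different byte sequences under each of them.

```python
# The code point is an integer attached to the character itself.
code_point = ord("A")   # 65
print(code_point, chr(code_point))

# The encoding decides which bytes represent that code point.
# The same code point 65 becomes different byte sequences:
encoded = {
    "ascii": "A".encode("ascii"),          # 41
    "utf-16-be": "A".encode("utf-16-be"),  # 00 41
    "utf-32-le": "A".encode("utf-32-le"),  # 41 00 00 00
    "cp037": "A".encode("cp037"),          # C1 (an EBCDIC code page)
}

for name, data in encoded.items():
    print(f"{name:<10} {data.hex()}")
```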

For good old ASCII, the author says: "The character encoding specified by the ASCII standard is very simple, and the most obvious one for any character code where the code numbers do not exceed 255: each code number is presented as an octet with the same value."
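The quoted rule can be checked mechanically. This is a sketch using Python's built-in codecs. The helper names `same_value_rule_holds` and `ascii_rejects` are hypothetical. Latin-1 (ISO 8859-1) is our own choice of a code whose numbers go up to 255.

```python
def same_value_rule_holds(codec: str, limit: int) -> bool:
    """True if every code point below `limit` encodes to one octet of the same value."""
    return all(chr(cp).encode(codec) == bytes([cp]) for cp in range(limit))

def ascii_rejects(cp: int) -> bool:
    """True if the "ascii" codec refuses to encode code point `cp`."""
    try:
        chr(cp).encode("ascii")
    except UnicodeEncodeError:
        return True
    return False

print(same_value_rule_holds("ascii", 128))    # True: ASCII covers 0..127
print(same_value_rule_holds("latin-1", 256))  # True: same rule, extended to 255
print(ascii_rejects(200))                     # True: 200 is not an ASCII code point
```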

So 65, the code point for A, would be encoded as 0100 0001.
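As a worked check, expanding 65 into powers of two gives its eight-bit pattern:

```latex
65 = 64 + 1 = 2^6 + 2^0
   = 0\cdot2^7 + 1\cdot2^6 + 0\cdot2^5 + 0\cdot2^4 + 0\cdot2^3 + 0\cdot2^2 + 0\cdot2^1 + 1\cdot2^0
   = 0100\,0001_2
```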

Because ASCII has 128 characters, there are 128 code points (0 through 127), and each code point is always encoded as exactly one byte.
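A short Python sketch that checks this claim against the standard `ascii` codec. Every ASCII code point encodes to exactly one byte. Since 127 fits in seven bits, the top bit of that byte is always 0.

```python
# ASCII defines 128 code points, 0 through 127 (0x00..0x7F).
ascii_code_points = range(128)

# Each one encodes to exactly one byte; because 127 < 2**7,
# the most significant bit of that byte is always 0.
encoded = [chr(cp).encode("ascii") for cp in ascii_code_points]
all_single_byte = all(len(b) == 1 for b in encoded)
top_bit_clear = all((b[0] & 0x80) == 0 for b in encoded)

print(len(ascii_code_points), all_single_byte, top_bit_clear)  # 128 True True
```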

To summarize, these are the steps to encode characters in ASCII:

Assign each character a number, its code point (e.g. A -> 65).
Encode the character as a byte with the same value (e.g. 0100 0001).
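The two steps can be written out as a minimal Python sketch. `encode_ascii_two_step` is a hypothetical helper, not a library function. In practice `str.encode("ascii")` does the same job.

```python
def encode_ascii_two_step(text: str) -> bytes:
    """Hypothetical helper: encode text as ASCII in two explicit steps."""
    out = bytearray()
    for ch in text:
        code_point = ord(ch)            # step 1: character -> code point
        if code_point > 127:
            raise ValueError(f"{ch!r} (code point {code_point}) is not ASCII")
        out.append(code_point)          # step 2: code point -> byte with the same value
    return bytes(out)

print(encode_ascii_two_step("AB"))                                   # b'AB'
print(encode_ascii_two_step("Hello") == "Hello".encode("ascii"))     # True
```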

So for the letters A and B, it would be:

A -> 65 -> 0100 0001
B -> 66 -> 0100 0010
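The trace above can be produced mechanically. This sketch uses a hypothetical `trace` helper that prints each character, its code point, and the code point as a zero-padded eight-bit binary number split into two nibbles.

```python
def trace(ch: str) -> str:
    """Hypothetical helper: format 'char -> code point -> bits'."""
    cp = ord(ch)
    bits = format(cp, "08b")            # zero-padded 8-bit binary, e.g. '01000001'
    return f"{ch} -> {cp} -> {bits[:4]} {bits[4:]}"

for ch in "AB":
    print(trace(ch))
# A -> 65 -> 0100 0001
# B -> 66 -> 0100 0010
```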

My question:

Why separate code points from the encoding in ASCII? ASCII has only one encoding, so, at least for ASCII, it is not clear to me why the intermediate step (mapping to an integer) is done. A direct encoding like

A -> 0100 0001
B -> 0100 0010

would also be possible, wouldn't it? If an ASCII character had multiple encodings, the separation would be reasonable, but with only one encoding form it doesn't make sense to me.
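The proposed direct mapping can be sketched as a lookup table from characters straight to bytes, with no named code-point step. `DIRECT_TABLE` and `encode_direct` are hypothetical. The two-entry table is only illustrative, since a full version would list all 128 ASCII characters. The result is compared with Python's built-in ASCII codec.

```python
# Direct character -> byte lookup; illustrative two-entry table only.
DIRECT_TABLE = {
    "A": 0b0100_0001,
    "B": 0b0100_0010,
}

def encode_direct(text: str) -> bytes:
    """Hypothetical helper: look each character up directly in DIRECT_TABLE."""
    return bytes(DIRECT_TABLE[ch] for ch in text)

# For ASCII the direct table and the standard codec give the same bytes,
# because each byte value equals the character's code point.
print(encode_direct("AB") == "AB".encode("ascii"))  # True
print(encode_direct("BA"))                          # b'BA'
```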
