How many characters can UTF-8 encode?

Question

If UTF-8 is 8 bits, doesn't that mean there can be at most 256 different characters?

The first 128 code points are the same as in ASCII. But it is said that UTF-8 can support over a million characters?

How does this work?

Recommended answer

UTF-8 does not always use one byte per character; it uses 1 to 4 bytes. The "8" refers to the 8-bit code unit the encoding is built from, not to the size of every character.
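To see the variable length in practice, here is a minimal Python sketch that encodes one sample character from each size class. The four sample characters are my own choice for illustration and do not come from the original answer.

```python
# One illustrative character per UTF-8 size class (samples chosen for this sketch).
samples = ["A", "é", "中", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the number of bytes, and the raw bytes in hex.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

# Byte lengths in the same order as the samples.
lengths = [len(ch.encode("utf-8")) for ch in samples]
```

Each character is still a single code point; only the number of bytes used to store it changes.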

The first 128 characters (US-ASCII) need one byte.

The next 1,920 characters need two bytes to encode. This covers the remainder of almost all Latin-script alphabets, and also the Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac and Thaana alphabets, as well as Combining Diacritical Marks.
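The figure 1,920 follows from the two-byte bit layout `110xxxxx 10xxxxxx`, which carries 11 payload bits (2,048 values), minus the 128 values already covered by the one-byte form. The sketch below hand-encodes a two-byte character to show this; `encode_two_byte` is a hypothetical helper written for this illustration, not a library function.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as 110xxxxx 10xxxxxx (illustrative helper)."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("the two-byte form covers only U+0080..U+07FF")
    lead = 0b11000000 | (code_point >> 6)            # top 5 of the 11 payload bits
    trail = 0b10000000 | (code_point & 0b00111111)   # low 6 payload bits
    return bytes([lead, trail])

# Number of code points in the two-byte range U+0080..U+07FF.
two_byte_count = 0x7FF - 0x80 + 1

# Compare the hand-rolled result with Python's built-in encoder.
print(encode_two_byte(ord("é")) == "é".encode("utf-8"))
print(two_byte_count)
```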

Three bytes are needed for characters in the rest of the Basic Multilingual Plane, which contains virtually all characters in common use[12], including most Chinese, Japanese and Korean (CJK) characters.

Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters, various historic scripts, mathematical symbols, and emoji (pictographic symbols).

Source: Wikipedia
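To put a number on the question in the title, the following Python sketch counts how many code points fall into each byte length. It assumes the modern definition of UTF-8, which stops at U+10FFFF and excludes the UTF-16 surrogate range U+D800..U+DFFF. Note that it counts encodable code points, not characters that Unicode has actually assigned.

```python
# Surrogate code points are reserved for UTF-16 and are not valid in UTF-8.
SURROGATES = range(0xD800, 0xE000)

# counts[n] = number of code points whose UTF-8 encoding is n bytes long.
counts = {1: 0, 2: 0, 3: 0, 4: 0}

for cp in range(0x110000):  # U+0000 through U+10FFFF
    if cp in SURROGATES:
        continue
    counts[len(chr(cp).encode("utf-8"))] += 1

total = sum(counts.values())

for size, n in counts.items():
    print(f"{size} byte(s): {n:,} code points")
print(f"total: {total:,}")
```

Under these assumptions the total comes to 1,112,064 code points, which is where the "about a million" figure comes from.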
