What is the maximum number of bytes for a UTF-8 encoded character?


Question

What is the maximum number of bytes for a single UTF-8 encoded character?

I'll be encrypting the bytes of a String encoded in UTF-8, so I need to be able to work out the maximum number of bytes for a UTF-8 encoded String.

Could someone confirm the maximum number of bytes for a single UTF-8 encoded character, please?

Recommended answer

The maximum number of bytes per character is 4, according to RFC 3629, which limited the character table to U+10FFFF:

In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16 accessible range) are encoded using sequences of 1 to 4 octets.

(The original specification allowed character codes of up to six bytes for code points past U+10FFFF.)
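As a rough illustration of the 1-to-4 octet rule, the following Python sketch encodes the first and last code point of each length class and prints how many bytes each one takes. The helper name `utf8_len` is made up for this example, and the snippet relies only on Python's built-in `str.encode`.

```python
def utf8_len(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for a single Unicode code point.

    Hypothetical helper for illustration; surrogates (U+D800..U+DFFF)
    are not valid input because they cannot be encoded as UTF-8.
    """
    return len(chr(code_point).encode("utf-8"))


# First and last code point of each length class.
boundaries = (0x00, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF)

for cp in boundaries:
    print(f"U+{cp:04X}: {utf8_len(cp)} byte(s)")
```

The length only changes at U+0080, U+0800 and U+10000, and the last valid code point, U+10FFFF, still fits in 4 bytes.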

Characters with a code point below 128 (U+0000..U+007F) require only 1 byte, and the next 1920 code points (U+0080..U+07FF) require only 2 bytes. Unless you are working with an esoteric language, multiplying the character count by 4 will be a significant overestimate.
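To see how loose the "times 4" bound usually is, here is a minimal Python sketch comparing the worst-case size with the actual encoded size. The function names `max_utf8_bytes` and `actual_utf8_bytes` are invented for this example. It assumes the string length is counted in Unicode code points, which is what Python's `len` does.

```python
def max_utf8_bytes(text: str) -> int:
    """Worst-case UTF-8 size: 4 bytes for every code point (hypothetical helper)."""
    return 4 * len(text)


def actual_utf8_bytes(text: str) -> int:
    """Exact UTF-8 size, obtained by actually encoding the string."""
    return len(text.encode("utf-8"))


# Escapes are used so the samples do not depend on the source file's encoding.
samples = [
    "hello",               # ASCII only
    "h\u00e9llo",          # one 2-byte character (e with acute accent)
    "\u65e5\u672c\u8a9e",  # three 3-byte characters
    "\U0001F600",          # one 4-byte character outside the BMP
]

for s in samples:
    print(f"{s!r}: actual={actual_utf8_bytes(s)}, upper bound={max_utf8_bytes(s)}")
```

If an exact buffer size is needed, encoding the string and measuring the result is the simplest option. Note that in languages whose string length counts UTF-16 code units rather than code points (Java's `String.length()`, for example), 3 bytes per unit is already a safe upper bound, because a 4-byte UTF-8 sequence corresponds to two UTF-16 units.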
