What is the maximum number of bytes for a UTF-8 encoded character?

Question

What is the maximum number of bytes for a single UTF-8 encoded character?

I'll be encrypting the bytes of a String encoded in UTF-8 and therefore need to be able to work out the maximum number of bytes for a UTF-8 encoded String.

Could someone please confirm the maximum number of bytes for a single UTF-8 encoded character?

Recommended Answer

The maximum number of bytes per character is 4, according to RFC 3629, which restricted the character range to U+10FFFF:

In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16 accessible range) are encoded using sequences of 1 to 4 octets.
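
A minimal Python sketch of the length rule quoted above, assuming the input is a single code point given as an integer; the function name `utf8_length` is hypothetical. It maps each code point range to its sequence length and cross-checks the result against Python's built-in encoder:

```python
def utf8_length(code_point: int) -> int:
    """Return the number of bytes UTF-8 (RFC 3629) uses for one code point.

    Hypothetical helper for illustration. Surrogate code points
    (U+D800..U+DFFF) are not encodable in UTF-8; this sketch does not check for them.
    """
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("RFC 3629 limits UTF-8 to U+0000..U+10FFFF")
    if code_point < 0x80:       # U+0000..U+007F: 1 byte (ASCII)
        return 1
    if code_point < 0x800:      # U+0080..U+07FF: 2 bytes
        return 2
    if code_point < 0x10000:    # U+0800..U+FFFF: 3 bytes
        return 3
    return 4                    # U+10000..U+10FFFF: 4 bytes


# Cross-check against Python's own UTF-8 encoder on one sample per length.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_length(ord(ch))} byte(s)")
```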

(The original specification allowed sequences of up to six bytes, covering code points beyond U+10FFFF, up to 0x7FFFFFFF.)
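
For comparison, a sketch of the obsolete length rule as it appeared in RFC 2279 (which RFC 3629 replaced), covering 31-bit values up to 0x7FFFFFFF. The function name `legacy_utf8_length` is hypothetical, and modern encoders, Python's included, will not produce the 5- and 6-byte forms:

```python
def legacy_utf8_length(value: int) -> int:
    """Sequence length under the obsolete RFC 2279 rules (up to 6 bytes).

    Hypothetical helper for illustration only; these long forms are invalid today.
    """
    if value < 0 or value > 0x7FFFFFFF:
        raise ValueError("RFC 2279 covered only 0..0x7FFFFFFF")
    # Exclusive upper bound for each sequence length, 1 through 6 bytes
    # (7, 11, 16, 21, 26 and 31 payload bits respectively).
    limits = [0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000]
    for length, limit in enumerate(limits, start=1):
        if value < limit:
            return length
    raise AssertionError("unreachable: every value <= 0x7FFFFFFF hits a limit")


print(legacy_utf8_length(0x10FFFF))    # still a 4-byte sequence
print(legacy_utf8_length(0x200000))    # would need a 5-byte form
print(legacy_utf8_length(0x7FFFFFFF))  # the old 6-byte maximum
```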

Characters with a code point below 128 require only 1 byte, and the next 1920 code points (U+0080..U+07FF) require only 2 bytes. Multiplying the character count by 4 is always a safe upper bound, but unless your text consists mostly of characters above U+07FF (for example CJK text, at 3 bytes each, or emoji, at 4), it will be a significant overestimate.
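
Assuming the goal is to size a buffer before encrypting, here is a minimal Python sketch contrasting the worst-case bound with the exact size; both function names are hypothetical. When the whole string is available up front, encoding it and measuring the result gives the exact figure, and the 4x bound is only needed when space must be reserved before the text is known:

```python
def max_utf8_bytes(text: str) -> int:
    """Worst-case UTF-8 size: 4 bytes per code point (RFC 3629)."""
    return 4 * len(text)  # len() counts code points for a Python 3 str


def exact_utf8_bytes(text: str) -> int:
    """Actual UTF-8 size, obtained by encoding the text."""
    return len(text.encode("utf-8"))


# Code points below 0x80 take 1 byte; the next 0x800 - 0x80 = 1920 take 2.
assert 0x800 - 0x80 == 1920

for sample in ["hello", "na\u00efve", "\u4f60\u597d", "\U0001F600"]:
    print(repr(sample), exact_utf8_bytes(sample), "of at most", max_utf8_bytes(sample))
```

The bound depends on what your language counts as a character: a Python 3 `str` counts code points, whereas Java's `String.length()` counts UTF-16 code units. A character outside the Basic Multilingual Plane takes two units and 4 bytes, so in that setting 3 bytes per unit is the tightest per-unit bound.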