Why does UTF-32 exist when only 21 bits are needed to encode every character?


Question

We know that code points lie in the interval 0..10FFFF, which is less than 2^21. So why do we need UTF-32 when every code point can be represented in 3 bytes? UTF-24 should be enough.
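The arithmetic behind the question can be checked directly. This small sketch computes how many bits and bytes the largest code point needs, and compares that with the 4 bytes UTF-32 actually uses per code point (Python's built-in `utf-32-be` codec is used so no byte-order mark is added):

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode defines

bits_needed = MAX_CODE_POINT.bit_length()
bytes_needed = (bits_needed + 7) // 8  # round up to whole bytes

print(f"U+{MAX_CODE_POINT:X} needs {bits_needed} bits -> {bytes_needed} bytes")
print(MAX_CODE_POINT < 2**21)

# UTF-32 still spends a full 4-byte unit on that code point.
utf32_size = len(chr(MAX_CODE_POINT).encode("utf-32-be"))
print(f"UTF-32 uses {utf32_size} bytes")
```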

Answer

Two reasons I can think of:

  • It allows for future expansion
  • (More importantly) Computers are generally much better at dealing with data on 4-byte boundaries. The saving in memory (3 bytes instead of 4 per code point, i.e. 25%) is relatively small compared with the pain of working on 3-byte boundaries.
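To make the trade-off concrete, here is a minimal sketch contrasting random access in UTF-32 with a hypothetical 3-byte "UTF-24" layout. "UTF-24" is not a real Unicode encoding form, and `encode_utf24le`, `utf24_code_point_at` and `utf32_code_point_at` are illustrative helpers invented for this example. Python hides the hardware alignment costs, so this only shows the shape of the problem: a 4-byte unit maps onto a standard 32-bit integer read, while a 3-byte unit has to be sliced and reassembled by hand.

```python
import struct

# Hypothetical "UTF-24": each code point stored as 3 little-endian bytes.
def encode_utf24le(text: str) -> bytes:
    return b"".join(ord(ch).to_bytes(3, "little") for ch in text)

def utf24_code_point_at(buf: bytes, i: int) -> int:
    # No native 24-bit integer type: slice 3 bytes and assemble them manually.
    return int.from_bytes(buf[3 * i : 3 * i + 3], "little")

def utf32_code_point_at(buf: bytes, i: int) -> int:
    # A 4-byte unit is read as a standard 32-bit unsigned integer ("<I").
    (cp,) = struct.unpack_from("<I", buf, 4 * i)
    return cp

text = "A\u20ac\U0001F600"  # U+0041, U+20AC, U+1F600
utf32 = text.encode("utf-32-le")  # explicit endianness, so no BOM
utf24 = encode_utf24le(text)

print(len(utf32), len(utf24))
print([hex(utf32_code_point_at(utf32, i)) for i in range(len(text))])
print([hex(utf24_code_point_at(utf24, i)) for i in range(len(text))])
```

Both layouts give constant-time access to the nth code point; the 3-byte version saves a quarter of the space but gives up the direct mapping onto a native integer width.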

I guess this is a bit like asking why we have 8-bit, 16-bit, 32-bit and 64-bit integer data types (byte, int, long, whatever) but not 24-bit ones. I'm sure there are lots of occasions where we know that a number will never exceed 2^21, but it's simply easier to use an int than to create a 24-bit type.
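The same pattern shows up in Python's `struct` module, used here only as one illustration: its standard-size integer format codes cover 1, 2, 4 and 8 bytes, and there is no 3-byte code, so a 24-bit field has to fall back to `int.from_bytes`:

```python
import struct

# Standard sizes ("<" prefix) of the signed integer format codes.
sizes = {code: struct.calcsize("<" + code) for code in "bhiq"}
print(sizes)

# A 3-byte value has no format code of its own; decode it manually instead.
value = int.from_bytes(b"\xff\xff\x10", "little")
print(hex(value))
```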
