Why does UTF-32 exist when only 21 bits are needed to encode every character?
Problem description
We know that code points lie in the range 0..10FFFF, and 0x10FFFF is less than 2^21, so 21 bits are enough for any of them. Then why do we need UTF-32 when every code point can be represented in 3 bytes? UTF-24 should be enough.
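The arithmetic behind the question can be checked quickly. This is only a small sketch in Python; the variable names are illustrative and not part of the original question.

```python
# Highest code point defined by Unicode.
MAX_CODE_POINT = 0x10FFFF

# Number of bits needed to represent it, and the whole bytes that implies.
bits_needed = MAX_CODE_POINT.bit_length()
bytes_needed = (bits_needed + 7) // 8

# UTF-32 still spends a full 4 bytes on every code point.
utf32_size = len(chr(MAX_CODE_POINT).encode("utf-32-be"))

print(bits_needed, bytes_needed, utf32_size)  # 21 3 4
print(MAX_CODE_POINT < 2**21)                 # True
```

So a 3-byte unit would indeed be large enough, which is what makes the question reasonable.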
Solution
Two reasons I can think of:
- It allows for future expansion
- (More importantly) Computers are generally much better at dealing with data on 4-byte boundaries. The memory saved by using 3 bytes per character is relatively small compared with the pain of working on 3-byte boundaries.
I guess this is a bit like asking why we often have 8-bit, 16-bit, 32-bit and 64-bit integer datatypes (byte, int, long, whatever) but not 24-bit ones. I'm sure there are lots of occasions where we know that a number will never go beyond 2^21, but it's just simpler to use int than to create a 24-bit type.
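To make the trade-off concrete, here is a rough sketch in Python. Note that "UTF-24" is not a real encoding: `encode_utf24` and the two lookup helpers are hypothetical names invented for this illustration. Python also hides hardware alignment, so this only shows the convenience side of the argument: a 4-byte unit maps onto a native 32-bit integer, while a 3-byte unit has to be assembled by hand.

```python
import struct

def encode_utf24(text: str) -> bytes:
    """Hypothetical 'UTF-24': 3 big-endian bytes per code point (not a real standard)."""
    return b"".join(ord(ch).to_bytes(3, "big") for ch in text)

def nth_code_point_utf24(data: bytes, n: int) -> int:
    # There is no native 3-byte integer format, so slice and assemble manually.
    return int.from_bytes(data[3 * n : 3 * n + 3], "big")

def nth_code_point_utf32(data: bytes, n: int) -> int:
    # A 4-byte unit is read directly as a native 32-bit unsigned integer.
    return struct.unpack_from(">I", data, 4 * n)[0]

text = "a\u00e9\u20ac\U0001F600"  # four code points, from ASCII up to a non-BMP emoji
utf24 = encode_utf24(text)
utf32 = text.encode("utf-32-be")

print(len(utf24), len(utf32))                # 12 16
print(hex(nth_code_point_utf24(utf24, 3)))   # 0x1f600
print(hex(nth_code_point_utf32(utf32, 3)))   # 0x1f600
```

The 3-byte form saves a quarter of the space here (12 bytes instead of 16), but every access goes through custom packing code, which mirrors the answer's point about using int rather than inventing a 24-bit type.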