Why does UTF-32 exist when only 21 bits are needed to encode every character?

Views: 266 Published: 2017/8/16 20:19:16 Tags: unicode encoding


Question

We know that code points lie in the range 0..10FFFF, whose upper bound is less than 2^21. So why do we need UTF-32 when every code point can be represented in 3 bytes? UTF-24 should be enough.
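The arithmetic behind the question can be checked with a short sketch. The sample string and the variable names are chosen here for illustration only; `utf-32-le` is used because that codec variant does not prepend a byte order mark.

```python
MAX_CODE_POINT = 0x10FFFF  # upper bound of the code point range from the question

# Number of bits needed to represent the largest code point.
bits_needed = MAX_CODE_POINT.bit_length()

# Sample text (illustrative): four code points of increasing size.
text = "A\u00e9\u20ac\U0001F600"

# UTF-32 spends a fixed 4 bytes on every code point, whatever its value.
utf32_bytes = text.encode("utf-32-le")
bytes_per_code_point = len(utf32_bytes) // len(text)

print(bits_needed, bytes_per_code_point)
```

So 21 bits fit in 3 bytes, and the fourth byte of every UTF-32 code unit carries no information, which is exactly what the question objects to.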

Answer

Two reasons I can think of:

1. It allows for future expansion.
2. (More importantly) Computers are generally much better at dealing with data on 4-byte boundaries. The benefits in terms of reduced memory consumption are relatively small compared with the pain of working on 3-byte boundaries.

I guess this is a bit like asking why we often have 8-bit, 16-bit, 32-bit and 64-bit integer datatypes (byte, int, long, whatever) but not 24-bit ones. I'm sure there are lots of occasions where we know that a number will never go beyond 2²¹, but it's just simpler to use int than to create a 24-bit type.
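To make the trade-off concrete, here is a minimal sketch of what a 3-byte encoding might look like next to UTF-32. "UTF-24" is not a real encoding; `encode_utf24`, `code_point_at_utf24`, and `code_point_at_utf32` are hypothetical helpers written for this illustration, and little-endian byte order is an arbitrary choice.

```python
import struct


def encode_utf24(text: str) -> bytes:
    """Hypothetical 'UTF-24': each code point as 3 little-endian bytes."""
    return b"".join(ord(ch).to_bytes(3, "little") for ch in text)


def code_point_at_utf24(data: bytes, index: int) -> int:
    # struct has no format code for a 3-byte integer, so the value
    # is assembled from a byte slice by hand.
    start = 3 * index
    return int.from_bytes(data[start:start + 3], "little")


def code_point_at_utf32(data: bytes, index: int) -> int:
    # A 4-byte unit maps directly onto a standard 32-bit unsigned integer.
    return struct.unpack_from("<I", data, 4 * index)[0]


text = "A\u20ac\U0001F600"  # illustrative sample
utf24 = encode_utf24(text)
utf32 = text.encode("utf-32-le")

# Both layouts give constant-time access to the n-th code point.
third_utf24 = code_point_at_utf24(utf24, 2)
third_utf32 = code_point_at_utf32(utf32, 2)

# The only gain from the 3-byte layout is size: 3 bytes instead of 4.
size_ratio = len(utf24) / len(utf32)
```

The 3-byte form saves a quarter of the space, but it has no matching native integer type, so every read goes through manual byte assembly. In a lower-level language the same mismatch would also show up as unaligned memory access, which is the "pain" the answer refers to.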
