Is there an encoding in Unicode where every "character" is just one code point?


Question

Trying to rephrase: can every combining character sequence be mapped to a single code point?

I'm new to Unicode, but it seems to me that there is no encoding, normalization form, or representation in which one character is one code point in every case. Is this correct?

Is this also true for the Basic Multilingual Plane?

Recommended answer

If you mean one char == one number (i.e. where every char is represented by the same number of bytes/words/what-have-you): in UCS-4 (nowadays usually called UTF-32), each code point is stored as a single 4-byte number. That's more than big enough for every code point to be represented by a single value, but it's quite wasteful if you don't need any of the higher chars.
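To make the fixed-width point concrete, here is a minimal Python sketch that compares how many bytes a single code point takes in UTF-8, UTF-16, and UTF-32. The helper name `encoded_lengths` and the sample characters are only illustrative choices, not part of the original answer.

```python
# Sample code points: U+0041, U+00E9, U+20AC, U+1F600 (the last is outside the BMP).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def encoded_lengths(ch):
    """Return the byte length of one code point in UTF-8, UTF-16 and UTF-32.

    The "-le" codec variants are used so that no byte order mark is added.
    """
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )

for ch in samples:
    # Print the code point and its (UTF-8, UTF-16, UTF-32) byte lengths.
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

Only the UTF-32 column is the same for every code point. Even so, that is one number per code point, which is not necessarily one number per user-perceived character.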

If you mean combining sequences (i.e. where e + U+0301 COMBINING ACUTE ACCENT => é): there are single-code-point (precomposed) representations for most of the combinations in use in existing modern languages, and normalizing to NFC will pick them where they exist. Not every possible combination has a precomposed form, though, so no normalization form gives you one code point per character in every case. If you're making up your own language, you could run into problems...but if you're sticking to the ones that people actually use, you'll mostly be fine.
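As a rough illustration, the sketch below uses Python's standard `unicodedata.normalize` to compose two sequences under NFC. The helper `nfc_code_points` is a made-up name for this example, and "q + acute" is picked only because, as far as I know, Unicode has no precomposed code point for it.

```python
import unicodedata

def nfc_code_points(text):
    """Normalize text to NFC and return its code points as U+XXXX strings."""
    composed = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in composed]

# "e" + COMBINING ACUTE ACCENT has a precomposed form (U+00E9).
e_acute = nfc_code_points("e\u0301")

# "q" + COMBINING ACUTE ACCENT has no precomposed form, so NFC leaves two code points.
q_acute = nfc_code_points("q\u0301")

print(e_acute)
print(q_acute)
```

Both inputs lie entirely within the Basic Multilingual Plane, so restricting yourself to the BMP does not change the picture.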
