What exactly does String.codePointAt do?

Question

I recently ran into the codePointAt method of String in Java. I also found a few other codePoint methods: codePointBefore, codePointCount, and so on. They clearly have something to do with Unicode, but I do not understand them.

Now I wonder when and how one should use codePointAt and similar methods.

Recommended answer

Short answer: it gives you the Unicode codepoint that starts at the specified index in the String, i.e. the "Unicode number" of the character at that position.
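As a rough illustration of the short answer, the sketch below calls codePointAt at a few indexes of a small sample string. The class name CodePointAtDemo, the helper describe, and the sample string are made up for this example; only String.codePointAt and String.format are real JDK calls.

```java
class CodePointAtDemo {
    // Hypothetical helper: returns the codepoint starting at `index`
    // in the usual U+XXXX notation.
    static String describe(String s, int index) {
        int cp = s.codePointAt(index);
        return String.format("U+%04X", cp);
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600, which does not fit in a single char
        // and is therefore written as two char values.
        String s = "a\uD83D\uDE00";

        System.out.println(describe(s, 0)); // U+0061
        System.out.println(describe(s, 1)); // U+1F600
        // Index 2 points at the second half of the pair on its own,
        // so only that single char value is returned.
        System.out.println(describe(s, 2)); // U+DE00
    }
}
```

Note that the index is still a char index, not a "character number": index 1 yields the full codepoint, while index 2 lands in the middle of it.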

Longer answer: Java was created when 16 bits (i.e. one char) were enough to hold any Unicode character that existed (that range is now known as the Basic Multilingual Plane, or BMP). Later, Unicode was extended to include characters with a codepoint of 2^16 or higher (above U+FFFF). This means that a char can no longer hold every possible Unicode codepoint.

UTF-16 was the solution: it stores the "old" Unicode codepoints in 16 bits (i.e. exactly one char) and all the new ones in 32 bits (i.e. two char values). Those two 16-bit values are called a "surrogate pair". Strictly speaking, a char now holds a "UTF-16 code unit" rather than "a Unicode character" as it used to.
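To make the surrogate pair idea concrete, here is a minimal sketch that builds a string from a single codepoint above U+FFFF and inspects it. The class name SurrogatePairDemo and the helper fromCodePoint are invented for this example; the Character and String methods used are standard JDK calls.

```java
class SurrogatePairDemo {
    // Hypothetical helper: builds a String holding exactly one codepoint.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1F600); // a codepoint outside the BMP

        // Two UTF-16 code units (char values), but only one codepoint.
        System.out.println(s.length());                      // 2
        System.out.println(s.codePointCount(0, s.length())); // 1

        // The two char values form a surrogate pair.
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true

        // Combining the pair gives back the original codepoint.
        int cp = Character.toCodePoint(s.charAt(0), s.charAt(1));
        System.out.println(Integer.toHexString(cp)); // 1f600
    }
}
```

For a codepoint inside the BMP, such as 0x41, the same helper produces a string of length 1, since no surrogate pair is needed.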

All the "old" methods (handling only char) work just fine as long as you don't use any of the "new" Unicode characters (or don't really care about them). But if you care about the new characters as well (or simply need complete Unicode support), you need to use the "codepoint" versions, which support all possible Unicode codepoints.
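One common place where the difference shows up is iterating over a string. The sketch below is one possible way to walk a string codepoint by codepoint instead of char by char; the class name CodePointIterationDemo and the helper codePointsOf are assumptions made for this example, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointIterationDemo {
    // Hypothetical helper: collects every codepoint of `s` in order.
    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result.add(cp);
            // Advance by 1 char for BMP codepoints, by 2 for surrogate pairs.
            i += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'

        // A char-based loop would visit 4 positions here.
        System.out.println(s.length()); // 4

        // The codepoint-based loop visits 3 codepoints.
        for (int cp : codePointsOf(s)) {
            System.out.println(String.format("U+%04X", cp));
        }
    }
}
```

On Java 8 and later, s.codePoints() returns an IntStream of the same values, which may be more convenient than the explicit loop.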
