4 byte Unicode character in Java

Published: 2021/5/18 19:30:11 · Tags: java, unicode


Question


I am writing unit tests for my custom StringDatatype, and I need to write down a 4-byte Unicode character, for example U+1F701 (0xf0 0x9f 0x9c 0x81). Writing "\U" does not work (illegal escape character error). How can such a character be written in a string?

Answer

A Unicode code point is not 4 bytes; it is an integer (ranging, at the moment, from U+0000 to U+10FFFF).
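To make the "integer, not bytes" point concrete, here is a minimal sketch that uses only java.lang.Character. The class name CodePointRange and its classify helper are hypothetical, chosen for illustration; the constants and predicates it calls are standard JDK members (isBmpCodePoint needs Java 7 or later).

```java
// Sketch: a code point is just an int in [Character.MIN_CODE_POINT, Character.MAX_CODE_POINT].
// CodePointRange and classify are hypothetical names used for illustration.
public class CodePointRange {

    // Says where a code point falls: invalid, BMP (one char), or supplementary (two chars).
    static String classify(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return "invalid";
        }
        return Character.isBmpCodePoint(codePoint) ? "BMP" : "supplementary";
    }

    public static void main(String[] args) {
        int cp = 0x1F701; // the code point from the question
        System.out.printf("U+%X is %s, needs %d char(s)%n",
                cp, classify(cp), Character.charCount(cp));
        System.out.printf("valid range: U+%04X..U+%X%n",
                Character.MIN_CODE_POINT, Character.MAX_CODE_POINT);
        System.out.println(classify(0x110000)); // one past the last code point
    }
}
```

Running it prints "U+1F701 is supplementary, needs 2 char(s)", then "valid range: U+0000..U+10FFFF", then "invalid".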

Your 4 bytes are (wild guess) its UTF-8 encoding (edit: I was right).
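The guess can be checked by hand. This is a minimal sketch, assuming the standard UTF-8 bit layout for 4-byte sequences (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx); the class Utf8FourByte and its encode4/hex helpers are hypothetical names. It packs the 21 bits of U+1F701 into four bytes and compares the result with String.getBytes(StandardCharsets.UTF_8).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: hand-encode a code point in U+10000..U+10FFFF as 4 UTF-8 bytes and
// compare with the JDK's UTF-8 encoder. Utf8FourByte is a hypothetical name.
public class Utf8FourByte {

    // 4-byte layout: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (21 payload bits).
    static byte[] encode4(int cp) {
        if (cp < 0x10000 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("not a 4-byte UTF-8 code point: " + cp);
        }
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),          // top 3 bits
            (byte) (0x80 | ((cp >> 12) & 0x3F)), // next 6 bits
            (byte) (0x80 | ((cp >> 6) & 0x3F)),  // next 6 bits
            (byte) (0x80 | (cp & 0x3F))          // low 6 bits
        };
    }

    // Formats bytes as "0xf0 0x9f ..." (masking with 0xFF avoids sign extension).
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("0x%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int cp = 0x1F701;
        byte[] manual = encode4(cp);
        byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(manual));                // 0xf0 0x9f 0x9c 0x81
        System.out.println(Arrays.equals(manual, jdk)); // true
    }
}
```

In real code the JDK encoder is the one to use; the hand-written version only shows where 0xf0 0x9f 0x9c 0x81 comes from.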

To build a String from the code point, and get its UTF-8 bytes back, do this:

import java.nio.charset.StandardCharsets;

final char[] chars = Character.toChars(0x1F701); // surrogate pair: 0xD83D, 0xDF01
final String s = new String(chars);
final byte[] asBytes = s.getBytes(StandardCharsets.UTF_8); // 0xF0 0x9F 0x9C 0x81

When Java was created, Unicode did not define code points outside the BMP (Basic Multilingual Plane, i.e. U+0000 to U+FFFF), which is why a char is only 16 bits long (OK, this is only a guess, but I don't think I'm far off the mark). Unicode has grown since then, and Java had to adapt: a code point outside the BMP needs two chars, a leading surrogate followed by a trailing surrogate (Java calls these the high and low surrogate, respectively). Java has no escape sequence for entering a code point outside the BMP directly: \u takes exactly four hex digits, and there is no \U form.
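The surrogate split itself is simple arithmetic. In this sketch (SurrogatePair is a hypothetical class name), 0x10000 is subtracted and the remaining 20 bits are spread over a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits). The result is cross-checked against Character.highSurrogate and Character.lowSurrogate (Java 7+) and Character.toCodePoint.

```java
// Sketch: split a supplementary code point into a UTF-16 surrogate pair by hand.
// SurrogatePair is a hypothetical name used for illustration.
public class SurrogatePair {

    // Subtract 0x10000 to get a 20-bit value; the top 10 bits go into the high
    // surrogate (0xD800..0xDBFF), the low 10 bits into the low surrogate (0xDC00..0xDFFF).
    static char[] split(int cp) {
        if (!Character.isSupplementaryCodePoint(cp)) {
            throw new IllegalArgumentException("not outside the BMP: " + cp);
        }
        int v = cp - 0x10000;
        return new char[] {(char) (0xD800 + (v >> 10)), (char) (0xDC00 + (v & 0x3FF))};
    }

    public static void main(String[] args) {
        int cp = 0x1F701;
        char[] pair = split(cp);
        System.out.printf("high=%04X low=%04X%n", (int) pair[0], (int) pair[1]); // high=D83D low=DF01
        System.out.println(pair[0] == Character.highSurrogate(cp)
                && pair[1] == Character.lowSurrogate(cp));                     // true
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == cp);     // true
    }
}
```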

Since a char is in fact a UTF-16 code unit, and \u escapes exist for those units, you can write this "character" in a String literal as "\uD83D\uDF01", or directly as the symbol if your editor and the source file encoding support it (for example, when compiling with javac -encoding UTF-8).
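Since the question is about unit tests, here is a sketch of the checks such a test might make. It prints its results so it runs without a test framework; the class and field names are hypothetical, and with JUnit each println would become an assertion. It shows that the escaped literal and the Character.toChars version are the same String, and that length() counts UTF-16 units while codePointCount counts code points.

```java
// Sketch of unit-test style checks; SupplementaryLiteral and its fields are hypothetical.
// FROM_ESCAPES spells U+1F701 as its two UTF-16 surrogates.
public class SupplementaryLiteral {

    static final String FROM_ESCAPES = "\uD83D\uDF01";
    static final String FROM_CODE_POINT = new String(Character.toChars(0x1F701));

    public static void main(String[] args) {
        System.out.println(FROM_ESCAPES.equals(FROM_CODE_POINT));                  // true
        System.out.println(FROM_ESCAPES.length());                                 // 2 (UTF-16 units)
        System.out.println(FROM_ESCAPES.codePointCount(0, FROM_ESCAPES.length())); // 1 (code point)
        System.out.printf("U+%X%n", FROM_ESCAPES.codePointAt(0));                  // U+1F701
    }
}
```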

See also the CharsetDecoder and CharsetEncoder classes.
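Here is a sketch of the CharsetEncoder/CharsetDecoder route, assuming UTF-8 and a hypothetical CoderRoundTrip class. One reason to prefer them in tests: an encoder or decoder obtained from newEncoder()/newDecoder() reports malformed input by default (these convenience methods throw a CharacterCodingException), whereas String.getBytes and new String(byte[], Charset) silently substitute replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: round-trip U+1F701 through UTF-8 with explicit coders.
// CoderRoundTrip is a hypothetical name used for illustration.
public class CoderRoundTrip {

    static byte[] encode(String s) throws CharacterCodingException {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder(); // reports errors by default
        ByteBuffer buf = enc.encode(CharBuffer.wrap(s));          // returned buffer is ready to read
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    static String decode(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder(); // reports errors by default
        return dec.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] bytes = {(byte) 0xF0, (byte) 0x9F, (byte) 0x9C, (byte) 0x81};
        String s = decode(bytes);
        System.out.println(s.codePointAt(0) == 0x1F701);  // true
        System.out.println(Arrays.equals(encode(s), bytes)); // true
    }
}
```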

See also String.codePointCount(int, int) and, since Java 8, String.codePoints() (inherited from CharSequence).
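Finally, a sketch contrasting two ways of walking a string by code point: the Java 8 codePoints() stream, and a manual loop that advances by Character.charCount so that a surrogate pair is consumed as one unit. The class and method names are hypothetical.

```java
// Sketch: length() counts UTF-16 chars; codePointCount/codePoints count code points.
// CodePointIteration and its methods are hypothetical names used for illustration.
public class CodePointIteration {

    // Java 8+: CharSequence.codePoints() yields an IntStream of code points.
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    // Pre-Java-8 equivalent: step by Character.charCount so a surrogate pair is read once.
    static int[] codePointsLoop(String s) {
        int[] out = new int[s.codePointCount(0, s.length())];
        int i = 0;
        int n = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            out[n++] = cp;
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(0x1F701)) + "b";
        System.out.println(s.length());                      // 4
        System.out.println(s.codePointCount(0, s.length())); // 3
        for (int cp : codePointsOf(s)) {
            System.out.printf("U+%04X%n", cp);                 // U+0061, U+1F701, U+0062
        }
    }
}
```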
