Is a Java char array always a valid UTF-16 (Big Endian) encoding?
Question
Say that I would encode a Java character array (char[]) instance as bytes:
- using two bytes for each character
- using big-endian encoding (storing the most significant 8 bits in the leftmost byte and the least significant 8 bits in the rightmost byte)
Would this always create a valid UTF-16BE encoding? If not, which code points will result in an invalid encoding?
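The encoding scheme described above can be sketched as follows. This is a minimal, illustrative helper (the class and method names are made up for this example): each char is split into two bytes, most significant byte first.

```java
// Hypothetical sketch of the scheme described above: each char is
// written as two bytes, most significant byte first (big-endian).
public class CharsToBytes {
    static byte[] toBigEndianBytes(char[] chars) {
        byte[] out = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            out[2 * i]     = (byte) (chars[i] >>> 8); // most significant 8 bits
            out[2 * i + 1] = (byte) chars[i];         // least significant 8 bits
        }
        return out;
    }

    public static void main(String[] args) {
        // 'A' is U+0041 and '\u20AC' (the euro sign) is U+20AC,
        // so this prints: 00 41 20 AC
        byte[] bytes = toBigEndianBytes(new char[] {'A', '\u20AC'});
        for (byte b : bytes) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```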
This question is very much related to this question about the Java char type and this question about the internal representation of Java strings.
Answer
No. You can create char instances that contain any 16-bit value you desire; there is nothing that constrains them to be valid UTF-16 code units, nor constrains an array of them to be a valid UTF-16 sequence. Even String does not require that its data be valid UTF-16:
char data[] = {'\uD800', 'b', 'c'}; // Unpaired lead surrogate
String str = new String(data);
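To see this in action, the snippet above can be run as a small self-contained check (an illustrative sketch, not from the original answer): String stores the unpaired surrogate unchanged, and only encoding it triggers any special handling.

```java
import java.nio.charset.StandardCharsets;

public class UnpairedSurrogate {
    public static void main(String[] args) {
        char data[] = {'\uD800', 'b', 'c'}; // unpaired lead surrogate
        String str = new String(data);      // accepted without complaint

        // String preserves the invalid sequence as-is:
        System.out.println(str.length());                 // 3
        System.out.printf("%04X%n", (int) str.charAt(0)); // D800

        // Encoding via getBytes(Charset) does not throw: per its javadoc it
        // replaces malformed input with the charset's default replacement
        // bytes, so the raw surrogate does not survive the round trip.
        byte[] bytes = str.getBytes(StandardCharsets.UTF_16BE);
        for (byte b : bytes) System.out.printf("%02X ", b);
        System.out.println();
    }
}
```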
The requirements for valid UTF-16 data are set out in Chapter 3 of the Unicode Standard (basically, everything must be a Unicode scalar value, and all surrogates must be correctly paired). You can test whether a char array is a valid UTF-16 sequence, and turn it into a sequence of UTF-16BE (or LE) bytes, by using a CharsetEncoder:
CharsetEncoder encoder = Charset.forName("UTF-16BE").newEncoder();
ByteBuffer bytes = encoder.encode(CharBuffer.wrap(data)); // throws MalformedInputException
(Likewise, use a CharsetDecoder if you start from bytes.)
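Putting the pieces together, here is a runnable sketch of a validity check built on this approach (the isValidUtf16 helper name is made up for illustration). A fresh CharsetEncoder reports malformed input by default, so encode throws for an unpaired surrogate and succeeds for well-formed data:

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class Utf16Check {
    static boolean isValidUtf16(char[] chars) {
        // A new encoder's default malformed-input action is REPORT,
        // so encode() throws instead of silently replacing.
        CharsetEncoder encoder = Charset.forName("UTF-16BE").newEncoder();
        try {
            encoder.encode(CharBuffer.wrap(chars));
            return true;
        } catch (CharacterCodingException e) { // e.g. MalformedInputException
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf16(new char[] {'a', 'b', 'c'}));           // true
        System.out.println(isValidUtf16(new char[] {'\uD800', 'b', 'c'}));      // false
        System.out.println(isValidUtf16(new char[] {'\uD83D', '\uDE00'}));      // true (paired)
    }
}
```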