Is an instance of a Java string always valid UTF-16?
Question
For any given Java String s, I would like to know if the array of characters represented by s is guaranteed to be a valid UTF-16 string, e.g.:
final char[] ch = new char[s.length()];
for (int i = 0; i < ch.length; ++i) {
    ch[i] = s.charAt(i);
}
// Is ch guaranteed to be a valid UTF-16 encoded string?
If not, what are some simple Java-language test cases that produce invalid UTF-16?
EDIT: Somebody has flagged the question as a possible duplicate of "Is a Java char array always a valid UTF-16 (Big Endian) encoding?" All I can say is, there's a difference between a String and a char[], and a reason why the former might, at least theoretically, have guarantees as to its contents that the latter does not. I'm not asking a question about arrays, I'm asking a question about Strings.
Recommended answer
No. A String is simply an unrestricted wrapper for a char[]; its constructors do not check that surrogates are paired:
char[] data = {'\uD800', 'b', 'c'}; // unpaired lead (high) surrogate
String str = new String(data);
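The question also asks for simple test cases. The sketch below collects a few ordinary ways to end up with ill-formed UTF-16 in a String: a lone low surrogate in a literal, a substring that cuts a surrogate pair in half, and a pair in the wrong order. The class name IllFormedExamples and the helper isWellFormed are hypothetical names invented for this illustration, not part of the JDK; the helper is just a manual surrogate scan built on Character.isHighSurrogate and Character.isLowSurrogate.

```java
final class IllFormedExamples {
    // Hypothetical helper: true when every surrogate in s belongs to a high+low pair.
    static boolean isWellFormed(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed immediately by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate with no preceding high surrogate
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String pair = "\uD83D\uDE00";        // U+1F600 encoded as a valid surrogate pair
        String loneLow = "a\uDC00b";         // low surrogate on its own
        String split = pair.substring(0, 1); // substring cuts the pair in half
        String swapped = "\uDE00\uD83D";     // low surrogate before high surrogate

        System.out.println(isWellFormed(pair));    // true
        System.out.println(isWellFormed(loneLow)); // false
        System.out.println(isWellFormed(split));   // false
        System.out.println(isWellFormed(swapped)); // false
    }
}
```

All four strings compile and are constructed without complaint; only the explicit check tells them apart.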
To test a String or char[] for well-formed UTF-16 data, you can use CharsetEncoder:
CharsetEncoder encoder = StandardCharsets.UTF_16LE.newEncoder(); // a new encoder reports malformed input by default
ByteBuffer bytes = encoder.encode(CharBuffer.wrap(str)); // throws MalformedInputException for the unpaired surrogate
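For reuse, the encoder check can be wrapped in a boolean helper. This is a minimal sketch, assuming a fresh encoder per call is acceptable; the names Utf16Check and isWellFormedUtf16 are hypothetical. It relies on a newly created encoder using CodingErrorAction.REPORT, so malformed input surfaces as a CharacterCodingException (MalformedInputException is a subclass).

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class Utf16Check {
    // Hypothetical helper: true if cs encodes cleanly as UTF-16, false if it is malformed.
    static boolean isWellFormedUtf16(CharSequence cs) {
        // A fresh encoder per call avoids sharing encoder state between callers.
        CharsetEncoder encoder = StandardCharsets.UTF_16LE.newEncoder();
        try {
            encoder.encode(CharBuffer.wrap(cs)); // encoded bytes are discarded
            return true;
        } catch (CharacterCodingException e) {
            return false; // unpaired surrogate somewhere in the input
        }
    }

    public static void main(String[] args) {
        String bad = new String(new char[] {'\uD800', 'b', 'c'});
        char[] good = {'\uD83D', '\uDE00'}; // valid surrogate pair for U+1F600

        System.out.println(isWellFormedUtf16("abc"));                 // true
        System.out.println(isWellFormedUtf16(bad));                   // false
        System.out.println(isWellFormedUtf16(CharBuffer.wrap(good))); // true
    }
}
```

Taking a CharSequence lets the same helper serve both cases from the answer: pass a String directly, or wrap a char[] with CharBuffer.wrap.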