Is an instance of a Java string always valid UTF-16?

Question

For any given Java String s, I would like to know if the array of characters represented by s is guaranteed to be a valid UTF-16 string, e.g.:

final char[] ch = new char[s.length()];
for (int i = 0; i < ch.length; ++i) {
    ch[i] = s.charAt(i);
}
// Is ch guaranteed to be a valid UTF-16 encoded string?

If not, what are some simple Java-language test cases that produce invalid UTF-16?

EDIT: Somebody has flagged this question as a possible duplicate of "Is a Java char array always a valid UTF-16 (Big Endian) encoding?". All I can say is that there is a difference between a String and a char[], and a reason why the former might, at least in theory, carry guarantees about its contents that the latter does not. I'm not asking about arrays; I'm asking about Strings.

Recommended answer

No. A String is simply an unrestricted wrapper for a char[], so nothing prevents it from holding an unpaired surrogate:

char[] data = {'\uD800', 'b', 'c'};  // Unpaired lead surrogate
String str = new String(data);
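
The question also asks for simple test cases. The following is a minimal sketch, not part of the original answer: the class name MalformedUtf16Examples and its two methods are hypothetical names chosen for illustration. It collects a few ordinary ways to end up with a String that is not well-formed UTF-16 and checks each with a manual surrogate scan based on Character.isHighSurrogate and Character.isLowSurrogate.

```java
// Hypothetical illustration class; the names are not from the original answer.
final class MalformedUtf16Examples {

    /** Returns true if s contains a surrogate that is not part of a lead+trail pair. */
    static boolean hasUnpairedSurrogate(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A lead surrogate must be followed immediately by a trail surrogate.
                if (i + 1 == s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return true;
                }
                i++; // skip the trail surrogate of a valid pair
            } else if (Character.isLowSurrogate(c)) {
                return true; // trail surrogate with no preceding lead
            }
        }
        return false;
    }

    /** Strings that Java creates without complaint but that are not well-formed UTF-16. */
    static String[] malformedExamples() {
        String emoji = "\uD83D\uDE00"; // one code point (U+1F600) stored as two chars
        return new String[] {
            new String(new char[] {'\uD800', 'b', 'c'}), // unpaired lead surrogate (as in the answer)
            "\uDC00",                                    // lone trail surrogate in a literal
            "\uDE00\uD83D",                              // surrogates in the wrong order
            emoji.substring(0, 1),                       // substring cuts a valid pair in half
        };
    }

    public static void main(String[] args) {
        for (String s : malformedExamples()) {
            System.out.println(hasUnpairedSurrogate(s));
        }
    }
}
```

The substring case is probably the one most likely to occur by accident, since String indices count char values (UTF-16 code units) rather than code points.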

To test a String or char[] for well-formed UTF-16 data, you can use CharsetEncoder:

CharsetEncoder encoder = Charset.forName("UTF-16LE").newEncoder();
ByteBuffer bytes = encoder.encode(CharBuffer.wrap(str)); // throws MalformedInputException if str is not well-formed
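
The snippet above omits its imports and exception handling. A self-contained version might look like the sketch below; the class Utf16Validator and the method isWellFormed are hypothetical names introduced here, and StandardCharsets.UTF_16LE is used in place of Charset.forName("UTF-16LE"), assuming Java 7 or later.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is not from the original answer.
final class Utf16Validator {

    /** Returns true if s can be encoded as UTF-16LE without a malformed-input error. */
    static boolean isWellFormed(CharSequence s) {
        // REPORT is already the default for a new encoder; it is set explicitly
        // so that the dependence on error reporting is visible.
        CharsetEncoder encoder = StandardCharsets.UTF_16LE.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException.
            return false;
        }
    }

    public static void main(String[] args) {
        String bad = new String(new char[] {'\uD800', 'b', 'c'});
        System.out.println(isWellFormed("abc"));
        System.out.println(isWellFormed(bad));
    }
}
```

Note that String.getBytes is not a substitute for this check: it replaces malformed input with the charset's default replacement bytes instead of reporting an error.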
