Will String.getBytes("UTF-16") return the same result on all platforms?

Views: 119 Published: 2017/8/17 0:36:34 Tags: java string encoding


Question

I need to create a hash from a String containing a user's password. To create the hash, I use a byte array which I get by calling String.getBytes(). But when I call this method with a specified encoding (such as UTF-8) on a platform where this is not the default encoding, the non-ASCII characters get replaced by a default character (if I understand the behaviour of getBytes() correctly), and therefore on such a platform I will get a different byte array, and eventually a different hash.

Since Strings are internally stored in UTF-16, will calling String.getBytes("UTF-16") guarantee me that I get the same byte array on every platform, regardless of its default encoding?

Answer

Yes. Not only is it guaranteed to be UTF-16, but the byte order is defined too:

When decoding, the UTF-16 charset interprets the byte-order mark at the beginning of the input stream to indicate the byte-order of the stream but defaults to big-endian if there is no byte-order mark; when encoding, it uses big-endian byte order and writes a big-endian byte-order mark.

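(Note that this means String.getBytes("UTF-16") does include the BOM: the result starts with the bytes FE FF. If you don't want a BOM, name the byte order explicitly with "UTF-16BE" or "UTF-16LE", neither of which writes one.)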
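
As a quick check of the encoding behaviour quoted above, here is a minimal sketch that prints the bytes produced for the one-character string "A" by the three UTF-16 charsets. The class name Utf16Bytes and the hex helper are made up for illustration; only the StandardCharsets constants and String.getBytes come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Bytes {
    // Hypothetical helper: formats a byte array as space-separated hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "A";
        // "UTF-16": big-endian, preceded by a big-endian byte-order mark.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41
        // "UTF-16BE" and "UTF-16LE": fixed byte order, no byte-order mark.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 41
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // 41 00
    }
}
```

The first line starts with the FE FF byte-order mark; the two charsets with an explicit byte order write no mark at all. All three results are the same on every platform, because none of them consults the default encoding.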

So long as you have the same string content - i.e. the same sequence of char values - then you'll get the same bytes on every implementation of Java, barring bugs. (Any such bug would be pretty surprising, given that UTF-16 is probably the simplest encoding to implement in Java...)
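
To illustrate what "the same sequence of char values" means in practice, the sketch below (class and method names are hypothetical) compares two strings that render identically but consist of different chars. The encoding is deterministic, so different char sequences give different bytes, and identical sequences give identical bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SameCharsSameBytes {
    // Hypothetical helper: encodes with an explicit, BOM-free UTF-16 charset.
    static byte[] utf16be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String composed = "\u00E9";    // 'é' as a single char
        String decomposed = "e\u0301"; // 'e' followed by a combining acute accent

        // Different char sequences, so the strings and their bytes differ.
        System.out.println(composed.equals(decomposed));                           // false
        System.out.println(Arrays.equals(utf16be(composed), utf16be(decomposed))); // false

        // The same char sequence always encodes to the same bytes.
        String again = new String(new char[] {0x00E9});
        System.out.println(Arrays.equals(utf16be(composed), utf16be(again)));      // true
    }
}
```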

The fact that UTF-16 is the native representation for char (and usually for String) is only relevant in terms of ease of implementation, however. For example, I'd also expect String.getBytes("UTF-8") to give the same results on every platform.
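
For the hashing use case in the question, a minimal sketch along these lines passes an explicit charset, so the platform default never comes into play. The question does not say which hash is used; SHA-256 is an assumption here, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PasswordBytes {
    // Hashes the UTF-8 bytes of the input. SHA-256 is only an illustrative choice.
    static byte[] sha256Utf8(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // The explicit charset makes the bytes independent of the default encoding.
            return digest.digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String password = "p\u00E4ssword"; // contains the non-ASCII char 'ä'
        // 'ä' is encoded as two bytes; it is not replaced by a default character.
        System.out.println(password.getBytes(StandardCharsets.UTF_8).length); // 9
        System.out.println(sha256Utf8(password).length);                      // 32
    }
}
```

Passing a StandardCharsets constant instead of a charset name also avoids the checked UnsupportedEncodingException that String.getBytes(String) declares.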
