Will String.getBytes("UTF-16") return the same result on all platforms?
Problem description
I need to create a hash from a String containing a user's password. To create the hash, I use a byte array that I get by calling String.getBytes(). But when I call this method with a specified encoding (such as UTF-8) on a platform where that is not the default encoding, the non-ASCII characters get replaced by a default character (if I understand the behaviour of getBytes() correctly). On such a platform I would therefore get a different byte array and, eventually, a different hash.
Since Strings are internally stored in UTF-16, will calling String.getBytes("UTF-16") guarantee that I get the same byte array on every platform, regardless of its default encoding?
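The hashing step the question describes can be sketched as follows. The sketch assumes SHA-256 via java.security.MessageDigest and uses a hypothetical helper named sha256Hex. Plain SHA-256 is chosen only to show that identical bytes give identical digests.

It uses the getBytes(Charset) overload with a StandardCharsets constant rather than getBytes("UTF-16"). The Charset overload does not throw the checked UnsupportedEncodingException that the String-name overload declares.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hashes a string's bytes under an explicitly named charset,
// so the result does not depend on the platform default charset.
class ExplicitCharsetHash {

    // Returns the lowercase hex SHA-256 digest of s encoded with cs.
    static String sha256Hex(String s, Charset cs) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(cs));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String password = "p\u00e4ssw\u00f6rd"; // contains non-ASCII characters
        System.out.println(sha256Hex(password, StandardCharsets.UTF_16));
        System.out.println(sha256Hex(password, StandardCharsets.UTF_8));
    }
}
```

The two printed digests differ from each other, because UTF-16 and UTF-8 produce different bytes. Each one is stable across runs and platforms.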
Answer
Yes. Not only is it guaranteed to be UTF-16, but the byte order is defined too. The documentation for java.nio.charset.Charset says:
When decoding, the UTF-16 charset interprets the byte-order mark at the beginning of the input stream to indicate the byte-order of the stream but defaults to big-endian if there is no byte-order mark; when encoding, it uses big-endian byte order and writes a big-endian byte-order mark.
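The encoding and decoding rules in that quote can be observed directly. The class name Utf16BomDemo and its hex helper are illustrative only. The sketch shows three things:

- Encoding emits a big-endian BOM followed by big-endian code units.
- Decoding without a BOM assumes big-endian.
- A little-endian BOM switches the decoder's byte order.

```java
import java.nio.charset.StandardCharsets;

// Illustrative demo of the byte layout produced and accepted by Java's "UTF-16" charset.
class Utf16BomDemo {

    // Formats bytes as space-separated uppercase hex, e.g. "FE FF 00 41".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Encoding: a big-endian BOM (FE FF), then big-endian code units.
        System.out.println(hex("Ab".getBytes(StandardCharsets.UTF_16))); // FE FF 00 41 00 62

        // Decoding without a BOM: defaults to big-endian.
        byte[] noBom = {0x00, 0x41};
        System.out.println(new String(noBom, StandardCharsets.UTF_16)); // A

        // Decoding with a little-endian BOM (FF FE): the BOM selects the byte order
        // and is not included in the decoded string.
        byte[] leBom = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        System.out.println(new String(leBom, StandardCharsets.UTF_16)); // A
    }
}
```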
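(This means the output of String.getBytes("UTF-16") starts with the two BOM bytes 0xFE 0xFF. Those bytes are the same on every platform, so they don't affect reproducibility. If you want the bytes without a BOM, use "UTF-16BE" instead.)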
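A quick way to see what each variant emits is to compare the three UTF-16 charsets that every Java platform is required to support. The class Utf16Variants and its method isBomPlusBigEndian are hypothetical names used only for this check. The check is stated for non-empty strings only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical comparison of the UTF-16, UTF-16BE and UTF-16LE charsets.
class Utf16Variants {

    // For a non-empty string, checks that the "UTF-16" output is exactly
    // the BOM FE FF followed by the "UTF-16BE" output.
    static boolean isBomPlusBigEndian(String s) {
        byte[] withBom = s.getBytes(StandardCharsets.UTF_16);
        byte[] bigEndian = s.getBytes(StandardCharsets.UTF_16BE);
        return withBom.length == bigEndian.length + 2
                && withBom[0] == (byte) 0xFE
                && withBom[1] == (byte) 0xFF
                && Arrays.equals(Arrays.copyOfRange(withBom, 2, withBom.length), bigEndian);
    }

    public static void main(String[] args) {
        String s = "Hi";
        System.out.println(s.getBytes(StandardCharsets.UTF_16).length);   // 6: FE FF 00 48 00 69
        System.out.println(s.getBytes(StandardCharsets.UTF_16BE).length); // 4: 00 48 00 69
        System.out.println(s.getBytes(StandardCharsets.UTF_16LE).length); // 4: 48 00 69 00
        System.out.println(isBomPlusBigEndian(s));                        // true
    }
}
```

All three outputs are deterministic. They simply differ from each other, so whichever one you pick must be used consistently when hashing.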
So long as you have the same string content (i.e. the same sequence of char values), you'll get the same bytes on every implementation of Java, barring bugs. (Any such bug would be pretty surprising, given that UTF-16 is probably the simplest encoding to implement in Java...)
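To illustrate why UTF-16 is so simple to implement in Java, here is a hand-written big-endian encoder. A String is already a sequence of UTF-16 code units, so encoding just splits each char into its high byte and low byte. The class ManualUtf16 is hypothetical.

The sketch assumes the string is well-formed, meaning it has no unpaired surrogates. The library encoder treats an unpaired surrogate as malformed input and substitutes a replacement, so the two are expected to differ there.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical hand-written UTF-16BE encoder: each char becomes its high byte, then its low byte.
class ManualUtf16 {

    // Assumes s is well-formed UTF-16 (no unpaired surrogates); for such strings
    // the result should match s.getBytes(StandardCharsets.UTF_16BE).
    static byte[] encodeBigEndian(String s) {
        byte[] out = new byte[s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out[2 * i] = (byte) (c >>> 8); // high byte first
            out[2 * i + 1] = (byte) c;     // then low byte
        }
        return out;
    }

    public static void main(String[] args) {
        // U+00E9 plus a supplementary character (U+1F600) stored as a surrogate pair.
        String s = "caf\u00e9 \uD83D\uDE00";
        byte[] manual = encodeBigEndian(s);
        byte[] library = s.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(Arrays.equals(manual, library)); // true
    }
}
```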
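The fact that UTF-16 is the native representation for char (and usually for String) is only relevant in terms of ease of implementation, however. Characters are replaced only when the requested charset cannot represent them, and UTF-8 can represent every Unicode character. For example, I'd also expect String.getBytes("UTF-8") to give the same results on every platform. Only the no-argument String.getBytes() depends on the platform's default charset.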
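The following sketch contrasts explicit charsets with the platform default. The class ExplicitCharsets and its hex helper are hypothetical names. It shows three things:

- UTF-8 and UTF-16 encode "é" to fixed bytes.
- US-ASCII, which cannot represent that character, substitutes its replacement byte '?'.
- Only the no-argument overload consults Charset.defaultCharset(), whose value varies by platform and configuration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: explicit charsets give fixed bytes; replacement happens only
// when the target charset cannot represent a character.
class ExplicitCharsets {

    // Formats bytes as space-separated uppercase hex, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // "é", U+00E9
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));    // C3 A9
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 E9
        // US-ASCII cannot represent U+00E9, so the encoder substitutes its replacement byte '?'.
        System.out.println(hex(s.getBytes(StandardCharsets.US_ASCII))); // 3F
        // A round trip through the same explicit charset recovers the original string.
        System.out.println(new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8).equals(s)); // true
        // Only the no-argument getBytes() uses this platform-dependent value.
        System.out.println(Charset.defaultCharset());
    }
}
```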