Will String.getBytes("UTF-16") return the same result on all platforms?


Question

I need to create a hash from a String containing a user's password. To create the hash, I use a byte array which I get by calling String.getBytes(). But when I call this method with a specified encoding (such as UTF-8) on a platform where this is not the default encoding, the non-ASCII characters get replaced by a default character (if I understand the behaviour of getBytes() correctly), and therefore on such a platform I will get a different byte array, and eventually a different hash.

Since Strings are internally stored in UTF-16, will calling String.getBytes("UTF-16") guarantee me that I get the same byte array on every platform, regardless of its default encoding?

Answer

Yes. Not only is it guaranteed to be UTF-16, but the byte order is defined too:

When decoding, the UTF-16 charset interprets the byte-order mark at the beginning of the input stream to indicate the byte-order of the stream but defaults to big-endian if there is no byte-order mark; when encoding, it uses big-endian byte order and writes a big-endian byte-order mark.

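(Note that the BOM is part of the encoded output: String.getBytes("UTF-16") returns the bytes 0xFE 0xFF followed by the big-endian code units. That is still the same on every platform. If you want the code units without a BOM, name the byte order explicitly with "UTF-16BE" or "UTF-16LE".)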

So long as you have the same string content - i.e. the same sequence of char values - then you'll get the same bytes on every implementation of Java, barring bugs. (Any such bug would be pretty surprising, given that UTF-16 is probably the simplest encoding to implement in Java...)
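As a rough illustration of the point above, the following sketch encodes the same string with several explicitly named charsets. The class name Utf16BytesDemo and its helper methods are made up for this example; the charset constants come from java.nio.charset.StandardCharsets. It assumes a standard JDK, where the plain UTF-16 charset writes a big-endian byte-order mark when encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original question or answer.
public class Utf16BytesDemo {

    // "UTF-16": big-endian code units, preceded by the BOM bytes FE FF.
    static byte[] utf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16);
    }

    // "UTF-16BE": big-endian code units, no BOM.
    static byte[] utf16be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    // "UTF-16LE": little-endian code units, no BOM.
    static byte[] utf16le(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    // "UTF-8": also independent of the platform default encoding.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // the single character U+00E9 (e with acute accent)

        System.out.println(Arrays.toString(utf16(s)));   // bytes FE FF 00 E9
        System.out.println(Arrays.toString(utf16be(s))); // bytes 00 E9
        System.out.println(Arrays.toString(utf16le(s))); // bytes E9 00
        System.out.println(Arrays.toString(utf8(s)));    // bytes C3 A9
    }
}
```

Because each call names its charset, none of these results depends on the platform's default encoding. Only the no-argument String.getBytes() does.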

The fact that UTF-16 is the native representation for char (and usually for String) is only relevant in terms of ease of implementation, however. For example, I'd also expect String.getBytes("UTF-8") to give the same results on every platform.
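For the hashing use case in the question, a minimal sketch might look like the following. The class PasswordHashDemo, its method names, and the choice of SHA-256 are assumptions made for illustration; the question does not say which hash function is used. The only point being shown is that the bytes fed to the digest come from an explicitly named charset, so the digest does not vary with the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; SHA-256 is an assumed choice of hash function.
public class PasswordHashDemo {

    // Hashes the UTF-8 encoding of the input, independent of the default charset.
    static byte[] sha256Utf8(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return digest.digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Renders a byte array as lowercase hexadecimal.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Contains a non-ASCII character (U+00E4) to exercise the encoding step.
        String password = "p\u00e4ssword";
        System.out.println(toHex(sha256Utf8(password)));
    }
}
```

Swapping StandardCharsets.UTF_8 for StandardCharsets.UTF_16BE would work equally well, as long as the same charset is used wherever the hash is computed or checked.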
