Why are the lengths different when converting a byte array to a String and then back to a byte array?

Question

I have the following Java code:

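byte[] signatureBytes = getSignature(); // asker's own method (not shown): returns binary data, e.g. a digital signature

// Decode the raw bytes as UTF-8 text, then encode that text back to UTF-8
String signatureString = new String(signatureBytes, "UTF8");
byte[] signatureStringBytes = signatureString.getBytes("UTF8");

System.out.println(signatureBytes.length == signatureStringBytes.length); // prints false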

Q: I'm probably misunderstanding this, but I thought that new String(byte[] bytes, String charset) and String.getBytes(charset) are inverse operations?

Q: As a follow-up, what is a safe way to transport a byte[] array as a String?

Recommended answer

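Not every byte[] is valid UTF-8. By default, invalid sequences get replaced by a fixed replacement character (U+FFFD), and I think that's the reason for the length change: U+FFFD takes three bytes in UTF-8 (EF BF BD), so its encoding need not have the same length as the malformed bytes it replaced. A lone 0xFF byte, for example, comes back as three bytes.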
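
A minimal sketch of the effect, assuming Java 7+ for StandardCharsets. The class and method names (Utf8RoundTrip, roundTrip, isValidUtf8) are hypothetical. It round-trips a single 0xFF byte, which never occurs in well-formed UTF-8, and also shows the non-default behavior: a CharsetDecoder configured with CodingErrorAction.REPORT throws instead of substituting U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    // Lenient path used in the question: malformed input is silently replaced by U+FFFD
    static byte[] roundTrip(byte[] input) {
        String s = new String(input, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Strict path: returns false instead of substituting U+FFFD for malformed input
    static boolean isValidUtf8(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 0xFF}; // not a legal byte anywhere in UTF-8
        byte[] back = roundTrip(original);
        System.out.println(original.length + " -> " + back.length); // 1 -> 3
        System.out.println(Arrays.toString(back));                 // [-17, -65, -67], i.e. EF BF BD
        System.out.println(isValidUtf8(original));                 // false
    }
}
```

A strict decoder like isValidUtf8 is one way to detect, before converting, that a byte[] is not text and will not survive a UTF-8 round trip.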

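Try Latin-1 (ISO-8859-1) instead. The length change should not happen there, because it is a simple single-byte encoding in which every byte value maps to a character, so every byte[] is meaningful.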
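
A sketch of the Latin-1 round trip over all 256 byte values, assuming Java 7+ for StandardCharsets; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // ISO-8859-1 maps byte 0xNN to char U+00NN, so decoding never fails
    // and re-encoding reproduces the original bytes exactly
    static byte[] roundTrip(byte[] input) {
        String s = new String(input, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Builds the array 0x00, 0x01, ..., 0xFF
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] original = allByteValues();
        byte[] back = roundTrip(original);
        System.out.println(back.length);                   // 256
        System.out.println(Arrays.equals(original, back)); // true
    }
}
```

One caveat: the resulting String can contain control characters, and it only stays lossless as long as it is turned back into bytes with ISO-8859-1. Any layer that re-encodes it in another charset (UTF-8, for instance) will change the bytes.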

It should not happen with Windows-1252 either. That charset does have undefined sequences (in fact, undefined single bytes), but every character is encoded in a single byte. The new byte[] may differ from the original one, but the lengths must be the same.
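
A sketch for Windows-1252, assuming the JVM provides a charset named "windows-1252" (standard JDKs do, but Charset.forName throws if a runtime lacks it). The class name is hypothetical. It counts how many bytes changed rather than asserting which ones, since the decoding of the undefined bytes is left to the charset implementation.

```java
import java.nio.charset.Charset;

public class Cp1252RoundTrip {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Single-byte charset: each byte decodes to one char, and each char encodes
    // back to exactly one byte (unmappable chars become the one-byte replacement '?')
    static byte[] roundTrip(byte[] input) {
        String s = new String(input, CP1252);
        return s.getBytes(CP1252);
    }

    public static void main(String[] args) {
        byte[] original = new byte[256];
        for (int i = 0; i < 256; i++) {
            original[i] = (byte) i;
        }
        byte[] back = roundTrip(original);
        System.out.println(back.length); // 256

        // Content may differ at the undefined byte values even though the length matches
        int changed = 0;
        for (int i = 0; i < original.length; i++) {
            if (original[i] != back[i]) {
                changed++;
            }
        }
        System.out.println("changed bytes: " + changed);
    }
}
```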