String, byte[] and compression
Problem description
We can easily convert a String to and from a byte[]:

String s = "my string";
byte[] b = s.getBytes();
System.out.println(new String(b)); // my string

When compression is involved, however, there seem to be some issues. Suppose you have two methods, compress and uncompress (the code below works fine):

public static byte[] compress(String data)
        throws UnsupportedEncodingException, IOException {
    byte[] input = data.getBytes("UTF-8");
    Deflater df = new Deflater();
    df.setLevel(Deflater.BEST_COMPRESSION);
    df.setInput(input);

    ByteArrayOutputStream baos = new ByteArrayOutputStream(input.length);
    df.finish();
    byte[] buff = new byte[1024];
    while (!df.finished()) {
        int count = df.deflate(buff);
        baos.write(buff, 0, count);
    }
    baos.close();
    byte[] output = baos.toByteArray();

    return output;
}

public static String uncompress(byte[] input)
        throws UnsupportedEncodingException, IOException, DataFormatException {
    Inflater ifl = new Inflater();
    ifl.setInput(input);

    ByteArrayOutputStream baos = new ByteArrayOutputStream(input.length);
    byte[] buff = new byte[1024];
    while (!ifl.finished()) {
        int count = ifl.inflate(buff);
        baos.write(buff, 0, count);
    }
    baos.close();
    byte[] output = baos.toByteArray();

    return new String(output);
}

My test is as follows (it passes):

String text = "some text";
byte[] bytes = Compressor.compress(text);
assertEquals(Compressor.uncompress(bytes), text); // works

For no reason other than "why not", I'd like to modify the first method to return a String instead of the byte[]. So I return new String(output) from the compress method and modify my test to:

String text = "some text";
String compressedText = Compressor.compress(text);
assertEquals(Compressor.uncompress(compressedText.getBytes()), text); // fails

This test fails with
java.util.zip.DataFormatException: incorrect header check
Why is that? What needs to be done to make it work?

Solution

The String(byte[]) constructor is the problem. You cannot simply take arbitrary bytes, convert them to a String, and then convert that String back to a byte array. The String constructor decodes the bytes using a charset (the platform default when none is given). If a byte sequence is not valid in that charset, it is not preserved: the malformed input is replaced with a substitute character. The conversion from bytes to String and back to bytes is lossless only if those bytes really are the encoding of some String in that charset. Compressed output is arbitrary binary data, so it generally does not meet that condition.

Here is the simplest example:

new String(new byte[]{-128}, "UTF-8").getBytes("UTF-8")

The above returns -17, -65, -67, which is the UTF-8 encoding of the replacement character U+FFFD, while an input of 127 comes back unchanged.
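
The answer above explains the failure but does not show a way to make the String-returning variant work. One common approach, sketched here under the assumption that Java 8 or later is available (for java.util.Base64), is to wrap the compressed bytes in a text-safe encoding before turning them into a String. The class and method names (CompressorSketch, compressToString, uncompressFromString, roundTripThroughString) are hypothetical and are not part of the original code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CompressorSketch {

    // Compresses the text, then Base64-encodes the binary result so that it
    // can be carried in a String without being altered by charset decoding.
    public static String compressToString(String data) {
        byte[] input = data.getBytes(StandardCharsets.UTF_8);
        Deflater df = new Deflater(Deflater.BEST_COMPRESSION);
        df.setInput(input);
        df.finish();

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buff = new byte[1024];
        while (!df.finished()) {
            int count = df.deflate(buff);
            baos.write(buff, 0, count);
        }
        df.end();
        return Base64.getEncoder().encodeToString(baos.toByteArray());
    }

    // Reverses compressToString: Base64-decode first, then inflate.
    public static String uncompressFromString(String compressed)
            throws DataFormatException {
        byte[] input = Base64.getDecoder().decode(compressed);
        Inflater ifl = new Inflater();
        ifl.setInput(input);

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buff = new byte[1024];
        while (!ifl.finished()) {
            int count = ifl.inflate(buff);
            // Guard against looping forever on truncated input.
            if (count == 0 && (ifl.needsInput() || ifl.needsDictionary())) {
                throw new DataFormatException("truncated or unsupported input");
            }
            baos.write(buff, 0, count);
        }
        ifl.end();
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    // Decodes raw bytes as UTF-8 and encodes the result again, to show
    // what the String(byte[]) round trip does to arbitrary bytes.
    public static byte[] roundTripThroughString(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws DataFormatException {
        // A lone 0x80 byte is malformed UTF-8 and becomes U+FFFD.
        System.out.println(Arrays.toString(
                roundTripThroughString(new byte[]{-128}))); // [-17, -65, -67]
        // 127 is a valid single-byte character and survives.
        System.out.println(Arrays.toString(
                roundTripThroughString(new byte[]{127})));  // [127]

        String text = "some text";
        String compressedText = compressToString(text);
        System.out.println(text.equals(uncompressFromString(compressedText))); // true
    }
}
```

Base64 makes the String roughly a third larger than the compressed bytes, so this is only worthwhile when a String is genuinely required (for example, to store the data in a text field). If the data can stay binary, keeping byte[] as the return type, as in the original compress method, avoids the problem entirely.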