String, byte[] and compression

Question

We can easily convert a String to a byte[] and back:

        String s = "my string";
        byte[] b = s.getBytes();
        System.out.println(new String(b)); // my string
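The snippet above relies on the platform's default charset in both getBytes() and new String(byte[]). Below is a minimal sketch that names the charset explicitly, so the round trip does not depend on how the JVM is configured. UTF-8 is an assumption here; any charset that can represent the text would work. The class name TextRoundTrip is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class TextRoundTrip {

    // Encode text to bytes and decode it back, naming the same charset both times
    static String roundTrip(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] b = "my string".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(b, StandardCharsets.UTF_8)); // my string
        // Non-ASCII text also survives, because these bytes were produced from a String
        System.out.println(roundTrip("na\u00efve caf\u00e9").equals("na\u00efve caf\u00e9")); // true
    }
}
```

This works only because the bytes were produced by encoding a String in the first place.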

When compression is involved, however, there seem to be some issues. Suppose you have two methods, compress and uncompress (the code below works fine):

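import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class Compressor {

    public static byte[] compress(String data) {
        byte[] input = data.getBytes(StandardCharsets.UTF_8);
        Deflater df = new Deflater();
        try {
            df.setLevel(Deflater.BEST_COMPRESSION);
            df.setInput(input);
            df.finish();

            ByteArrayOutputStream baos = new ByteArrayOutputStream(input.length);
            byte[] buff = new byte[1024];
            while (!df.finished()) {
                int count = df.deflate(buff);
                baos.write(buff, 0, count);
            }
            return baos.toByteArray();
        } finally {
            df.end(); // release the native zlib resources
        }
    }

    public static String uncompress(byte[] input) throws DataFormatException {
        Inflater ifl = new Inflater();
        try {
            ifl.setInput(input);

            ByteArrayOutputStream baos = new ByteArrayOutputStream(input.length);
            byte[] buff = new byte[1024];
            while (!ifl.finished()) {
                int count = ifl.inflate(buff);
                // Fail instead of looping forever on truncated input
                if (count == 0 && (ifl.needsInput() || ifl.needsDictionary())) {
                    throw new DataFormatException("Incomplete compressed data");
                }
                baos.write(buff, 0, count);
            }
            // Decode with the same charset that compress() used to encode
            return new String(baos.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            ifl.end(); // release the native zlib resources
        }
    }
}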

My test works as follows (it passes):

String text = "some text";
byte[] bytes = Compressor.compress(text);
assertEquals(text, Compressor.uncompress(bytes)); // works

For no reason other than "why not", I'd like to modify the first method to return a String instead of a byte[].

So I return new String(output) from the compress method and modify my test to:

String text = "some text";
String compressedText = Compressor.compress(text);
assertEquals(text, Compressor.uncompress(compressedText.getBytes())); // fails

This test fails with java.util.zip.DataFormatException: incorrect header check
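The message comes from zlib, which Deflater and Inflater wrap. By default (nowrap = false), Deflater produces a zlib stream as described in RFC 1950. The first two bytes of that stream, read as a big-endian 16-bit number, must be divisible by 31. The low four bits of the first byte must also be 8, the code for deflate. When the divisibility test fails, Inflater rejects the input with "incorrect header check". The sketch below mirrors that test. The helper names hasValidZlibHeader and deflate are hypothetical and are not part of java.util.zip.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class ZlibHeaderCheck {

    // Hypothetical helper: the two-byte header test from RFC 1950
    static boolean hasValidZlibHeader(byte[] data) {
        if (data.length < 2) {
            return false;
        }
        int cmf = data[0] & 0xFF; // compression method and window size
        int flg = data[1] & 0xFF; // flags, including the FCHECK bits
        boolean isDeflate = (cmf & 0x0F) == 8;
        boolean checkPasses = ((cmf << 8) | flg) % 31 == 0;
        return isDeflate && checkPasses;
    }

    // Hypothetical helper: compress the same way as Compressor.compress in the question
    static byte[] deflate(byte[] input) {
        Deflater df = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            df.setInput(input);
            df.finish();
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buff = new byte[1024];
            while (!df.finished()) {
                int count = df.deflate(buff);
                baos.write(buff, 0, count);
            }
            return baos.toByteArray();
        } finally {
            df.end();
        }
    }

    public static void main(String[] args) {
        byte[] compressed = deflate("some text".getBytes(StandardCharsets.UTF_8));
        System.out.println(hasValidZlibHeader(compressed)); // true
        // A header whose second byte was replaced by EF BF BD (U+FFFD in UTF-8)
        byte[] mangled = {0x78, (byte) 0xEF, (byte) 0xBF, (byte) 0xBD};
        System.out.println(hasValidZlibHeader(mangled)); // false
    }
}
```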

Why is that? What needs to be done to make it work?

Solution

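The String(byte[]) constructor is the problem. You cannot take arbitrary bytes, convert them to a String, and then convert them back to a byte array. The String class decodes the bytes according to a charset, which is the platform default when none is given. If a byte sequence is not valid in that charset, it is discarded or converted to something else, typically the replacement character U+FFFD. The conversion from bytes to String and back to bytes is lossless only if the bytes really are the encoding of some String in that charset. Compressed output is arbitrary binary data, so passing it through new String(output) and compressedText.getBytes() is very likely to change some bytes. If the two-byte zlib header at the start is altered, Inflater will normally reject the input with "incorrect header check".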
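The sketch below makes the last point concrete. It decodes bytes to a String, re-encodes them with the same charset, and compares the result with the original. The helper name survivesRoundTrip is hypothetical. Bytes that came from text survive. A lone 0x80 byte is malformed UTF-8, so it does not. For contrast, ISO-8859-1 maps each of the 256 byte values to its own character, so in that charset every byte sequence survives. This contrast illustrates the rule; it is not a recommendation from the answer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripCheck {

    // Hypothetical helper: true if bytes -> String -> bytes gives back the same bytes
    static boolean survivesRoundTrip(byte[] bytes, Charset cs) {
        byte[] back = new String(bytes, cs).getBytes(cs);
        return Arrays.equals(bytes, back);
    }

    public static void main(String[] args) {
        byte[] text = "some text".getBytes(StandardCharsets.UTF_8);
        byte[] lone = {(byte) 0x80}; // a UTF-8 continuation byte with no lead byte

        System.out.println(survivesRoundTrip(text, StandardCharsets.UTF_8));      // true
        System.out.println(survivesRoundTrip(lone, StandardCharsets.UTF_8));      // false
        System.out.println(survivesRoundTrip(lone, StandardCharsets.ISO_8859_1)); // true
    }
}
```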

Here is the simplest example:

new String(new byte[]{-128}, "UTF-8").getBytes("UTF-8")

The above returns -17, -65, -67 (the UTF-8 encoding of the replacement character U+FFFD), whereas an input of 127 comes back as exactly the same byte.
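The answer explains why the round trip fails but does not show a fix. One common workaround, which is not part of the answer above, is to turn the compressed bytes into text with an encoding designed for arbitrary binary data, such as Base64 (java.util.Base64, available since Java 8). The sketch below assumes the caller only needs some String form of the compressed data. The class and method names Base64Compressor, compressToString and uncompressFromString are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class Base64Compressor {

    // Deflate the UTF-8 bytes of the text, then Base64-encode the binary result
    static String compressToString(String data) {
        byte[] input = data.getBytes(StandardCharsets.UTF_8);
        Deflater df = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            df.setInput(input);
            df.finish();
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buff = new byte[1024];
            while (!df.finished()) {
                int count = df.deflate(buff);
                baos.write(buff, 0, count);
            }
            return Base64.getEncoder().encodeToString(baos.toByteArray());
        } finally {
            df.end();
        }
    }

    // Reverse the steps: Base64-decode to the original compressed bytes, then inflate
    static String uncompressFromString(String compressed) throws DataFormatException {
        byte[] input = Base64.getDecoder().decode(compressed);
        Inflater ifl = new Inflater();
        try {
            ifl.setInput(input);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buff = new byte[1024];
            while (!ifl.finished()) {
                int count = ifl.inflate(buff);
                if (count == 0 && (ifl.needsInput() || ifl.needsDictionary())) {
                    throw new DataFormatException("Incomplete compressed data");
                }
                baos.write(buff, 0, count);
            }
            return new String(baos.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            ifl.end();
        }
    }
}
```

Base64 output uses only ASCII characters, so it survives conversion to and from bytes in any ASCII-compatible charset. The cost is about 33% extra size, because every 3 input bytes become 4 output characters.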