Performance cost of serializing and compressing an object in Java


The application keeps receiving objects named Report and puts them into a Disruptor for three different consumers.

With the help of Eclipse Memory Analysis, the Retained Heap Size of each Report object is 20KB on average. The application starts with -Xmx2048m, meaning the heap size of the application is 2GB.

However, there are around 100,000 objects at a time, which means the total size of all the objects is roughly 2GB.

The requirement is that all 100,000 objects should be loaded into Disruptor so that the consumers would consume the data asynchronously. But it's not possible if the size of each object is as large as 20KB.

So I'd like to serialize each object to a String and compress it:

private static byte[] toBytes(Serializable o) throws IOException {
    // Standard Java serialization into an in-memory buffer
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(baos);
    oos.writeObject(o);
    oos.close();

    return baos.toByteArray();
}

private static String compress(byte[] str) throws IOException {
    // GZIP the serialized bytes, then Base64-encode them into a String
    // (Base64Coder is a third-party encoder)
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    GZIPOutputStream gzip = new GZIPOutputStream(out);
    gzip.write(str);
    gzip.close();
    return new String(Base64Coder.encode(out.toByteArray()));
}

After compress(toBytes(Report)), the object size is smaller:

(Screenshots comparing the heap size before and after compression omitted.)

Right now the serialized String of an object is around 6KB. It's better now.

Here are my questions:

  1. Is there any other data format whose size is smaller than a String?

  2. Calling serialization and compression each time creates objects like ByteArrayOutputStream, ObjectOutputStream and so on. I don't want to create too many objects like ByteArrayOutputStream and ObjectOutputStream because I need to iterate 100,000 times. How can I design the code so that objects like ByteArrayOutputStream and ObjectOutputStream are created once and reused in each iteration?

  3. Consumers need to deserialize and decompress the String from the Disruptor. If I have three consumers, I need to deserialize and decompress three times. Is there any way around this?
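For question 2, one common pattern is to keep a single ByteArrayOutputStream and call reset() between iterations, so the backing byte array is reused instead of reallocated. ObjectOutputStream, however, writes a stream header and caches object handles, so creating a fresh one per object is the safe choice. A minimal sketch under those assumptions (the ReusableSerializer name is illustrative, not from the original post):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPOutputStream;

public class ReusableSerializer {
    // One buffer reused across all iterations via reset()
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream(32 * 1024);

    public byte[] compressObj(Serializable o) throws IOException {
        buffer.reset(); // reuse the backing byte array, no reallocation
        GZIPOutputStream zos = new GZIPOutputStream(buffer);
        ObjectOutputStream oos = new ObjectOutputStream(zos);
        oos.writeObject(o);
        oos.close(); // flushes the object stream and finishes the gzip trailer
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ReusableSerializer s = new ReusableSerializer();
        byte[] first = s.compressObj("hello");
        byte[] second = s.compressObj("hello");
        // reset() means the second call does not append to the first payload
        System.out.println(first.length > 0 && first.length == second.length);
    }
}
```

This only saves the buffer allocation, not the serialization work itself, so it is an optimization sketch rather than a fix for the overall cost discussed in the answer below.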


Update:

As @BoristheSpider suggested, serialization and compression should be performed in one pass:

private static byte[] compressObj(Serializable o) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    GZIPOutputStream zos = new GZIPOutputStream(bos);
    ObjectOutputStream ous = new ObjectOutputStream(zos);

    ous.writeObject(o);
    ous.flush();  // flush the ObjectOutputStream's internal buffer first
    zos.finish(); // then write the gzip trailer
    bos.flush();

    return bos.toByteArray();
}
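For question 3, each consumer needs the reverse of compressObj. A minimal sketch of a matching helper, assuming the same stream layering in reverse (the decompressObj name and the demo payload are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressDemo {
    // Same one-pass serialize-and-gzip helper as in the update above
    static byte[] compressObj(Serializable o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream zos = new GZIPOutputStream(bos);
        ObjectOutputStream oos = new ObjectOutputStream(zos);
        oos.writeObject(o);
        oos.close(); // flushes the object stream and writes the gzip trailer
        return bos.toByteArray();
    }

    // Reverse operation: gunzip, then deserialize
    static Object decompressObj(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] packed = compressObj("Report-42");
        System.out.println(decompressObj(packed)); // prints Report-42
    }
}
```

Note that each of the three consumers would still pay this cost separately unless one stage decompresses once and republishes the decoded object for the others.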

Solution

Using ObjectOutputStream and compression is so much more expensive than using the Disruptor that it defeats the purpose of using it. It is likely to be 1000x more expensive.

You are far better off limiting how many objects you queue at once. Unless something is seriously wrong with your design, a queue of just 1000 20 KB objects should be more than enough to ensure all your consumers are working efficiently.
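The suggestion above can be sketched with a plain bounded BlockingQueue: put() blocks the producer once capacity is reached, which caps live memory (a Disruptor achieves the same effect with a fixed-size ring buffer). The names and sizes below are illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // Capacity 1000: with ~20 KB per Report, at most ~20 MB is queued at once
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    System.out.println("consumed " + queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        for (int i = 0; i < 5; i++) {
            queue.put("report-" + i); // blocks when the queue is full
        }
        consumer.join();
    }
}
```

With this approach the producer naturally slows down to match the consumers, so the 100,000 Reports never need to be resident on the heap at the same time.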

BTW, if you need persistence, I would use Chronicle (partly because I wrote it). It doesn't need compression, byte[], or Strings for storage; it persists all messages, your queue is unbounded, and it lives entirely off heap, i.e. your 100K objects will use far less than 1 MB of heap.
