High performance serialization: Java vs Google Protocol Buffers vs ...?

Question

For some caching I'm thinking of doing for an upcoming project, I've been thinking about Java serialization. Namely, should it be used?

Now I've previously written custom serialization and deserialization (Externalizable) for various reasons in years past. These days interoperability has become even more of an issue and I can foresee a need to interact with .NET applications, so I've thought of using a platform-independent solution.

Has anyone had any experience with high-performance use of GPB? How does it compare in terms of speed and efficiency with Java's native serialization? Alternatively, are there any other schemes worth considering?

Recommended answer

I haven't compared Protocol Buffers with Java's native serialization in terms of speed, but for interoperability Java's native serialization is a serious no-no. It's also not going to be as efficient in terms of space as Protocol Buffers in most cases. Of course, it's somewhat more flexible in terms of what it can store, and in terms of references etc. Protocol Buffers is very good at what it's intended for, and when it fits your need it's great - but there are obvious restrictions due to interoperability (and other things).
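
To get a feel for the space overhead of native serialization, here is a minimal sketch that compares the serialized size of a tiny object with the size of its field data alone. The `Point` class and both helper methods are invented for this illustration; the "raw" figure is just the bare field bytes, not a Protocol Buffers encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializedSizeDemo {

    // Hypothetical payload type, invented for this illustration.
    public static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        public final int x;
        public final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Size of the object when written with Java's native serialization,
    // which includes a stream header and a class descriptor.
    public static int nativeSize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.size();
    }

    // Size of just the field data: two 4-byte ints, no metadata.
    public static int rawFieldSize(Point p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.x);
            out.writeInt(p.y);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        Point p = new Point(3, 4);
        System.out.println("Native serialization: " + nativeSize(p) + " bytes");
        System.out.println("Raw field data:       " + rawFieldSize(p) + " bytes");
    }
}
```

The gap is largest for small objects, because the class metadata is a fixed cost per stream; how much it matters for a cache depends on the size of the objects being stored.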

I've recently posted a Protocol Buffers benchmarking framework in Java and .NET. The Java version is in the main Google project (in the benchmarks directory), the .NET version is in my C# port project. If you want to compare PB speed with Java serialization speed you could write similar classes and benchmark them. If you're interested in interop though, I really wouldn't give native Java serialization (or .NET native binary serialization) a second thought.
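
As a rough starting point for the native-serialization side of such a comparison, something like the following could be used. This is only a sketch: the `Sample` class, its fields and the `runFor` helper are hypothetical and are not part of the benchmarking framework mentioned above, which times serialization and deserialization separately rather than as round trips.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class NativeSerializationBenchmark {

    // Hypothetical message class; mirror the fields of your own message type here.
    public static class Sample implements Serializable {
        private static final long serialVersionUID = 1L;
        public final int id;
        public final String name;

        public Sample(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    public static byte[] serialize(Sample sample) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(sample);
        }
        return bytes.toByteArray();
    }

    public static Sample deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Sample) in.readObject();
        }
    }

    // Runs serialize + deserialize round trips for roughly the given duration
    // and returns how many completed.
    public static long runFor(Sample sample, long durationMillis)
            throws IOException, ClassNotFoundException {
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        long iterations = 0;
        while (System.nanoTime() < deadline) {
            deserialize(serialize(sample));
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) throws Exception {
        Sample sample = new Sample(42, "example");
        runFor(sample, 2_000); // warm-up so the JIT can compile the hot path
        long millis = 5_000;
        long iterations = runFor(sample, millis);
        System.out.printf("%d round trips in %.1fs (%d bytes per message)%n",
                iterations, millis / 1000.0, serialize(sample).length);
    }
}
```

For a fair comparison the class should carry the same data as the Protocol Buffers message it is measured against, and both sides should run on the same JVM with the same settings.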

There are other options for interoperable serialization besides Protocol Buffers though - Thrift, JSON and YAML spring to mind, and there are doubtless others.

EDIT: Okay, with interop not being so important, it's worth trying to list the different qualities you want out of a serialization framework. One thing you should think about is versioning - this is another thing that PB is designed to handle well, both backwards and forwards (so new software can read old data and vice versa) - when you stick to the suggested rules, of course :)

Having tried to be cautious about how PB performance compares with native Java serialization, I really wouldn't be surprised to find that PB was faster anyway. If you have the chance, use the server VM - my recent benchmarks showed the server VM to be over twice as fast at serializing and deserializing the sample data. I think the PB code suits the server VM's JIT very nicely :)

Just as sample performance figures, serializing and deserializing two messages (one 228 bytes, one 84750 bytes) I got these results on my laptop using the server VM:


Benchmarking benchmarks.GoogleSize$SizeMessage1 with file google_message1.dat 
Serialize to byte string: 2581851 iterations in 30.16s; 18.613789MB/s 
Serialize to byte array: 2583547 iterations in 29.842s; 18.824497MB/s 
Serialize to memory stream: 2210320 iterations in 30.125s; 15.953759MB/s 
Deserialize from byte string: 3356517 iterations in 30.088s; 24.256632MB/s 
Deserialize from byte array: 3356517 iterations in 29.958s; 24.361889MB/s 
Deserialize from memory stream: 2618821 iterations in 29.821s; 19.094952MB/s 

Benchmarking benchmarks.GoogleSpeed$SpeedMessage1 with file google_message1.dat 
Serialize to byte string: 17068518 iterations in 29.978s; 123.802124MB/s 
Serialize to byte array: 17520066 iterations in 30.043s; 126.802376MB/s 
Serialize to memory stream: 7736665 iterations in 30.076s; 55.93307MB/s 
Deserialize from byte string: 16123669 iterations in 30.073s; 116.57947MB/s 
Deserialize from byte array: 16082453 iterations in 30.109s; 116.14243MB/s
Deserialize from memory stream: 7496968 iterations in 30.03s; 54.283176MB/s 

Benchmarking benchmarks.GoogleSize$SizeMessage2 with file google_message2.dat 
Serialize to byte string: 6266 iterations in 30.034s; 16.826494MB/s 
Serialize to byte array: 6246 iterations in 30.027s; 16.776697MB/s 
Serialize to memory stream: 6042 iterations in 29.916s; 16.288969MB/s 
Deserialize from byte string: 4675 iterations in 29.819s; 12.644595MB/s 
Deserialize from byte array: 4694 iterations in 30.093s; 12.580387MB/s 
Deserialize from memory stream: 4544 iterations in 29.579s; 12.389998MB/s 

Benchmarking benchmarks.GoogleSpeed$SpeedMessage2 with file google_message2.dat 
Serialize to byte string: 39562 iterations in 30.055s; 106.16416MB/s 
Serialize to byte array: 39715 iterations in 30.178s; 106.14035MB/s 
Serialize to memory stream: 34161 iterations in 30.032s; 91.74085MB/s 
Deserialize from byte string: 36934 iterations in 29.794s; 99.98019MB/s 
Deserialize from byte array: 37191 iterations in 29.915s; 100.26867MB/s 
Deserialize from memory stream: 36237 iterations in 29.846s; 97.92251MB/s 

The "speed" vs "size" is whether the generated code is optimised for speed or code size. (The serialized data is the same in both cases. The "size" version is provided for the case where you've got a lot of messages defined and don't want to take a lot of memory for the code.)

As you can see, for the smaller message it can be very fast: with the speed-optimised code, over 500 small messages are serialized or deserialized per millisecond when going to or from a byte string or byte array. Even with the larger message (roughly 85K), the speed-optimised code takes less than a millisecond per message.
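
The per-message figures can be rechecked from the benchmark output with a little arithmetic. The sketch below uses hypothetical helper names and assumes that "MB" in the output means 1024 * 1024 bytes, which is the interpretation that reproduces the reported throughput for the 228-byte message.

```java
public class BenchmarkArithmetic {

    public static double messagesPerMillisecond(long iterations, double seconds) {
        return iterations / (seconds * 1000.0);
    }

    public static double millisecondsPerMessage(long iterations, double seconds) {
        return (seconds * 1000.0) / iterations;
    }

    // Throughput in MB/s, taking 1 MB as 1024 * 1024 bytes (an assumption).
    public static double megabytesPerSecond(long iterations, int bytesPerMessage, double seconds) {
        return (double) iterations * bytesPerMessage / seconds / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // SpeedMessage1, serialize to byte string: 17068518 iterations in 29.978s
        System.out.printf("Small message: %.1f messages per ms%n",
                messagesPerMillisecond(17068518L, 29.978));

        // SpeedMessage2, serialize to byte string: 39562 iterations in 30.055s
        System.out.printf("Large message: %.3f ms per message%n",
                millisecondsPerMessage(39562L, 30.055));

        // SizeMessage1, serialize to byte string: 2581851 iterations of 228 bytes in 30.16s
        System.out.printf("Small message, size-optimised: %.3f MB/s%n",
                megabytesPerSecond(2581851L, 228, 30.16));
    }
}
```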
