High performance serialization: Java vs Google Protocol Buffers vs ...?

Question

For some caching I'm thinking of doing for an upcoming project, I've been thinking about Java serialization. Namely, should it be used?

Now, I've previously written custom serialization and deserialization (Externalizable) for various reasons over the years. These days interoperability has become even more of an issue, and I can foresee a need to interact with .NET applications, so I've thought of using a platform-independent solution.
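
For illustration, here is a minimal sketch of what a hand-rolled Externalizable implementation typically looks like; the CacheEntry class and its fields are hypothetical, just standing in for whatever would actually be cached:

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

// Hypothetical cache entry; with Externalizable you control exactly what
// goes on the wire instead of relying on default Serializable behaviour.
public class CacheEntry implements Externalizable {
    private String key;
    private long timestamp;
    private byte[] payload;

    // Externalizable requires a public no-arg constructor for deserialization.
    public CacheEntry() { }

    public CacheEntry(String key, long timestamp, byte[] payload) {
        this.key = key;
        this.timestamp = timestamp;
        this.payload = payload;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(key);
        out.writeLong(timestamp);
        out.writeInt(payload.length);
        out.write(payload);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        key = in.readUTF();
        timestamp = in.readLong();
        payload = new byte[in.readInt()];
        in.readFully(payload);
    }
}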

Has anyone had any experience with high-performance use of GPB? How does it compare in terms of speed and efficiency with Java's native serialization? Alternatively, are there any other schemes worth considering?

Solution

I haven't compared Protocol Buffers with Java's native serialization in terms of speed, but for interoperability Java's native serialization is a serious no-no. It's also not going to be as efficient in terms of space as Protocol Buffers in most cases. Of course, it's somewhat more flexible in terms of what it can store, and in terms of references etc. Protocol Buffers is very good at what it's intended for, and when it fits your need it's great - but there are obvious restrictions due to interoperability (and other things).

I've recently posted a Protocol Buffers benchmarking framework in Java and .NET. The Java version is in the main Google project (in the benchmarks directory), and the .NET version is in my C# port project. If you want to compare PB speed with Java serialization speed, you could write similar classes and benchmark them. If you're interested in interop though, I really wouldn't give native Java serialization (or .NET native binary serialization) a second thought.
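
If you do put together a quick comparison yourself, a rough harness along these lines is enough to get order-of-magnitude numbers for the native-serialization side; the class name, sample object and iteration count are just illustrative assumptions rather than part of the benchmark framework above, while toByteArray and parseFrom in the closing comment are the standard methods on protoc-generated message classes:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

public class QuickSerializationBench {

    // One round trip through Java's native serialization.
    static Object javaRoundTrip(Serializable obj) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(obj);
        oos.close();
        ObjectInputStream ois =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return ois.readObject();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder: substitute an object shaped like your real messages.
        Serializable sample = new ArrayList<Object>(Arrays.asList("example", "data", 42));
        int iterations = 100000;

        // Warm-up pass so the JIT has compiled the hot paths before timing.
        for (int i = 0; i < iterations; i++) {
            javaRoundTrip(sample);
        }

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            javaRoundTrip(sample);
        }
        double millis = (System.nanoTime() - start) / 1000000.0;
        System.out.printf("Java serialization: %.1f round trips/ms%n", iterations / millis);

        // The protobuf side of the comparison would look like:
        //   byte[] bytes = message.toByteArray();
        //   SpeedMessage1 parsed = SpeedMessage1.parseFrom(bytes);
        // where SpeedMessage1 is a class generated by protoc from a .proto file.
    }
}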

There are other options for interoperable serialization besides Protocol Buffers though - Thrift, JSON and YAML spring to mind, and there are doubtless others.

EDIT: Okay, with interop not being so important, it's worth trying to list the different qualities you want out of a serialization framework. One thing you should think about is versioning - this is another thing that PB is designed to handle well, both backwards and forwards (so new software can read old data and vice versa) - when you stick to the suggested rules, of course :)
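
(Roughly speaking, the suggested rules amount to: never change or reuse an existing field's tag number, don't change a field's type, and add new data only as new optional fields with previously unused tag numbers. Old code then skips fields it doesn't recognise as "unknown fields", and new code just sees default values for fields that aren't present in old data.)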

Although I've tried to be cautious about PB performance vs native Java serialization, I really wouldn't be surprised to find that PB was faster anyway. If you have the chance, use the server VM - my recent benchmarks showed the server VM to be over twice as fast at serializing and deserializing the sample data. I think the PB code suits the server VM's JIT very nicely :)
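
(On HotSpot that's simply a matter of launching with java -server instead of the default client VM, on platforms where a separate server VM is shipped.)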

Just as sample performance figures, serializing and deserializing two messages (one 228 bytes, one 84750 bytes) I got these results on my laptop using the server VM:

Benchmarking benchmarks.GoogleSize$SizeMessage1 with file google_message1.dat 
Serialize to byte string: 2581851 iterations in 30.16s; 18.613789MB/s 
Serialize to byte array: 2583547 iterations in 29.842s; 18.824497MB/s 
Serialize to memory stream: 2210320 iterations in 30.125s; 15.953759MB/s 
Deserialize from byte string: 3356517 iterations in 30.088s; 24.256632MB/s 
Deserialize from byte array: 3356517 iterations in 29.958s; 24.361889MB/s 
Deserialize from memory stream: 2618821 iterations in 29.821s; 19.094952MB/s 

Benchmarking benchmarks.GoogleSpeed$SpeedMessage1 with file google_message1.dat 
Serialize to byte string: 17068518 iterations in 29.978s; 123.802124MB/s 
Serialize to byte array: 17520066 iterations in 30.043s; 126.802376MB/s 
Serialize to memory stream: 7736665 iterations in 30.076s; 55.93307MB/s 
Deserialize from byte string: 16123669 iterations in 30.073s; 116.57947MB/s 
Deserialize from byte array: 16082453 iterations in 30.109s; 116.14243MB/s
Deserialize from memory stream: 7496968 iterations in 30.03s; 54.283176MB/s 

Benchmarking benchmarks.GoogleSize$SizeMessage2 with file google_message2.dat 
Serialize to byte string: 6266 iterations in 30.034s; 16.826494MB/s 
Serialize to byte array: 6246 iterations in 30.027s; 16.776697MB/s 
Serialize to memory stream: 6042 iterations in 29.916s; 16.288969MB/s 
Deserialize from byte string: 4675 iterations in 29.819s; 12.644595MB/s 
Deserialize from byte array: 4694 iterations in 30.093s; 12.580387MB/s 
Deserialize from memory stream: 4544 iterations in 29.579s; 12.389998MB/s 

Benchmarking benchmarks.GoogleSpeed$SpeedMessage2 with file google_message2.dat 
Serialize to byte string: 39562 iterations in 30.055s; 106.16416MB/s 
Serialize to byte array: 39715 iterations in 30.178s; 106.14035MB/s 
Serialize to memory stream: 34161 iterations in 30.032s; 91.74085MB/s 
Deserialize from byte string: 36934 iterations in 29.794s; 99.98019MB/s 
Deserialize from byte array: 37191 iterations in 29.915s; 100.26867MB/s 
Deserialize from memory stream: 36237 iterations in 29.846s; 97.92251MB/s 

The "speed" vs "size" is whether the generated code is optimised for speed or code size. (The serialized data is the same in both cases. The "size" version is provided for the case where you've got a lot of messages defined and don't want to take a lot of memory for the code.)

As you can see, for the smaller message it can be very fast - over 500 small messages serialized or deserialized per millisecond. Even with the larger, 84,750-byte message it's taking less than a millisecond per message.
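
(To make the arithmetic explicit: for SpeedMessage1, 17,068,518 serializations in 29.978s is roughly 569,000 per second, i.e. about 570 per millisecond; for the 84,750-byte SpeedMessage2, 39,562 serializations in 30.055s works out to about 1,300 per second, or roughly 0.76ms per message.)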
