Why is the Joda Time serialized form so large, and what to do about it?
Question
On my machine, the following code snippet:
DateTime now = DateTime.now();
System.out.println(now);
System.out.println("Date size:\t\t"+serialiseToArray(now).length);
System.out.println("DateString size:\t"+serialiseToArray(now.toString()).length);
System.out.println("java.util.Date size:\t"+serialiseToArray(new Date()).length);
Duration twoHours = Duration.standardHours(2);
System.out.println(twoHours);
System.out.println("Duration size:\t\t"+serialiseToArray(twoHours).length);
System.out.println("DurationString size:\t"+serialiseToArray(twoHours.toString()).length);
gives the following output:
2013-09-09T15:07:44.642+01:00
Date size: 273
DateString size: 36
java.util.Date size: 46
PT7200S
Duration size: 107
DurationString size: 14
As you can see, the serialized org.joda.time.DateTime object is more than 5 times larger than both its String form, which seems to describe it perfectly, and the equivalent java.util.Date. The Duration object representing 2 hours is also much larger than I would expect: looking at the source, its only member variable appears to be a single long value.
Why are these serialized objects so large? And is there any pre-existing solution for getting a smaller representation?
The serialiseToArray method, for reference:
// Serializes a single object with standard Java serialization and returns the raw bytes.
private static byte[] serialiseToArray(Serializable s)
{
    try
    {
        ByteArrayOutputStream byteArrayBuffer = new ByteArrayOutputStream();
        new ObjectOutputStream(byteArrayBuffer).writeObject(s);
        return byteArrayBuffer.toByteArray();
    }
    catch (IOException ex)
    {
        throw new RuntimeException(ex);
    }
}
Recommended answer
Serialization has some overhead. In this instance, the overhead you notice most is that the class structure is described in the actual output. And since Duration has a serializable base class (BaseDuration, which holds the iMillis field) that has to be described as well, that overhead is larger than for Date (which has no serializable base class).
Those classes are referenced by their fully-qualified class names in the serialized data, and those names alone account for a fair number of bytes.
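The point about class names can be checked directly. The following is a minimal sketch, not part of the original answer: it serializes a java.util.Date (chosen only so the example runs without Joda Time on the classpath) and looks for the fully-qualified class name in the raw bytes. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.Date;

// Hypothetical demo class; the name is not from the original post.
class ClassNameInStreamDemo {

    // Serializes one object with standard Java serialization.
    static byte[] serialize(Serializable s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(s);
        } catch (IOException ex) {
            throw new RuntimeException(ex);
        }
        return buffer.toByteArray();
    }

    // Decodes the bytes one-to-one into chars (ISO-8859-1) so that
    // ASCII text embedded in the stream can be searched for.
    static boolean containsText(byte[] data, String text) {
        return new String(data, StandardCharsets.ISO_8859_1).contains(text);
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Date());
        System.out.println("total bytes: " + data.length);
        System.out.println("contains class name: "
                + containsText(data, "java.util.Date"));
    }
}
```

With Joda Time available, the same check could be run for "org.joda.time.Duration" and "org.joda.time.base.BaseDuration".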
Good news: that overhead is only paid once per output stream. If you serialize another Duration object to the same stream, the additional size should be rather small.
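To see the "paid once per stream" effect, one can write one object and then two objects to a single ObjectOutputStream and compare sizes. This is a rough sketch under the assumption that java.util.Date behaves like Duration in this respect; Date is used only to keep the example free of external dependencies, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;

// Hypothetical demo class; the name is not from the original post.
class SharedStreamDemo {

    // Writes all given objects to ONE ObjectOutputStream and
    // returns the total number of bytes produced.
    static int serializedSize(Serializable... objects) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            for (Serializable o : objects) {
                out.writeObject(o);
            }
        } catch (IOException ex) {
            throw new RuntimeException(ex);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        int one = serializedSize(new Date(0L));
        int two = serializedSize(new Date(0L), new Date(1L));
        System.out.println("one Date:  " + one + " bytes");
        System.out.println("two Dates: " + two + " bytes");
        // The second object reuses the class description already in the stream.
        System.out.println("cost of the second Date: " + (two - one) + " bytes");
    }
}
```

The second object should cost far less than the first, because the stream refers back to the class description it has already written instead of repeating it.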
I've used the jdeserialize project to look at the result of serializing a java.util.Date vs. a Duration (note that this tool does not need access to the .class files, so all the information it dumps is actually contained in the serialized data):
Result for java.util.Date:
read: java.util.Date _h0x7e0001 = r_0x7e0000;
//// BEGIN stream content output
java.util.Date _h0x7e0001 = r_0x7e0000;
//// END stream content output
//// BEGIN class declarations (excluding array classes)
class java.util.Date implements java.io.Serializable {
}
//// END class declarations
//// BEGIN instance dump
[instance 0x7e0001: 0x7e0000/java.util.Date
object annotations:
java.util.Date
[blockdata 0x00: 8 bytes]
field data:
0x7e0000/java.util.Date:
]
//// END instance dump
Result for Duration:
read: org.joda.time.Duration _h0x7e0002 = r_0x7e0000;
//// BEGIN stream content output
org.joda.time.Duration _h0x7e0002 = r_0x7e0000;
//// END stream content output
//// BEGIN class declarations (excluding array classes)
class org.joda.time.Duration extends org.joda.time.base.BaseDuration implements java.io.Serializable {
}
class org.joda.time.base.BaseDuration implements java.io.Serializable {
long iMillis;
}
//// END class declarations
//// BEGIN instance dump
[instance 0x7e0002: 0x7e0000/org.joda.time.Duration
field data:
0x7e0001/org.joda.time.base.BaseDuration:
iMillis: 0
0x7e0000/org.joda.time.Duration:
]
//// END instance dump
Note that the "class declarations" block is quite a bit longer for Duration. This also explains why serializing a single Duration takes 107 bytes, but serializing two (distinct) Duration objects takes only 121 bytes.
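As for a smaller representation when only single values are stored: one possible approach, not proposed in the answer above, is to skip object serialization and write just the millisecond value as a long. The sketch below does this for java.util.Date so that it runs without Joda Time; the class and method names are hypothetical. For Joda types the same idea would use getMillis() and the constructors taking a long, with the caveat that a DateTime's time zone and chronology are not preserved this way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Date;

// Hypothetical demo class; the name is not from the original post.
class CompactMillisDemo {

    // Encodes only the millisecond value: 8 bytes, no class description.
    static byte[] encode(Date date) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeLong(date.getTime());
        } catch (IOException ex) {
            throw new RuntimeException(ex);
        }
        return buffer.toByteArray();
    }

    // Rebuilds the Date from the 8-byte millisecond value.
    static Date decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new Date(in.readLong());
        } catch (IOException ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        Date now = new Date();
        byte[] compact = encode(now);
        System.out.println("compact size: " + compact.length + " bytes");
        System.out.println("round trip ok: " + now.equals(decode(compact)));
    }
}
```

The trade-off is that the reader must already know what type the 8 bytes represent, which is exactly the information the standard serialized form spends its extra bytes on.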