Malformed binary serialization of HashMap<String,Double>


Problem description


I wrote some code to serialize a HashMap<String,Double> by iterating over its entries and serializing each of them, instead of using ObjectOutputStream.writeObject(). The reason is purely efficiency: the resulting file is much smaller and much faster to write and read (e.g. 23 MB in 0.6 seconds vs. 29 MB in 9.9 seconds).

This is what I did to serialize:

ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("test.bin"));
oos.writeInt(map.size()); // write size of the map
for (Map.Entry<String, Double> entry : map.entrySet()) { // iterate entries
    System.out.println("writing ("+ entry.getKey() +","+ entry.getValue() +")");
    byte[] bytes = entry.getKey().getBytes();
    oos.writeInt(bytes.length); // length of key string
    oos.write(bytes); // key string bytes
    oos.writeDouble(entry.getValue()); // value
}
oos.close();

As you can see, I get the byte array for each key String, serialize its length and then the array itself. This is what I did to deserialize:

ObjectInputStream ois = new ObjectInputStream(new FileInputStream("test.bin"));
int size = ois.readInt(); // read size of the map
HashMap<String, Double> newMap = new HashMap<>(size);
for (int i = 0; i < size; i++) { // iterate entries
    int length = ois.readInt(); // length of key string
    byte[] bytes = new byte[length];
    ois.read(bytes); // key string bytes
    String key = new String(bytes);
    double value = ois.readDouble(); // value
    newMap.put(key, value);
    System.out.println("read ("+ key +","+ value +")");
}

The problem is that at some point a key is not deserialized correctly. While debugging I could see that ois.read(bytes) read 8 bytes instead of the expected 16, so the key String was malformed, and the double value was then read from the last 8 bytes of the key, which had not been consumed yet. From there on every read is misaligned and exceptions follow.
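The short read described above can be reproduced in isolation. The following is a minimal sketch, assuming in-memory streams and a key that happens to start 8 bytes before the 1024-byte mark of the written data; the class and method names are hypothetical, and how many bytes a single read call returns is an implementation detail of the JDK in use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical reproduction class; the names are illustrative only.
class ShortReadDemo {

    // Writes `padding` filler bytes followed by a 16-byte key, then reports how
    // many key bytes a single read(byte[]) call delivers when reading it back.
    static int bytesFromSingleRead(int padding) {
        byte[] key = "2010-00-008.html".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(sink)) {
            oos.write(new byte[padding]);
            oos.write(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        byte[] data = sink.toByteArray();
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            ois.readFully(new byte[padding]);      // skip the filler bytes
            return ois.read(new byte[key.length]); // may return fewer than 16
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("padding 0:    " + bytesFromSingleRead(0));
        System.out.println("padding 1016: " + bytesFromSingleRead(1016));
    }
}
```

With 1016 filler bytes the key starts at offset 1016 of the written data, so only its first 8 bytes fall before the 1024-byte mark. This matches the 8-of-16 bytes observed in the debugger.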

Using the sample data below, the output will be like this at some point:

read (2010-00-056.html,12154.250518054876)
read (2010-00-        ,1.4007397428546247E-76)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at ti.Test.main(Test.java:82)

The problem can be seen in the serialized file, where the key should read 2010-00-008.html: two extra bytes appear in the middle of the String key. See MxyL's answer for further details. So it all boils down to: why are those two bytes added, and why does readFully work correctly?

Why isn't the String properly (de)serialized? Could it be some kind of padding to a fixed block size, or something like that? Is there a better way to serialize a String manually when efficiency matters? I was expecting some kind of writeString and readString, but there seems to be no such thing in Java's ObjectStream classes.

I have also tried buffered streams in case something was wrong there, explicitly specifying how many bytes to write and read, and using different encodings, but with no luck.

This is some sample data to reproduce the problem:

HashMap<String, Double> map = new HashMap<String, Double>();
map.put("2010-00-027.html",21732.994621513037); map.put("2010-00-020.html",3466.5169348296736); map.put("2010-00-051.html",12528.648992702407); map.put("2010-00-062.html",3354.8950010256385);
map.put("2010-00-024.html",10295.095511718278); map.put("2010-00-052.html",5381.513344679818);  map.put("2010-00-007.html",16466.33813960735);  map.put("2010-00-017.html",9484.969198176652);
map.put("2010-00-054.html",15423.873112634772); map.put("2010-00-022.html",8123.842752870753);  map.put("2010-00-033.html",21238.496665104063); map.put("2010-00-028.html",7578.792651786424);
map.put("2010-00-048.html",3566.4118233046393); map.put("2010-00-040.html",2681.0799941861724); map.put("2010-00-049.html",14308.090890746222); map.put("2010-00-058.html",5911.342406606804);
map.put("2010-00-045.html",2284.118716145881);  map.put("2010-00-031.html",2859.565771680721);  map.put("2010-00-046.html",4555.187022907964);  map.put("2010-00-036.html",8479.709295569426);
map.put("2010-00-061.html",846.8292195815125);  map.put("2010-00-023.html",14108.644025417952); map.put("2010-00-041.html",22686.232732684934); map.put("2010-00-025.html",9513.539663409734);
map.put("2010-00-012.html",459.6427911376829);  map.put("2010-00-005.html",0.0);    map.put("2010-00-013.html",2646.403220496738);  map.put("2010-00-065.html",5808.86423609936);
map.put("2010-00-056.html",12154.250518054876); map.put("2010-00-008.html",10811.15198506469);  map.put("2010-00-042.html",9271.006516004005);  map.put("2010-00-000.html",4387.4162586468965);
map.put("2010-00-059.html",4456.211623469774);  map.put("2010-00-055.html",3534.7511584735325); map.put("2010-00-057.html",8745.640098512009);  map.put("2010-00-032.html",4993.295735075575);
map.put("2010-00-021.html",3852.5805998017922); map.put("2010-00-043.html",4108.020033536286);  map.put("2010-00-053.html",2.2446400279239946); map.put("2010-00-030.html",17853.541210836203);

Solution

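ObjectOutputStream first writes STREAM_MAGIC (0xACED) and then STREAM_VERSION (5). After that the data is written in blocks: each full block starts with TC_BLOCKDATALONG (0x7A) followed by the block size (1024), and the last block, if it is no longer than 255 bytes, starts with TC_BLOCKDATA (0x77) followed by the block size (the length of that last block). The header of a block that begins in the middle of a key is what shows up as extra bytes inside that key.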
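The layout described above can be checked by dumping the first bytes of a stream. This is a minimal sketch, assuming an in-memory ByteArrayOutputStream and a payload of 1024 + 16 zero bytes so that one full block and one short final block are produced; BlockHeaderDemo and rawStream are hypothetical names, and the 1024-byte blocking factor is the one stated in the ObjectOutputStream class documentation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of the original question or answer.
class BlockHeaderDemo {

    // Writes payloadLength zero bytes through an ObjectOutputStream and
    // returns the raw bytes that reach the underlying stream.
    static byte[] rawStream(int payloadLength) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(sink)) {
            oos.write(new byte[payloadLength]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = rawStream(1024 + 16);

        // Stream header: two bytes of magic, two bytes of version.
        System.out.printf("magic   = 0x%02X%02X%n", raw[0] & 0xFF, raw[1] & 0xFF);
        System.out.printf("version = %d%n", ((raw[2] & 0xFF) << 8) | (raw[3] & 0xFF));

        // First block: one marker byte followed by a 4-byte length.
        System.out.printf("first block marker = 0x%02X%n", raw[4] & 0xFF);

        // Last block: its header follows the 4-byte stream header, the 5-byte
        // long-block header and the 1024 data bytes of the first block.
        int lastHeader = 4 + 5 + 1024;
        System.out.printf("last block marker  = 0x%02X, length = %d%n",
                raw[lastHeader] & 0xFF, raw[lastHeader + 1] & 0xFF);
    }
}
```

The short final block carries a two-byte header (marker plus one length byte), which is consistent with the two extra bytes reported in the question.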

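So when ObjectInputStream.readFully is used, the stream first skips STREAM_MAGIC and STREAM_VERSION and then, for every block, reads the block size and loads that many bytes into an internal buffer, continuing with the next block until the requested number of bytes has been delivered. A single read(bytes) call, by contrast, may return fewer bytes than requested, which is what happens here when the current block ends in the middle of a key.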
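Applied to the code in the question, the fix is to read each key with readFully instead of read. The following is a minimal sketch under a few assumptions that are not in the original: in-memory byte arrays instead of test.bin, UTF-8 as an explicit key encoding, and the hypothetical names MapCodec, encode and decode.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; the names are illustrative only.
class MapCodec {

    // Same wire layout as in the question: entry count, then for each entry
    // the key length, the key bytes and the double value.
    static byte[] encode(Map<String, Double> map) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(sink)) {
            oos.writeInt(map.size());
            for (Map.Entry<String, Double> entry : map.entrySet()) {
                byte[] key = entry.getKey().getBytes(StandardCharsets.UTF_8);
                oos.writeInt(key.length);
                oos.write(key);
                oos.writeDouble(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    static Map<String, Double> decode(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int size = ois.readInt();
            Map<String, Double> map = new HashMap<>(size);
            for (int i = 0; i < size; i++) {
                byte[] key = new byte[ois.readInt()];
                ois.readFully(key); // blocks until the whole key has been read
                double value = ois.readDouble();
                map.put(new String(key, StandardCharsets.UTF_8), value);
            }
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> map = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            map.put(String.format("2010-00-%03d.html", i), i * 1.5);
        }
        Map<String, Double> copy = decode(encode(map));
        System.out.println(copy.equals(map));
    }
}
```

As for the writeString and readString the question asks about, ObjectOutputStream and ObjectInputStream also offer writeUTF and readUTF through the DataOutput and DataInput interfaces. They write a length-prefixed string in modified UTF-8 and may be worth considering for keys whose encoded form fits in 65535 bytes.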
