HashMap stored on disk is very slow to read back from disk
I have a HashMap that stores external uids and then it stores a different id ( internal for our app ) that has been set for the given uid.
e.g:
- 123.345.432=00001
- 123.354.433=00002
The map is checked by uid to make sure the same internal id will be used if something is resent to the application.
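The check-then-assign step described above can be sketched as follows. This is a minimal illustration, not the asker's actual code; `generateInternalId` is a hypothetical stand-in for however the application mints new internal ids:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class UidMapper {
    private static final Map<String, String> uidMap =
            Collections.synchronizedMap(new HashMap<String, String>());
    private static int counter = 0;

    // Hypothetical id generator: pads a counter to five digits, e.g. "00001".
    private static String generateInternalId() {
        return String.format("%05d", ++counter);
    }

    // Returns the internal id already assigned to this uid, or assigns a new one.
    // The explicit synchronized block makes the get/put pair atomic; a
    // synchronizedMap only protects individual calls, not compound actions.
    public static String internalIdFor(String externalUid) {
        synchronized (uidMap) {
            String id = uidMap.get(externalUid);
            if (id == null) {
                id = generateInternalId();
                uidMap.put(externalUid, id);
            }
            return id;
        }
    }
}
```

Resending the same uid then returns the id that was assigned the first time, rather than minting a new one.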
DICOMUID2StudyIdentiferMap defined as follows:
private static Map DICOMUID2StudyIdentiferMap = Collections.synchronizedMap(new HashMap());
If the load succeeds, it overwrites this default; otherwise the empty HashMap is used.
It's read back from disk by doing:
FileInputStream f = new FileInputStream( studyUIDFile );
ObjectInputStream s = new ObjectInputStream( f );
Map loadedMap = ( Map )s.readObject();
DICOMUID2StudyIdentiferMap = Collections.synchronizedMap( loadedMap );
The HashMap is written to disk using:
FileOutputStream f = new FileOutputStream( studyUIDFile );
ObjectOutputStream s = new ObjectOutputStream( f );
s.writeObject(DICOMUID2StudyIdentiferMap);
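For comparison, here is a self-contained round trip using the same ObjectOutputStream/ObjectInputStream approach, but with the file streams wrapped in BufferedOutputStream/BufferedInputStream. The buffering is my addition, not part of the original code; unbuffered object streams hit the disk in many small reads and writes, which is a common cause of slow serialization:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Map;

public class MapPersistence {

    // Write the map to disk through a buffered stream.
    public static void save(Map<String, String> map, File file) throws IOException {
        ObjectOutputStream s = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)));
        try {
            s.writeObject(map);
        } finally {
            s.close();
        }
    }

    // Read the map back; the unchecked cast mirrors the original code.
    @SuppressWarnings("unchecked")
    public static Map<String, String> load(File file)
            throws IOException, ClassNotFoundException {
        ObjectInputStream s = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)));
        try {
            return (Map<String, String>) s.readObject();
        } finally {
            s.close();
        }
    }
}
```

The try/finally close is used here rather than try-with-resources, since the machine in question runs an old JVM on XP and the original code predates Java 7.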
The issue I have is, locally running in Eclipse performance is fine, but when the application is running in normal use on a machine, the HashMap takes several minutes to load from disk. Once loaded, it also takes a long time to check for a previous value, e.g. by seeing whether DICOMUID2StudyIdentiferMap.put(..., ...) returns a value.
I load the same map object in both cases; it's a ~400 KB file. The HashMap it contains has about 3,000 key-value pairs.
Why is it so slow on one machine, but not in Eclipse?
The machine is a VM running XP. It has only recently started becoming slow to read the HashMap, so it must be related to its size; however, 400 KB isn't very big, I don't think.
Any advice welcome, TIA
Not sure that serialising your Map is the best option. If the Map needs disk-based persistence, why not use a library that's designed for disk? Check out Kyoto Cabinet. It's actually written in C++, but there is a Java API. I've used it several times; it's very easy to use, very fast, and can scale to a huge size.
This is an example I'm copy-pasting for Tokyo Cabinet, the predecessor of Kyoto Cabinet, but it's basically the same:
import tokyocabinet.HDB;
....
String dir = "/path/to/my/dir/";
HDB hash = new HDB();
// open the hash for read/write, create if does not exist on disk
if (!hash.open(dir + "unigrams.tch", HDB.OWRITER | HDB.OCREAT)) {
throw new IOException("Unable to open " + dir + "unigrams.tch: " + hash.errmsg());
}
// Add something to the hash
hash.put("blah", "my string");
// Close it
hash.close();