HashMap in Java, 100 Million entries

Problem description

I want to store 100 million terms and their frequencies (from a text database) in a HashMap<String, Double>, but it gives me an "Out of Memory" error. I tried increasing the heap space to -Xmx15000M; the program then runs for half an hour and throws the same exception again. The file I'm reading the words and frequencies from is 1.7 GB.

Any help would be much appreciated.

Thanks :-)
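
To see why a 1.7 GB file can exhaust a heap of 15000 MB, here is a rough back-of-envelope estimate. It is a sketch under assumptions, not a measurement: the per-object sizes are typical for a 64-bit HotSpot JVM with compressed references and compact strings, the 10-character average word length is hypothetical, and HashMapMemoryEstimate is an illustrative class name.

```java
import java.util.Locale;

// Hypothetical helper for illustration: back-of-envelope heap estimate for a HashMap<String, Double>.
// ASSUMPTION: object sizes for a 64-bit HotSpot JVM with compressed references
// and compact (Latin-1) strings; real sizes vary with JVM version and flags.
public class HashMapMemoryEstimate {
    static final long NODE_BYTES = 32;   // HashMap.Node: 12-byte header + hash + key/value/next refs, padded
    static final long STRING_BYTES = 24; // String object itself, excluding its backing byte[]
    static final long DOUBLE_BYTES = 24; // boxed Double: 12-byte header + 4 padding + 8-byte double
    static final long REF_BYTES = 4;     // one slot in the bucket table (compressed reference)
    static final long ARRAY_HEADER = 16; // array header including the length field

    // Objects are padded to a multiple of 8 bytes.
    static long align8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Smallest power-of-two capacity whose 0.75 load-factor threshold holds all entries.
    public static long tableCapacity(long entries) {
        long capacity = 1;
        while (capacity * 3 / 4 < entries) {
            capacity <<= 1;
        }
        return capacity;
    }

    // Estimated live bytes for `entries` keys averaging `avgKeyChars` Latin-1 characters.
    public static long estimateBytes(long entries, int avgKeyChars) {
        long keyArray = align8(ARRAY_HEADER + avgKeyChars); // byte[] behind each String
        long perEntry = NODE_BYTES + STRING_BYTES + keyArray + DOUBLE_BYTES;
        long table = align8(ARRAY_HEADER + tableCapacity(entries) * REF_BYTES);
        return entries * perEntry + table;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(100_000_000L, 10); // 10 chars/word is a hypothetical average
        System.out.printf(Locale.ROOT, "~%.1f GB%n", bytes / 1e9); // prints ~11.7 GB
    }
}
```

Under these assumptions each entry costs about 112 bytes, of which only about 10 are the characters of the word; the rest is the Node, String, byte[] and boxed Double objects. The live map alone would then take roughly three quarters of the heap, before counting temporary garbage from parsing and the moment during a resize when the old and new bucket arrays coexist. Actual numbers depend on the JVM and on flags such as -XX:-UseCompressedOops.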

Solution

For word processing like this, the answer is usually a tree (a trie, also called a prefix tree) rather than a HashMap, if you can live with longer lookup times. That structure is quite memory-efficient for natural languages, where many words share common prefixes.
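
A minimal sketch of the trie idea, assuming each word is a Java String mapped to one double frequency. FrequencyTrie is a hypothetical name, not a library class. Children are kept in small sorted arrays rather than a per-node HashMap, because a map per node would eat most of the savings.

```java
import java.util.Arrays;

// Hypothetical class for illustration: a minimal trie mapping words to frequencies.
// Words with a common prefix share the nodes for that prefix.
public class FrequencyTrie {
    private static final class Node {
        private static final char[] NO_LABELS = new char[0];
        private static final Node[] NO_CHILDREN = new Node[0];

        char[] labels = NO_LABELS;     // sorted edge characters
        Node[] children = NO_CHILDREN; // children[i] is reached via labels[i]
        boolean isWord;                // true if a stored word ends at this node
        double frequency;

        Node child(char c) {
            int i = Arrays.binarySearch(labels, c);
            return i >= 0 ? children[i] : null;
        }

        Node getOrAddChild(char c) {
            int i = Arrays.binarySearch(labels, c);
            if (i >= 0) {
                return children[i];
            }
            int at = -i - 1; // insertion point that keeps labels sorted
            char[] newLabels = new char[labels.length + 1];
            Node[] newChildren = new Node[children.length + 1];
            System.arraycopy(labels, 0, newLabels, 0, at);
            System.arraycopy(children, 0, newChildren, 0, at);
            System.arraycopy(labels, at, newLabels, at + 1, labels.length - at);
            System.arraycopy(children, at, newChildren, at + 1, children.length - at);
            newLabels[at] = c;
            newChildren[at] = new Node();
            labels = newLabels;
            children = newChildren;
            return newChildren[at];
        }
    }

    private final Node root = new Node();
    private long size;

    // Inserts or overwrites the frequency stored for `word`.
    public void put(String word, double frequency) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.getOrAddChild(word.charAt(i));
        }
        if (!node.isWord) {
            node.isWord = true;
            size++;
        }
        node.frequency = frequency;
    }

    // Returns the stored frequency, or null if `word` was never inserted.
    public Double get(String word) {
        Node node = root;
        for (int i = 0; i < word.length() && node != null; i++) {
            node = node.child(word.charAt(i));
        }
        return (node != null && node.isWord) ? node.frequency : null;
    }

    public long size() {
        return size;
    }

    public static void main(String[] args) {
        FrequencyTrie trie = new FrequencyTrie();
        trie.put("car", 3.0);
        trie.put("card", 1.0);
        trie.put("care", 2.0);
        System.out.println(trie.get("card")); // 1.0
        System.out.println(trie.get("ca"));   // null: "ca" is only a prefix
        System.out.println(trie.size());      // 3
    }
}
```

Whether this beats a HashMap in practice depends on how much prefix sharing the vocabulary has: every node is still an object, so a trie over mostly unrelated strings can use more memory, not less.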

Depending on the input, a Patricia tree (a radix tree, which merges chains of single-child nodes into a single edge) might be even better.
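
A Patricia (radix) tree labels each edge with a whole substring, so "car", "card" and "care" need 4 nodes instead of the 6 a plain trie would use (both counts include the root). The sketch below is a simplified character-level variant; Morrison's original PATRICIA works on individual bits. RadixTree and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration: simplified character-level radix (Patricia-style) tree.
// Each edge carries a substring, so chains of single-child nodes are merged.
public class RadixTree {
    private static final class Node {
        String label;                                   // edge label leading into this node ("" for root)
        final List<Node> children = new ArrayList<>(2);
        boolean isWord;
        double frequency;
        Node(String label) { this.label = label; }
    }

    private final Node root = new Node("");

    // Index of the child whose label starts with c, or -1; sibling labels start with distinct chars.
    private static int childIndex(Node node, char c) {
        for (int i = 0; i < node.children.size(); i++) {
            if (node.children.get(i).label.charAt(0) == c) return i;
        }
        return -1;
    }

    public void put(String word, double frequency) {
        Node node = root;
        int pos = 0;
        while (pos < word.length()) {
            int idx = childIndex(node, word.charAt(pos));
            if (idx < 0) {                              // nothing shared: add a leaf for the rest
                Node leaf = new Node(word.substring(pos));
                leaf.isWord = true;
                leaf.frequency = frequency;
                node.children.add(leaf);
                return;
            }
            Node child = node.children.get(idx);
            int common = 0;
            while (common < child.label.length() && pos + common < word.length()
                    && child.label.charAt(common) == word.charAt(pos + common)) {
                common++;
            }
            if (common < child.label.length()) {        // partial match: split the edge
                Node mid = new Node(child.label.substring(0, common));
                child.label = child.label.substring(common);
                mid.children.add(child);
                node.children.set(idx, mid);
                child = mid;
            }
            node = child;
            pos += common;
        }
        node.isWord = true;                             // word ends exactly at this node
        node.frequency = frequency;
    }

    // Returns the stored frequency, or null if `word` was never inserted.
    public Double get(String word) {
        Node node = root;
        int pos = 0;
        while (pos < word.length()) {
            int idx = childIndex(node, word.charAt(pos));
            if (idx < 0) return null;
            Node child = node.children.get(idx);
            if (!word.startsWith(child.label, pos)) return null; // diverges inside this edge
            pos += child.label.length();
            node = child;
        }
        return node.isWord ? node.frequency : null;
    }

    public int nodeCount() { return count(root); }

    private static int count(Node node) {
        int n = 1;
        for (Node child : node.children) n += count(child);
        return n;
    }

    public static void main(String[] args) {
        RadixTree tree = new RadixTree();
        tree.put("car", 3.0);
        tree.put("card", 1.0);
        tree.put("care", 2.0);
        System.out.println(tree.nodeCount()); // 4: root, "car", "d", "e"
        tree.put("cat", 4.0);                 // splits "car" into "ca" + "r"
        System.out.println(tree.nodeCount()); // 6: root, "ca", "r", "d", "e", "t"
        System.out.println(tree.get("car"));  // 3.0
    }
}
```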

(Also, if these are indeed words from a natural language, are you sure you really need 100,000,000 entries? The number of commonly used words is surprisingly low; commercial solutions (word prediction, spelling correction) rarely use more than 100,000 words, regardless of language.)
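
If a pruned vocabulary is acceptable, the k most frequent terms (for example k = 100,000) can be selected in one streaming pass with a min-heap of size k, so memory grows with k rather than with the 100 million input lines. The sketch assumes each line has the form word<TAB>frequency and that each word occurs on one line only; both the format and the TopKTerms name are assumptions, since the question does not describe the file layout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical class for illustration: keeps only the k most frequent terms in one pass.
// ASSUMPTION: each input line is "word<TAB>frequency" and each word occurs on one line only.
public class TopKTerms {
    private static final class Term {
        final String word;
        final double frequency;
        Term(String word, double frequency) {
            this.word = word;
            this.frequency = frequency;
        }
    }

    public static Map<String, Double> topK(Reader input, int k) throws IOException {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        // Min-heap on frequency: the head is the weakest of the current top k.
        PriorityQueue<Term> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Term t) -> t.frequency));
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.lastIndexOf('\t');
                if (tab < 0) {
                    continue; // skip lines that do not match the assumed format
                }
                double frequency = Double.parseDouble(line.substring(tab + 1).trim());
                if (heap.size() < k) {
                    heap.add(new Term(line.substring(0, tab), frequency));
                } else if (frequency > heap.peek().frequency) {
                    heap.poll(); // evict the current minimum
                    heap.add(new Term(line.substring(0, tab), frequency));
                }
            }
        }
        Map<String, Double> result = new HashMap<>();
        for (Term t : heap) {
            result.put(t.word, t.frequency);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String sample = "the\t500\nof\t300\nzyzzyva\t1\nand\t450\n";
        // TreeMap only to print the result in a stable order.
        System.out.println(new TreeMap<>(topK(new StringReader(sample), 2))); // {and=450.0, the=500.0}
    }
}
```

In practice the Reader would wrap the 1.7 GB file, for example via Files.newBufferedReader(path), and only the surviving k entries ever need to be held in a map or tree.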