HashMap in Java, 100 Million entries


Question

I want to store 100 million terms and their frequencies (from a text database) in a HashMap<String, Double>. It gives me an "Out of Memory" error. I tried increasing the heap space to -Xmx15000M, but it runs for half an hour and then throws the same exception again. The file from which I'm reading the words and frequencies is 1.7 GB.

Any help would be much appreciated.

Thanks :-)

Recommended answer

For word processing like that, the answer is usually a tree (a prefix tree, also called a trie) rather than a hash map, if you can live with the longer lookup times. That structure is quite memory efficient for natural languages, where many words share common prefixes, because each shared prefix is stored only once.
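
To make the prefix-sharing idea concrete, here is a minimal sketch of such a tree in Java. The class name `FrequencyTrie` and its methods `put`, `get`, and `nodeCount` are hypothetical and not part of the original answer. It only illustrates how words with a common start share nodes; it is not a tuned solution for 100 million entries.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative prefix tree (trie) that maps terms to frequencies. */
class FrequencyTrie {

    /** One node per character position; a shared prefix is a shared path. */
    private static final class Node {
        Map<Character, Node> children; // allocated lazily, so leaves stay small
        boolean isTerm;                // true if a stored term ends at this node
        double frequency;              // meaningful only when isTerm is true
    }

    private final Node root = new Node();
    private int nodeCount = 1; // counts the root

    /** Stores or overwrites the frequency for a term. */
    public void put(String term, double frequency) {
        Node node = root;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (node.children == null) {
                node.children = new HashMap<>(4);
            }
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
                nodeCount++;
            }
            node = next;
        }
        node.isTerm = true;
        node.frequency = frequency;
    }

    /** Returns the stored frequency, or null if the term is absent. */
    public Double get(String term) {
        Node node = root;
        for (int i = 0; i < term.length(); i++) {
            if (node.children == null) {
                return null;
            }
            node = node.children.get(term.charAt(i));
            if (node == null) {
                return null;
            }
        }
        return node.isTerm ? node.frequency : null;
    }

    /** Number of nodes allocated so far, including the root. */
    public int nodeCount() {
        return nodeCount;
    }

    public static void main(String[] args) {
        FrequencyTrie trie = new FrequencyTrie();
        trie.put("car", 1.0);
        trie.put("cart", 2.0);
        trie.put("care", 3.0);
        System.out.println(trie.get("cart"));
        System.out.println(trie.nodeCount());
    }
}
```

Note that a trie built from one Java object plus one `HashMap` per node, as in this sketch, carries a lot of per-object overhead and may not use less memory than a plain `HashMap`. The savings described in the answer usually depend on a more compact node layout, for example primitive arrays or merged single-child chains.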

Depending on the input, a Patricia tree might be even better.

(Also, if these are indeed words from a natural language, are you sure you really need 100,000,000 entries? The number of commonly used words is surprisingly low; commercial solutions (word prediction, spelling correction) rarely use more than 100,000 words, regardless of language.)
