Avoid an "out of memory error" in Java (Eclipse) when using a large data structure?


Problem description

OK, so I am writing a program that unfortunately needs to use a huge data structure to complete its work, but it is failing with an "out of memory error" during initialization. I understand what the error means and why it happens, but I am having trouble getting past it, since my program needs this large structure and I don't know any other way to store it.

The program first indexes a large corpus of text files that I provide. This works fine.

Then it uses this index to initialize a large 2D array. This array will have n² entries, where n is the number of unique words in the corpus of text. For the relatively small chunk I am testing it on (about 60 files), it needs approximately 30,000 x 30,000 entries. It will probably be bigger once I run it on my full intended corpus.

It fails every time, after indexing, while it is initializing the data structure (to be worked on later).

Things I have done include:

  • revamped my code to use a primitive int[] instead of a TreeMap
  • eliminated redundant structures, etc.
  • run the program with -Xmx2g to max out my allocated memory

I am fairly confident this will not be a one-line fix and will most likely require a very different approach. What would that approach be? Any ideas?

Thanks, B.

Solution

It sounds like (making some assumptions about what you're using your array for) most of the entries will be 0. If so, you might consider using a sparse matrix representation.
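As a rough illustration of the sparse idea, here is a minimal sketch that stores only the non-zero cells in a hash map keyed by row * n + col. The class name SparseCounts and its methods are made up for this example and are not from the question's code. It assumes the values are int counts and that a cell that was never set should read as 0.

```java
import java.util.HashMap;
import java.util.Map;

/** Sparse n x n matrix of int counts: only non-zero cells are stored. */
class SparseCounts {
    private final int n;
    // Key encodes (row, col) as row * n + col; a long avoids overflow for large n.
    private final Map<Long, Integer> cells = new HashMap<>();

    SparseCounts(int n) {
        this.n = n;
    }

    private long key(int row, int col) {
        if (row < 0 || row >= n || col < 0 || col >= n) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
        }
        return (long) row * n + col;
    }

    /** Returns the stored value, or 0 for a cell that was never set. */
    int get(int row, int col) {
        return cells.getOrDefault(key(row, col), 0);
    }

    /** Adds delta to a cell, dropping the entry if the result is 0. */
    void add(int row, int col, int delta) {
        long k = key(row, col);
        int updated = cells.getOrDefault(k, 0) + delta;
        if (updated == 0) {
            cells.remove(k);
        } else {
            cells.put(k, updated);
        }
    }

    /** Number of cells actually held in memory. */
    int storedCells() {
        return cells.size();
    }

    public static void main(String[] args) {
        SparseCounts m = new SparseCounts(30_000);
        m.add(12, 345, 1);
        m.add(12, 345, 1);
        m.add(29_999, 0, 5);
        System.out.println(m.get(12, 345));   // 2
        System.out.println(m.get(0, 0));      // 0
        System.out.println(m.storedCells());  // 2
    }
}
```

Each map entry with boxed keys and values costs far more than a 4-byte int, so this only pays off if a small fraction of the n² cells are non-zero. If that holds for your data, a primitive-keyed map from a collections library would likely cut the overhead further.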

If you really have that many entries (your current array is somewhere over 3 gigabytes already, even assuming no overhead), then you'll have to use some kind of on-disk storage, or a lazy-load/unload system.
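For reference, the size estimate is 30,000 x 30,000 cells x 4 bytes per int = 3,600,000,000 bytes, which cannot fit in a 2 GB heap. One possible shape for the on-disk option is sketched below: a fixed-size file of ints where each access seeks to the cell's offset. The class name DiskMatrix is hypothetical, and the sketch assumes plain random access through java.io.RandomAccessFile with no caching, so it will be slow compared to an in-memory array.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** n x n matrix of ints stored in a file; each access seeks to the cell. */
class DiskMatrix implements AutoCloseable {
    private static final int BYTES_PER_INT = 4;
    private final int n;
    private final RandomAccessFile file;

    DiskMatrix(File path, int n) throws IOException {
        this.n = n;
        this.file = new RandomAccessFile(path, "rw");
        // Reserve room for every cell. The Java API does not define the
        // contents of the extended region, so write a cell before relying on it.
        file.setLength((long) n * n * BYTES_PER_INT);
    }

    private long offset(int row, int col) {
        if (row < 0 || row >= n || col < 0 || col >= n) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
        }
        // Row-major layout; long arithmetic avoids int overflow for large n.
        return ((long) row * n + col) * BYTES_PER_INT;
    }

    int get(int row, int col) throws IOException {
        file.seek(offset(row, col));
        return file.readInt();
    }

    void set(int row, int col, int value) throws IOException {
        file.seek(offset(row, col));
        file.writeInt(value);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("matrix", ".bin");
        tmp.deleteOnExit();
        try (DiskMatrix m = new DiskMatrix(tmp, 1_000)) {
            m.set(999, 999, 42);
            m.set(0, 1, 7);
            System.out.println(m.get(999, 999)); // 42
            System.out.println(m.get(0, 1));     // 7
        }
    }
}
```

A lazy-load/unload variant would keep recently used rows or blocks in memory and write them back when evicted. Which of the two fits better depends on your access pattern, which the question doesn't describe.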
