Preventing memory issues when handling large amounts of text


Problem Description

I have written a program which analyzes a project's source code and reports various issues and metrics based on the code.

To analyze the source code, I load the code files that exist in the project's directory structure and analyze the code from memory. The code goes through extensive processing before it is passed to other methods to be analyzed further.
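The asker's loading code isn't shown; as one sketch (in Python, with a hypothetical extension filter), the files in the directory structure can be yielded one at a time with a generator instead of being read into one big in-memory list up front:

```python
import os

def iter_source_files(root, extensions=(".cs", ".java")):
    """Yield (path, text) pairs one file at a time instead of
    loading the whole project into memory up front."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    yield path, f.read()
```

With this shape, only the file currently being processed needs to be resident; the extensions tuple and the idea that files can be processed independently are assumptions, not part of the original question.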

The code is passed around to several classes as it is processed.

The other day I was running it on one of the larger projects my group has, and my program crapped out on me because there was too much source code loaded into memory. This is a corner case at this point, but I want to be able to handle this issue in the future.

What would be the best way to avoid memory issues?

I'm thinking about loading the code, doing the initial processing of the file, and then serializing the results to disk, so that when I need to access them again, I do not have to go through the process of manipulating the raw code again. Does this make sense? Or is the serialization/deserialization more expensive than processing the code again?
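One way to sketch the serialize-once idea (assuming Python's `pickle`; the `process` callback and the `.tokens.pkl` cache naming are illustrative, not from the question) is to cache the processed result next to the source file and reload it instead of re-parsing:

```python
import os
import pickle

def load_processed(source_path, process):
    """Return the processed form of a source file, reusing a cached
    pickle on disk when it is at least as new as the source file."""
    cache_path = source_path + ".tokens.pkl"
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)          # skip re-parsing the raw code
    with open(source_path, encoding="utf-8") as f:
        result = process(f.read())         # expensive initial processing
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)             # cache for the next run
    return result
```

Whether this beats re-processing depends entirely on how expensive the parsing step is relative to disk I/O; deserialization is usually much cheaper than parsing, but only measuring this workload can confirm it.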

I want to keep a reasonable level of performance while addressing this problem. Most of the time, the source code will fit into memory without issue, so is there a way to only "page" my information when I am low on memory? Is there a way to tell when my application is running low on memory?
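There is no fully portable way to ask "am I low on memory?", but on Unix the standard-library `resource` module can at least report the process's peak resident set size, which could be compared against a self-imposed budget. This is only a sketch; the budget value and the paging policy are arbitrary examples, and `ru_maxrss` units differ by platform:

```python
import resource
import sys

def memory_usage_mb():
    """Peak resident set size of this process, in megabytes (Unix only).
    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        rss //= 1024
    return rss / 1024.0

def should_page_out(budget_mb=512):
    """Hypothetical policy: start paging results to disk once the
    process grows past a self-imposed memory budget."""
    return memory_usage_mb() > budget_mb
```

A cross-platform alternative would be the third-party `psutil` package, which can also report system-wide available memory rather than only this process's usage.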

Update: The problem is not that a single file fills memory; it's that all of the files in memory at once fill memory. My current idea is to rotate them off to the disk drive as I process them.

Recommended Answer

1.6GB is still manageable and by itself should not cause memory problems. Inefficient string operations might do it.

As you parse the source code you probably split it apart into substrings - tokens or whatever you call them. If your tokens combined account for the entire source code, that doubles memory consumption right there. Depending on the complexity of the processing you do, the multiplier can be even bigger. My first move here would be to take a closer look at how you use your strings and find a way to optimize it - i.e. discard the original after the first pass, compress the whitespace, or use indexes (pointers) into the original strings rather than actual substrings - there are a number of techniques which can be useful here.
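The "indexes into the original string" suggestion can be sketched as follows (in Python, with a deliberately trivial whitespace tokenizer - the answer names the technique but gives no code): each token stores only its (start, end) offsets into the shared source string and materializes its text lazily, so token storage does not duplicate the source.

```python
class Token:
    """A token stored as offsets into the shared source string,
    rather than as its own substring copy."""
    __slots__ = ("source", "start", "end")

    def __init__(self, source, start, end):
        self.source = source
        self.start = start
        self.end = end

    @property
    def text(self):
        # Materialize the substring only when it is actually needed.
        return self.source[self.start:self.end]

def tokenize(source):
    """Trivial whitespace tokenizer that never copies token text."""
    tokens, i, n = [], 0, len(source)
    while i < n:
        while i < n and source[i].isspace():
            i += 1
        start = i
        while i < n and not source[i].isspace():
            i += 1
        if start < i:
            tokens.append(Token(source, start, i))
    return tokens
```

Every token here shares one reference to the same source string, so the extra cost per token is a few integers instead of a substring copy; the trade-off is that the full source must stay alive as long as any token does.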

If none of this helps, then I would resort to swapping them to and from the disk.
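As a sketch of that fallback (using Python's standard-library `shelve`; the class and its API are illustrative, not from the answer), processed results can live in a disk-backed mapping and be pulled back into memory only when accessed:

```python
import os
import shelve
import tempfile

class DiskBackedResults:
    """Minimal disk-backed store: processed results are pickled into a
    shelve database and read back on demand, so only the entry being
    used occupies memory at any one time."""

    def __init__(self, path=None):
        if path is None:
            path = os.path.join(tempfile.mkdtemp(), "results.db")
        self._db = shelve.open(path)

    def put(self, filename, result):
        self._db[filename] = result   # pickled out to disk

    def get(self, filename):
        return self._db[filename]     # unpickled back on demand

    def close(self):
        self._db.close()
```

This keeps the in-memory footprint proportional to the working set rather than the whole project, at the cost of a disk round-trip per access; an LRU cache in front of the shelf would be a natural next step if the same entries are hit repeatedly.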
