Perl regex using too much memory?


Question

I have a Perl routine that is causing frequent "out of memory" issues on my system.

The script does 3 things:

1> Get the output of a command into an array (@arr = `$command`; the array holds about 13 MB of data after the command)
2> Use a large regex to match the contents of individual array elements

The regex is something like this:
if ($new_element =~ m|([A-Z0-9\-\._\$]+);\d+\s+([0-9]+)-([A-Z][A-Z][A-Z])-([0-9][0-9][0-9][0-9])\s+([0-9]+)\:([0-9]+)\:([0-9]+)|io)
<put to hash>
3> Put the array in a persistent hash map.
$hash_var{$arr[0]} = "Some value";

Edit: Sample data processed by the regex:

Z4:[newuser.newdir]TESTOPEN_ERROR.COM;4
                                                    8-APR-2014 11:14:12.58
Z4:[newuser.newdir]TEST_BOC.CFG;5
                                                    5-APR-2014 10:43:11.70
Z4:[newuser.newdir]TEST_BOC.COM;20
                                                    5-APR-2014 10:41:01.63
Z4:[newuser.newdir]TEST_NEWRT.COM;17
                                                    4-APR-2014 10:30:56.11

There are about 10,000 lines like these.

I initially suspected that the array and the hash together might be consuming too much memory. However, I have started to think this regex might also have something to do with the out-of-memory errors.

Is the Perl regex (with the 'io' options!) really the main culprit behind the out-of-memory errors?

Recommended answer

This has nothing to do with regexes.

If you are operating in a memory-constrained environment, you should process data records one at a time rather than fetching all of them at once. Let's assume you pull your data like this:

my @data = `some command`;
for my $line (@data) {
    ... # process the line
}

This is incredibly wasteful because you need storage for the whole input data and, at the same time, for the output of your processing (in your case, the hash).

Instead, process the input line by line. We can use the open function instead of backticks for this:

open my $cmd, '-|', 'some', 'command' or die "Can't run some command: $!";
while (my $line = <$cmd>) {
    ... # process the line
}

There is no need for an array here, which saves us 13 MB of memory that we can now put to other use.
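
As a rough sketch of how the line-by-line loop might be combined with the regex and the hash from the question: the helper name parse_listing, the choice of storing the timestamp as the hash value, and the in-memory sample input are all assumptions made for illustration. It also assumes that each record is a file-name line followed by a date line, as in the sample data, so only one pending line is ever held in memory.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): reads a listing from any
# filehandle one line at a time and returns a hash ref of
# file name => timestamp string.
sub parse_listing {
    my ($fh) = @_;
    my %seen;
    my $pending = '';    # file-name line still waiting for its date line

    while (my $line = <$fh>) {
        chomp $line;

        # Assumption: a record spans two lines, so glue the current line
        # onto the line held back from the previous iteration.
        my $record = $pending eq '' ? $line : "$pending $line";

        # Same shape as the regex in the question. /o is omitted because it
        # only matters when the pattern interpolates variables.
        if ($record =~ m/([A-Z0-9\-._\$]+);\d+\s+([0-9]+)-([A-Z][A-Z][A-Z])-([0-9][0-9][0-9][0-9])\s+([0-9]+):([0-9]+):([0-9]+)/i) {
            $seen{$1} = sprintf '%s-%s-%s %s:%s:%s', $2, $3, $4, $5, $6, $7;
            $pending = '';
        }
        else {
            $pending = $line;    # keep only the most recent unmatched line
        }
    }
    return \%seen;
}

# Demo on an in-memory filehandle so the sketch runs without a real command.
my $sample = <<'END';
Z4:[newuser.newdir]TESTOPEN_ERROR.COM;4
                                                    8-APR-2014 11:14:12.58
Z4:[newuser.newdir]TEST_BOC.CFG;5
                                                    5-APR-2014 10:43:11.70
END

open my $fh, '<', \$sample or die "Can't open in-memory handle: $!";
my $files = parse_listing($fh);
close $fh;

print "$_ => $files->{$_}\n" for sort keys %$files;
```

With a real command, the same helper could be handed the pipe handle from the open call shown above instead of the in-memory handle. Memory use then grows only with the hash, not with the size of the command's output.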
