How to improve grep efficiency in Perl when the number of files is huge
Problem description
I want to grep some log information from the log files located in the following directory structure, using Perl:

    $jobDir/jobXXXX/host.log

where XXXX is a job number, from 1 to a few thousand. There are no other kinds of subdirectories under $jobDir, and no files other than the logs under jobXXXX. The script is:
    my @Info;    # store the log information
    my $Num = 0;
    @Info = qx(grep "information" -r $jobDir);    # is this OK?
    foreach (@Info) {
        if ($_ =~ /\((\d+)\)(.*)\((\d+)\)/) {
            Output(xxxxxxxx);
        }
        $Num = $Num + 1;    # number count
    }
It turns out that when the job number reaches a few thousand, this script takes a very long time to output the information.
Is there any way to improve its efficiency?
Thanks!
Solution

You should search those log files one by one, and scan each log file line by line, instead of reading the entire output of grep into memory (which can cost a lot of memory and slow down your program, or even your system):
    # untested script
    my $Num;
    foreach my $log (<$jobDir/job*/host.log>) {
        open my $logfh, '<', $log or die "Cannot open $log: $!";
        while (<$logfh>) {
            if (m/information/) {
                if (m/\((\d+)\)(.*)\((\d+)\)/) {
                    Output(xxx);
                }
                $Num++;
            }
        }
        close $logfh;
    }