How to improve Perl script performance?


Question

I am running the ucm2.pl script to scan a huge directory structure (the directory is a network drive mapped locally). I have two Perl scripts, ucm1.pl and ucm2.pl. I am running ucm2.pl in parallel with different arguments, and it is called through ucm1.pl.

ucm1.pl-

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Parallel::ForkManager;

    # intfSplitList.txt holds the list of all input files (intfSplit_0 .. intfSplit_50)
    my $filename = "intfSplitList.txt";
    my $lines;
    open(my $count_fh, '<', $filename) or die "Can't open `$filename': $!";
    while (<$count_fh>) {
        $lines = $.;
    }
    close $count_fh;
    print "The number of lines in $filename is $lines\n";

    my $pm = Parallel::ForkManager->new($lines);   # sets the number of parallel processes

    open(my $fh, '<', $filename) or die $!;
    while (my $data = <$fh>) {
        chomp $data;

        my $pid = $pm->start and next;

        # call ucm2.pl; input.txt holds the search keyword
        # and $data is one of the intfSplit_*.txt files
        system("perl ucm2.pl -iinput.txt -f$data");

        $pm->finish;            # terminates the child process
    }
    close $fh;
    $pm->wait_all_children;     # wait for all children to finish

ucm2.pl code-

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Getopt::Std;

# get the input parameters
getopts('i:f:');

our ($opt_i, $opt_f);
my $searchKeyword = $opt_i;          # search keyword file
my $intfSplit     = $opt_f;          # split file
my $path          = "Z:/aims/";      # source directory
my $searchString;                    # search keyword

open FH, ">log.txt";                 # open the log file for writing

print FH "$intfSplit started at " . (localtime) . "\n";   # write to the log file

open(FILE, $intfSplit);              # open the split file for reading

while (<FILE>) {
    my $intf = $_;                   # set the interface name
    chomp($intf);
    my $dir = $path . $intf;
    chomp($dir);
    print "$dir\n";

    open(INP, $searchKeyword);       # open the search keyword file for reading

    while (<INP>) {
        $searchString = $_;          # set the search keyword
        chomp($searchString);
        print "$searchString\n";
        open my $out, ">", "vob$intfSplit.txt" or die $!;   # open the vob$intfSplit.txt file for writing

        # call subroutine printFile to find and print the path of each element
        find(\&printFile, $dir);

        # the subroutine searches for the keyword and prints the path
        # if the keyword exists in the file
        sub printFile {
            my $element = $_;

            if (-f $element && $element =~ /\.*$/) {
                open my $in, "<", $element or die $!;
                while (<$in>) {
                    if (/\Q$searchString\E/) {
                        my $last_update_time = (stat($element))[9];
                        my $timestamp = localtime($last_update_time);
                        print $out "$File::Find::name     $timestamp     $searchString\n";
                        last;
                    }
                }
            }
        }
    }
}
print FH "$intfSplit ended at " . (localtime) . "\n";     # write to the log file

Everything runs fine, but it takes a very long time even for a single-keyword search. Can anyone suggest a better way to improve the performance?

Thanks in advance!

Answer

Running multiple instances of Perl adds a lot of unnecessary overhead. Have you looked at my answer to your previous question, which suggested changing this?
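To make that suggestion concrete, here is a minimal sketch of doing the work in forked children of a single Perl process rather than spawning a new interpreter per split file. `process_split` is a hypothetical stand-in for the search logic in ucm2.pl; Parallel::ForkManager wraps exactly this core fork/waitpid pattern, so in the existing ucm1.pl loop only the `system(...)` line would need to become a direct sub call.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the search work ucm2.pl performs.
sub process_split {
    my ($split) = @_;
    print "processed $split\n";
}

my @splits = ('intfSplit_0', 'intfSplit_1');   # would come from intfSplitList.txt
my @pids;
for my $split (@splits) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {              # child: do the work in this process,
        process_split($split);    # no second perl interpreter to start
        exit 0;
    }
    push @pids, $pid;             # parent: remember the child
}
waitpid($_, 0) for @pids;         # same role as $pm->wait_all_children
```

Each child inherits the parent's compiled code and loaded modules, so the per-task cost is one fork instead of a full interpreter startup plus module compilation.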

Also as I mentioned previously, you have some unnecessary repetition here: there is no reason to open and process your search keyword file multiple times. You can make one sub that opens the keyword file and puts the keywords in an array. Then pass these keywords to another sub that does the searching.
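A sketch of that decomposition might look like this (the sub names `read_keywords` and `search_dir` are illustrative, not from the original scripts): the keyword file is read exactly once, and the resulting array is passed to the code that walks the directories.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the keyword file once and return the keywords as a list.
sub read_keywords {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't open `$file': $!";
    my @keywords = map { chomp; $_ } <$fh>;
    close $fh;
    return @keywords;
}

# Hypothetical searching sub: receives the directory and the
# already-loaded keywords instead of re-reading the keyword file.
sub search_dir {
    my ($dir, @keywords) = @_;
    # ... File::Find traversal using @keywords would go here ...
}

my @keywords = read_keywords('input.txt');
# search_dir($_, @keywords) for each split directory
```

This also removes the nested `open(INP, ...)` from the inner loop of ucm2.pl, which currently re-reads the keyword file for every directory.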

You can make a search for multiple keywords much faster by searching for them all at once. Do something like this to get your keywords:

my @keywords = map {chomp;$_} <$fh>;
my $regex = "(" . join('|', map {quotemeta} @keywords) . ")";

Now you have a single regex like this: (\Qkeyword1\E|\Qkeyword2\E). You only have to search the files once, and if you want to see which keyword matched, just check the content of $1. This won't speed things up for a single keyword, but searching for many keywords will be nearly as fast as searching for a single one.
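Putting the two pieces together, a small self-contained demonstration (the keywords and the sample line are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @keywords = ('foo', 'bar.baz');   # illustrative keywords
my $regex = "(" . join('|', map {quotemeta} @keywords) . ")";
# $regex is now the string "(foo|bar\.baz)"

my $line = "some text containing bar.baz here";
if ($line =~ /$regex/) {
    print "matched keyword: $1\n";   # prints "matched keyword: bar.baz"
}
```

Note that `quotemeta` escapes the dot in `bar.baz`, so the pattern matches the literal keyword rather than treating `.` as a wildcard, exactly as `\Q...\E` does in the original script.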

Ultimately, though, if you are searching a huge directory structure on the network, there may be a limit to how much you can speed things up.

Update: corrected the chomping. Thanks amon.
