Reading a specific line from a large file in Perl
Question
Is there a fast and memory-efficient way to read specific lines of a large file without loading the whole file into memory?
I wrote a Perl script that runs many forks, and I would like them to read specific lines from a file.
At the moment I'm using an external command:
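sub getFileLine {
    my ( $filePath, $lineWanted ) = @_;
    # 'IGNORE' (not '_IGNORE_') is the special value that makes Perl ignore SIGPIPE
    $SIG{PIPE} = 'IGNORE';
    # tail skips to the wanted line; head keeps only that line
    open( my $fh, '-|:utf8', "tail -q -n +$lineWanted \"$filePath\" | head -n 1" )
        or die "Cannot run tail/head on $filePath: $!";
    my $line = <$fh>;
    close $fh;
    chomp( $line ) if defined $line;
    return $line;
}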
It's fast and it works, but maybe there's a more "Perl-ish" way that is as fast and as memory-efficient as this one?
As you know, creating a fork process in Perl duplicates the main process memory, so if the main process is using 10MB, the fork will use at least that much.
My goal is to keep the memory use of the forked processes (and therefore of the main process, up to the point where it forks) as low as possible. That's why I don't want to load the whole file into memory.
Recommended answer
Before you go further, it's important to understand how fork works. When you fork a process, the OS uses copy-on-write semantics to share the bulk of the parent's and child's memory; only the memory that differs between the parent and the child needs to be allocated separately.
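As a rough illustration of the point above, here is a minimal sketch in which the parent allocates some data once and three forked children read it without any explicit copying. The variable names and the number of children are assumptions made for the example, and the script only shows that children see the parent's data; it does not measure how many pages the OS actually shares.

```perl
use strict;
use warnings;

# Data allocated once in the parent, before forking (illustrative only).
my @shared = ( 1 .. 1000 );

my @pids;
for my $n ( 1 .. 3 ) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Child: the parent's data is visible here without copying it by hand.
        my $sum = 0;
        $sum += $_ for @shared;
        exit( $sum == 500500 ? 0 : 1 );    # 1 + 2 + ... + 1000 = 500500
    }

    push @pids, $pid;                      # Parent: remember the child PID
}

# Parent: reap every child and count the ones that exited non-zero.
my $failures = 0;
for my $pid (@pids) {
    waitpid( $pid, 0 );
    $failures++ if $? != 0;
}
print "children that failed: $failures\n";
```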
For reading a single line of a file in Perl, here's a simple way:
open my $fh, '<', $filePath or die "$filePath: $!";
my $line;
while( <$fh> ) {
    if( $. == $lineWanted ) {
        $line = $_;
        last;
    }
}
This uses the special $. variable, which holds the current line number of the most recently read filehandle. Only one line is held in memory at a time, and the loop stops as soon as the wanted line is reached.
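The loop above could be wrapped into a drop-in replacement for the question's getFileLine. The following is a sketch only: the name get_file_line, the UTF-8 input layer, and the choice to return undef when the file has fewer lines than requested are assumptions, not part of the original answer. The temporary file exists only to make the example runnable.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns line $line_wanted (1-based) of $file_path
# without its trailing newline, or undef if the file is shorter than that.
sub get_file_line {
    my ( $file_path, $line_wanted ) = @_;

    open my $fh, '<:encoding(UTF-8)', $file_path
        or die "$file_path: $!";

    my $line;
    while ( my $current = <$fh> ) {
        # $. is the line number of the last filehandle read, here $fh.
        if ( $. == $line_wanted ) {
            $line = $current;
            last;    # stop reading as soon as the line is found
        }
    }
    close $fh;

    chomp $line if defined $line;
    return $line;
}

# Usage: build a small throwaway file and fetch lines from it.
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "alpha\nbeta\ngamma\n";
close $tmp_fh;

my $second  = get_file_line( $tmp_path, 2 );
my $missing = get_file_line( $tmp_path, 10 );
print "line 2: $second\n";
print "line 10 is ", ( defined $missing ? "present" : "missing" ), "\n";
```

Because each call opens its own filehandle, a forked child that calls this helper does not share a file position with the parent or with its siblings.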