Reading a specific line from a large file in Perl

Question

Is there a fast and memory-efficient way to read specific lines of a large file without loading the whole file into memory?

I wrote a Perl script that runs many forks, and I would like each of them to read specific lines from a file.

At the moment I'm using an external command:

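sub getFileLine {
    my ( $filePath, $lineWanted ) = @_;
    # head exits after one line, so the pipeline may raise SIGPIPE; ignore it
    $SIG{PIPE} = 'IGNORE';
    open( my $fh, '-|:utf8', "tail -q -n +$lineWanted \"$filePath\" | head -n 1" )
        or die "Cannot run tail/head on $filePath: $!";
    my $line = <$fh>;
    close $fh;
    # $line is undef when the file has fewer than $lineWanted lines
    chomp( $line ) if defined $line;
    return $line;
}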

It's fast and it works, but maybe there's a more "Perl-ish" way that is just as fast and memory-efficient?

As you know, creating a fork process in Perl duplicates the main process's memory, so if the main process is using 10 MB, the fork will use at least that much.

My goal is to keep the memory use of the forked processes (and therefore of the main process up to the point where it forks) as low as possible. That's why I don't want to load the whole file into memory.

Recommended answer

Before you go further, it's important to understand how fork works. When you fork a process, the OS uses copy-on-write semantics to share the bulk of the parent's and child's memory; only the memory that differs between the parent and the child needs to be separately allocated.

For reading a single line of a file in Perl, here's a simple way:

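open my $fh, '<', $filePath or die "$filePath: $!";
my $line;
while( <$fh> ) {
    # $. is the number of the line just read from $fh
    if( $. == $lineWanted ) {
        $line = $_;
        last;    # stop reading as soon as the wanted line is found
    }
}
close $fh;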

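This uses the special $. variable, which holds the current line number of the most recently read filehandle. Because the loop reads one line at a time and stops at the wanted line, only a single line is held in memory at any moment.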
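As a rough sketch, the loop above could be wrapped in a subroutine that replaces the external tail/head pipeline from the question. The name get_file_line, the UTF-8 input layer, and the choice to return undef when the file is too short are assumptions made for illustration; they are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper (name and return convention are assumptions):
# returns line number $line_wanted (1-based) of $file_path without its
# trailing newline, or undef if the file has fewer lines than that.
sub get_file_line {
    my ( $file_path, $line_wanted ) = @_;

    # The UTF-8 layer mirrors the ':utf8' used in the question's pipeline.
    open my $fh, '<:encoding(UTF-8)', $file_path
        or die "$file_path: $!";

    my $line;
    while ( my $current = <$fh> ) {
        # $. counts the lines read so far from $fh
        next unless $. == $line_wanted;
        chomp( $line = $current );
        last;    # stop reading once the wanted line is found
    }
    close $fh;
    return $line;
}
```

A call such as get_file_line( $filePath, 42 ) would then behave much like the getFileLine subroutine in the question, but without starting a shell and two extra processes for every lookup. It still scans the file from the beginning each time, so the cost grows with the line number.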
