Deleting a line from a huge file in Perl

Problem description

I have a huge text file whose first five lines read as follows:

This is fist line
This is second line
This is third line
This is fourth line
This is fifth line

Now, I want to write something at an arbitrary position in the third line of that file, so that the new string overwrites the characters already there. I am able to achieve that with the code below:

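use strict;
use warnings;

my @pos = (0);    # byte offset at which each line starts
open my $fh, "+<", "text.txt" or die "Can't open text.txt: $!";

while (<$fh>) {
    push @pos, tell($fh);    # offset of the start of the next line
}

# $pos[2] is the start of the third line; overwrite from one byte past it
seek $fh, $pos[2] + 1, 0;
print $fh "HELLO";

close($fh);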

However, I cannot figure out how to use the same kind of approach to delete the entire third line from that file, so that the text reads as below:

This is fist line
This is second line
This is fourth line
This is fifth line

I do not want to read the entire file into an array, nor do I want to use Tie::File. Is it possible to achieve this using seek and tell? A solution would be very helpful.

Recommended answer

A file is a sequence of bytes. We can replace (overwrite) some of them, but how would we remove them? Once a file is written, its bytes cannot be 'pulled out' of the sequence or 'blanked' in any way. (Those at the end of the file can be dropped by truncating the file as needed.)

The rest of the content has to move 'up', so that what follows the text to be removed overwrites it. We have to rewrite the rest of the file. In practice, it is often far simpler to rewrite the whole file.

As a very basic example:

use warnings 'all';
use strict;
use File::Copy qw(move);

my $file_in = '...';
my $file_out = '...';  # best use `File::Temp`

open my $fh_in,  '<', $file_in  or die "Can't open $file_in: $!";
open my $fh_out, '>', $file_out or die "Can't open $file_out: $!";

# Remove a line with $pattern
my $pattern = qr/this line goes/;

while (<$fh_in>) 
{
    print $fh_out $_  unless /$pattern/;
}
close $fh_in;
close $fh_out;

# Rename the new file to the original name, thus replacing the original
move ($file_out, $file_in) or die "Can't move $file_out to $file_in: $!";

This writes every line of the input file to the output file, unless a line matches the given pattern. Then that file is renamed, replacing the original (which does not involve copying data, provided both files are on the same filesystem). See this topic in perlfaq5.

Since we are really using a temporary file, I'd recommend the core module File::Temp for that.
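
The rewrite-and-rename approach can be applied directly to the question's case of deleting a line by its number. The following is only a minimal sketch: the helper name delete_line_by_rewrite and its arguments are hypothetical, and it assumes the temporary file can be created in the same directory as the original, so that the final rename stays on one filesystem.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: remove line number $line_no (1-based) from $file
# by writing every other line to a temporary file and renaming it.
sub delete_line_by_rewrite {
    my ($file, $line_no) = @_;

    open my $fh_in, '<', $file or die "Can't open $file: $!";

    # Create the temporary file next to the original, so that the final
    # rename stays on one filesystem and does not copy data
    my ($fh_out, $tmp_name) = tempfile(DIR => dirname($file), UNLINK => 0);

    while (my $line = <$fh_in>) {
        # $. holds the line number of the most recently read filehandle
        print $fh_out $line unless $. == $line_no;
    }
    close $fh_in;
    close $fh_out or die "Can't close $tmp_name: $!";

    rename($tmp_name, $file) or die "Can't rename $tmp_name to $file: $!";
    return;
}
```

Calling delete_line_by_rewrite('text.txt', 3) would drop the third line. Only one line is held in memory at a time, which is why this works for huge files. Note that File::Temp creates the file with restrictive permissions, so the replaced file may end up with different permissions from the original.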

This may be made more efficient, but far more complicated, by opening the file in update mode ('+<') so as to overwrite only a portion of it. Iterate until the line with the pattern, record its position (tell) and its length, then copy all remaining lines into memory. Then seek back to the start of that line (the recorded position minus the line's length) and dump the copied rest of the file, overwriting the line and everything that follows it. Finally, truncate the file at the resulting position, since it is now shorter by the length of the removed line.

Note that now the data for the rest of the file is copied twice, although one copy is in memory. Going to this trouble may make sense if the line to be removed is far down a very large file. If there are more lines to remove, this gets messier.
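
The in-place approach described above, using only seek, tell and truncate as the question asks, might look roughly like this. It is a sketch under assumptions: the helper name delete_line_in_place is hypothetical, the line is selected by number rather than by pattern, and everything after the removed line is assumed to fit in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: remove line number $line_no (1-based) from $file
# in place. Returns 1 if a line was removed, 0 if the file is too short.
sub delete_line_in_place {
    my ($file, $line_no) = @_;

    open my $fh, '+<', $file or die "Can't open $file: $!";

    my $start = 0;    # byte offset where the current line begins
    my $found = 0;
    while (my $line = <$fh>) {
        if ($. == $line_no) { $found = 1; last }
        $start = tell($fh);    # the next line begins here
    }
    if (!$found) { close $fh; return 0 }

    # The handle now sits just past the line to delete.
    # Slurp everything after it (this is the in-memory copy).
    my $rest = do { local $/; <$fh> };
    $rest = '' unless defined $rest;

    # Go back to where the removed line started and overwrite from there
    seek($fh, $start, 0) or die "Can't seek in $file: $!";
    print $fh $rest;

    # The file is now one line too long; cut it at the current position
    truncate($fh, tell($fh)) or die "Can't truncate $file: $!";
    close $fh or die "Can't close $file: $!";
    return 1;
}
```

Nothing before the removed line is rewritten, which is the whole benefit. If the process is interrupted between the print and the truncate, the file is left in an inconsistent state, so this trades safety for speed.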

Writing out a new file and moving it over the original changes the file's inode number. That may be a problem for some tools or procedures; if it is, you can instead update the original in either of these ways:

  • Once the new file is written out, open it for reading and open the original for writing, which clobbers the original file. Then read from the new file and write to the original one, thus copying the content back to the same inode. Remove the new file when done.

  • Open the original file in read-write mode ('+<') to start with. Once the new file is written, seek to the beginning of the original (or to the place from which to overwrite) and write the content of the new file to it. Remember to also set the end-of-file if the new file is shorter,

truncate $fh, tell($fh); 

after copying is done. This requires some care, and the first way is probably safer in general.
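
The second way could be sketched as below. The helper name delete_matching_keep_inode is hypothetical, and the sketch assumes an anonymous File::Temp handle is acceptable as the "new file"; in scalar context tempfile() returns only a read-write filehandle, and the file is cleaned up automatically.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: drop lines matching $pattern from $file while
# keeping the original inode, by copying the filtered content back.
sub delete_matching_keep_inode {
    my ($file, $pattern) = @_;

    open my $fh, '+<', $file or die "Can't open $file: $!";

    # Anonymous temporary file, opened read-write
    my $tmp = tempfile();

    # First pass: write every line that should stay to the temporary file
    while (my $line = <$fh>) {
        print $tmp $line unless $line =~ $pattern;
    }

    # Rewind both handles: read from the temporary file, overwrite the original
    seek($tmp, 0, 0) or die "Can't rewind temporary file: $!";
    seek($fh,  0, 0) or die "Can't rewind $file: $!";

    while (my $line = <$tmp>) {
        print $fh $line;
    }

    # The new content is no longer than the old, so set the end-of-file explicitly
    truncate($fh, tell($fh)) or die "Can't truncate $file: $!";
    close $fh or die "Can't close $file: $!";
    return;
}
```

For example, delete_matching_keep_inode('text.txt', qr/third/) would remove the third line of the question's sample file. The data is copied twice on disk, which is the price of keeping the inode.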

If the file weren't huge, the new "file" could be "written" in memory, as an array or a string.
