Deleting a line from a huge file in Perl

Question

I have a huge text file, and its first five lines read as follows:

This is fist line
This is second line
This is third line
This is fourth line
This is fifth line

Now, I want to write something at a random position in the third line of that file, replacing the characters at that position with the new string I am writing. I can achieve that with the code below:

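use strict;
use warnings;

my @pos = (0);    # byte offsets at which each line starts
open my $fh, "+<", "text.txt" or die "Can't open text.txt: $!";

while (<$fh>) {
    push @pos, tell($fh);
}

# $pos[2] is where the third line starts; overwrite from one byte into it
seek $fh, $pos[2] + 1, 0;
print $fh "HELLO";

close($fh);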

However, I can't figure out how to use the same kind of approach to delete the entire third line from the file, so that the text reads as below:

This is fist line
This is second line
This is fourth line
This is fifth line

I do not want to read the entire file into an array, nor do I want to use Tie::File. Is it possible to achieve this using seek and tell? A solution would be very helpful.

Answer

A file is a sequence of bytes. We can replace (overwrite) some of them, but how would we remove them? Once a file is written, its bytes cannot be 'pulled out' of the sequence or 'blanked' in any way. (The ones at the end of the file can be dropped by truncating the file as needed.)
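A minimal sketch of that truncation case, assuming the line to drop is the last one and the file ends with a newline: remember with `tell` where each line begins, then `truncate` at the start of the last line. The scratch directory and the file `demo.txt` exist only for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical scratch file with five short lines
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'demo.txt');
open my $init, '>', $file or die "Can't create $file: $!";
print $init "line $_\n" for 1 .. 5;
close $init or die "Can't close $file: $!";

open my $fh, '+<', $file or die "Can't open $file: $!";
my ($last_start, $next_start) = (0, 0);
while (<$fh>) {
    $last_start = $next_start;   # offset where the line just read begins
    $next_start = tell($fh);     # offset where the next line begins
}
# Cut the file at the start of its last line
truncate($fh, $last_start) or die "Can't truncate $file: $!";
close $fh or die "Can't close $file: $!";
```

Only the tail can be removed this way; anything before it still has to be rewritten.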

The rest of the content has to move 'up', so that what follows the text to be removed overwrites it. We have to rewrite the rest of the file. In practice it is often far simpler to rewrite the whole file.

As a very basic example:

use warnings 'all';
use strict;
use File::Copy qw(move);

my $file_in = '...';
my $file_out = '...';  # best use `File::Temp`

open my $fh_in,  '<', $file_in  or die "Can't open $file_in: $!";
open my $fh_out, '>', $file_out or die "Can't open $file_out: $!";

# Remove a line with $pattern
my $pattern = qr/this line goes/;

while (<$fh_in>) 
{
    print $fh_out $_  unless /$pattern/;
}
close $fh_in;
close $fh_out;

# Rename the new file to the original name, thus replacing it
move ($file_out, $file_in) or die "Can't move $file_out to $file_in: $!";

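This writes every line of the input file to the output file, unless the line matches the given pattern. Then that file is renamed, replacing the original (on the same filesystem this is a rename, which does not copy any data). See the perlfaq5 entry "How do I change, delete, or insert a line in a file, or append to the beginning of a file?".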

Since we really do use a temporary file, I'd recommend the core module File::Temp for that.
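A sketch of the same filter with File::Temp, assuming we create the temporary file in the same directory as the original so that `move` can be a plain rename within one filesystem. The subroutine name `remove_matching_lines` and the demo file are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);
use File::Basename qw(dirname);
use File::Copy qw(move);
use File::Spec;

# Remove every line matching $pattern from $file, via a temp file in the same directory
sub remove_matching_lines {
    my ($file, $pattern) = @_;
    open my $fh_in, '<', $file or die "Can't open $file: $!";
    # UNLINK => 0: we rename the temp file ourselves, so File::Temp must not delete it
    my ($fh_out, $tmp) = tempfile(DIR => dirname($file), UNLINK => 0);
    while (<$fh_in>) {
        print $fh_out $_ unless /$pattern/;
    }
    close $fh_in;
    close $fh_out or die "Can't close $tmp: $!";
    move($tmp, $file) or die "Can't move $tmp to $file: $!";
}

# Demo on a hypothetical scratch file
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'text.txt');
open my $init, '>', $file or die "Can't create $file: $!";
print $init "This is line $_\n" for 1 .. 5;
close $init or die "Can't close $file: $!";

remove_matching_lines($file, qr/line 3/);
```

One side effect to keep in mind: `tempfile` creates the file with 0600 permissions, and after the move the original name carries those permissions, so you may want to `chmod` it afterwards.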

This can be made more efficient, but far more complicated, by opening the file in update mode ('+<') so as to overwrite only a portion of it. Iterate until the line with the pattern, recording (tell) its position and the line length, then copy all remaining lines into memory. Then seek back to the position minus the length of that line and dump the copied rest of the file, overwriting that line and everything that follows it. Since the file is now shorter, finally truncate it at the new end.
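A rough sketch of this in-place update, assuming only the first matching line should go and the rest of the file after it fits in memory. Rather than seeking back by the line's length, it records the offset with `tell` before reading each line, which comes to the same position; it also truncates at the end because the file has become shorter. `delete_line_in_place` is a hypothetical name.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempdir);
use File::Spec;

# Delete the first line matching $pattern by shifting the rest of the file up
sub delete_line_in_place {
    my ($file, $pattern) = @_;
    open my $fh, '+<', $file or die "Can't open $file: $!";
    my $start = tell($fh);              # offset of the line about to be read
    while (my $line = <$fh>) {
        if ($line =~ $pattern) {
            local $/;                   # slurp mode: read the remaining tail at once
            my $rest = <$fh>;
            $rest = '' unless defined $rest;
            seek($fh, $start, SEEK_SET) or die "Can't seek in $file: $!";
            print $fh $rest;            # overwrite the matched line and all that follows
            truncate($fh, tell($fh)) or die "Can't truncate $file: $!";
            last;
        }
        $start = tell($fh);
    }
    close $fh or die "Can't close $file: $!";
}

# Demo on a hypothetical scratch copy of the question's five lines
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'text.txt');
open my $init, '>', $file or die "Can't create $file: $!";
print $init "This is $_ line\n" for qw(first second third fourth fifth);
close $init or die "Can't close $file: $!";

delete_line_in_place($file, qr/third/);
```

The lines before the match are never rewritten, which is where the savings come from when the line is far down the file.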

Note that the data for the rest of the file is now copied twice, although one copy is in memory. Going to this trouble may make sense if the line to be removed is far down a very large file. If there are more lines to remove, this gets messier.

Writing out a new file and moving it over the original changes the file's inode number. That may be a problem for some tools or procedures; if it is, you can instead update the original in one of two ways:

  • Once the new file is written out, open it for reading and open the original for writing. This clobbers the original file. Then read from the new file and write to the original one, thus copying the content back to the same inode. Remove the new file when done.

  • Open the original file in read-write mode ('+<') to start with. Once the new file is written, seek to the beginning of the original (or to the place from which to overwrite) and write the content of the new file to it. If the new file is shorter, remember to also set the end-of-file after copying is done:

truncate $fh, tell($fh);

This requires some care, and the first way is probably safer in general.
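The first of these two ways might look like this, as a sketch that assumes the same pattern-based filter as the example above. The original file is reopened with '>' rather than replaced, so it keeps its inode; `remove_lines_keep_inode` and the demo file are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);
use File::Spec;

# Filter $file into a temp file, then copy the result back into the original inode
sub remove_lines_keep_inode {
    my ($file, $pattern) = @_;
    my ($tmp_fh, $tmp) = tempfile(UNLINK => 0);
    open my $in, '<', $file or die "Can't open $file: $!";
    while (<$in>) {
        print $tmp_fh $_ unless /$pattern/;
    }
    close $in;
    close $tmp_fh or die "Can't close $tmp: $!";

    # Opening with '>' truncates ("clobbers") the original but keeps its inode
    open my $src, '<', $tmp  or die "Can't open $tmp: $!";
    open my $dst, '>', $file or die "Can't open $file for writing: $!";
    print $dst $_ while <$src>;
    close $src;
    close $dst or die "Can't close $file: $!";
    unlink $tmp or warn "Can't remove $tmp: $!";
}

# Demo on a hypothetical scratch file
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'text.txt');
open my $init, '>', $file or die "Can't create $file: $!";
print $init "This is $_ line\n" for qw(first second third fourth fifth);
close $init or die "Can't close $file: $!";

my $inode_before = (stat $file)[1];
remove_lines_keep_inode($file, qr/third/);
```

The price is a second pass over the data, and a crash between clobbering and finishing the copy leaves the original incomplete (the temp file still holds the full result until it is removed).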

If the file weren't huge, the new "file" could be "written" in memory, as an array or a string.
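For a file that does fit in memory, a minimal sketch of that idea: read all lines through one '+<' handle, keep those that don't match, then seek back, rewrite, and truncate. Because the same file is rewritten in place, its inode is preserved as well. The subroutine name `remove_lines_in_memory` and the demo file are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempdir);
use File::Spec;

# Remove lines matching $pattern from a small file, holding the whole file in memory
sub remove_lines_in_memory {
    my ($file, $pattern) = @_;
    open my $fh, '+<', $file or die "Can't open $file: $!";
    my @keep = grep { !/$pattern/ } <$fh>;   # all surviving lines, in memory
    seek($fh, 0, SEEK_SET) or die "Can't seek in $file: $!";
    print $fh @keep;                          # rewrite from the start
    truncate($fh, tell($fh)) or die "Can't truncate $file: $!";
    close $fh or die "Can't close $file: $!";
}

# Demo on a hypothetical scratch file
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'text.txt');
open my $init, '>', $file or die "Can't create $file: $!";
print $init "This is $_ line\n" for qw(first second third fourth fifth);
close $init or die "Can't close $file: $!";

remove_lines_in_memory($file, qr/third/);
```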
