Remove duplicate entries from multiple text files in Perl?

Views: 64  Published: 2021/6/15 20:57:54  Tag: perl

Question

I am new to this site and need help removing duplicate entries from multiple text files (in a loop). I tried the code below; it works for a single file, but it does not remove duplicates for multiple files.

Code:

my $file = "$Log_dir/File_listing.txt";
my $outfile  = "$Log_dir/Remove_duplicate.txt";; 

open (IN, "<$file") or die "Couldn't open input file: $!"; 
open (OUT, ">$outfile") or die "Couldn't open output file: $!"; 
my %seen = ();
{
  my @ARGV = ($file);
  # local $^I = '.bac';
  while(<IN>){
    print OUT $seen{$_}++;
    next if $seen{$_} > 1;
    print OUT ;
  }
}

Thanks, Art

Recommended answer

Errors in your script:

You declare a new lexical @ARGV with my and assign $file to it, so it can never hold any other file arguments.
...which doesn't matter, because you open the file handle before you assign to @ARGV, and you never loop over the arguments: the block { ... } around the code serves no purpose.
%seen will contain dedupe data for every file you open unless you reset it.
You print the count $seen{$_} to the output file, which I am sure you don't want.

You could rely on the diamond operator's implicit open of the @ARGV arguments, but since you (probably) need a proper output file name for each new input file, that adds an unwanted complication to such a solution.
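To make that complication concrete, here is a minimal sketch of what the diamond-operator version might look like. It is not part of the original answer: the sub name dedupe_with_diamond is hypothetical, and the ".deduped" suffix is assumed to match the naming used in the script further down. The bookkeeping relies on two documented features: $ARGV holds the name of the file currently being read, and eof without parentheses is true at the end of each individual input file.

```perl
use strict;
use warnings;

# Sketch only: dedupe each file named in @ARGV via the diamond operator.
# The sub name and the ".deduped" suffix are assumptions for illustration.
sub dedupe_with_diamond {
    my (%seen, $outfh);
    while (my $line = <>) {
        if (!$outfh) {
            # First line of a new input file: $ARGV is its name.
            open $outfh, ">", "$ARGV.deduped" or die "$ARGV.deduped: $!";
        }
        print {$outfh} $line unless $seen{$line}++;
        if (eof) {
            # End of the current input file: close its output and
            # forget the lines seen so far.
            close $outfh or die "close: $!";
            undef $outfh;
            %seen = ();
        }
    }
    return;
}

# Guarded so that an empty @ARGV does not fall back to reading STDIN.
dedupe_with_diamond() if @ARGV;
```

Tracking the output handle and resetting %seen at every eof is the extra work the paragraph above refers to. The script that follows avoids it by looping over the file names explicitly.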

use strict;
use warnings;                      # always use these

for my $file (@ARGV) {             # loop over all file names
    my $out = "$file.deduped";     # create output file name
    open my $infh,  "<", $file or die "$file: $!";
    open my $outfh, ">", $out  or die "$out: $!";
    my %seen;
    while (<$infh>) {
        print $outfh $_ if !$seen{$_}++;   # print if a line is never seen before
    }
}

Note that declaring %seen as a lexical variable inside the for loop makes the script check for duplicates within each individual file. If you move the declaration outside the for loop, it will check for duplicates across all files. I am not sure which you prefer.
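For the cross-file case, a sketch of the same script with %seen declared once might look like the following. The sub name dedupe_across_files is hypothetical and used only to keep the example self-contained; the rest follows the script above.

```perl
use strict;
use warnings;

# Sketch only: %seen is declared once, outside the loop, so a line that
# already appeared in an earlier file is dropped from later files too.
# The sub name is an assumption for illustration.
sub dedupe_across_files {
    my @files = @_;
    my %seen;
    for my $file (@files) {
        my $out = "$file.deduped";
        open my $infh,  "<", $file or die "$file: $!";
        open my $outfh, ">", $out  or die "$out: $!";
        while (my $line = <$infh>) {
            print {$outfh} $line unless $seen{$line}++;
        }
        close $outfh or die "$out: $!";
    }
    return;
}

dedupe_across_files(@ARGV);
```

With this variant the order of the file arguments matters: the first file on the command line keeps its copy of a shared line, and later files lose theirs. Lines are compared including their trailing newline, so a final line without a newline is not treated as a duplicate of the same text followed by one.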
