How to delete lines that match elements from another file


Problem description




I am in the process of learning Perl and I am trying to figure out how to do this task. I have a folder with a bunch of text files, and I have a file ions_solvents_cofactors that contains a list of three-letter codes.

I wrote a script that opens and reads each file in the folder and should delete the lines whose value in a specific column [3] matches an element from the list. It is not working well. I have some problem at the end of the script and can't figure out what it is.

The error I get is: rm: invalid option -- '5'

My input file looks like this:

ATOM   1592 HD13 LEU D  46      11.698 -10.914   2.183  1.00  0.00           H  
ATOM   1593 HD21 LEU D  46      11.528  -8.800   5.301  1.00  0.00           H  
ATOM   1594 HD22 LEU D  46      12.997  -9.452   4.535  1.00  0.00           H  
ATOM   1595 HD23 LEU D  46      11.722  -8.718   3.534  1.00  0.00           H  
HETATM 1597  N1  308 A   1       0.339   6.314  -9.091  1.00  0.00           N  
HETATM 1598  C10 308 A   1      -0.195   5.226  -8.241  1.00  0.00           C  
HETATM 1599  C7  308 A   1      -0.991   4.254  -9.133  1.00  0.00           C  
HETATM 1600  C1  308 A   1      -1.468   3.053  -8.292  1.00  0.00           C 

Here is the script:

#!/usr/bin/perl -w

$dirname = '.';
opendir( DIR, $dirname ) or die "cannot open directory";
@files = grep( /\.txt$/, readdir( DIR ) );

foreach $files ( @files ) {

    open( FH, $files ) or die "could not open $files\n";
    @file_each = <FH>;
    close FH;

    close DIR;

    my @ion_names = ();

    my $ionfile   = 'ions_solvents_cofactors';
    open( ION, $ionfile ) or die "Could not open $ionfile, $!";
    my @ion = <ION>;
    close ION;

    for ( my $line = 0; $line <= $#file_each; $line++ ) {

        chomp( $file_each[$line] );
        if ( $file_each[$line] =~ /^HETATM/ ) {
            @is = split '\s+', $file_each[$line];
            chomp $is[3];
        }

        foreach ( $file_each[$line] ) {    #line 39

            if ( "@ion" =~ $is[3] ) {
                system( "rm $file_each[$line]" );
            }
        }
    }
}

So, for example, if 308 from the input file appears in the file ions_solvents_cofactors, then all the lines in which it matches should be deleted.

Solution

I would make use of the Tie::File module, which allows you to tie an array to a file so that each array element is one line of the file and any changes you make to the array are reflected in the file
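As a minimal sketch of what tying does, the snippet below ties an array to a throwaway file and edits it through the array. The temporary file and its three lines are made up for illustration; only Tie::File and File::Temp from the standard distribution are assumed.

```perl
use strict;
use warnings;

use File::Temp 'tempfile';
use Tie::File;

# Create a throwaway file with three lines (the contents are made up).
my ( $fh, $path ) = tempfile( UNLINK => 1 );
print {$fh} "first\nsecond\nthird\n";
close $fh or die "Cannot close $path: $!";

# Each array element corresponds to one line of the file, without its newline.
tie my @lines, 'Tie::File', $path or die "Cannot tie $path: $!";

$lines[0] = uc $lines[0];    # rewrites line 1 in the file
splice @lines, 1, 1;         # deletes line 2 from the file

untie @lines;                # release the file

# Read the file back to see the effect of the array operations.
open my $in, '<', $path or die "Cannot open $path: $!";
my $content = do { local $/; <$in> };
close $in;
print $content;
```

The file is modified in place as the array changes, so there is no separate "write back" step.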

I've used glob to find all the .txt files, with the option :bsd_glob so as to support spaces in the file paths

The first job is to build a hash %matches that maps all the values in ions_solvents_cofactors to 1. This makes it trivial to test the PDB files for the required values
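A small sketch of that lookup hash, using a hard-coded string in place of the real list file. The codes in `$list_text` are assumptions chosen for illustration, not the asker's actual list.

```perl
use strict;
use warnings;

# Hypothetical list contents; in the answer the text is read from disk.
my $list_text = "308\nHOH\nSO4 GOL\n";

# split ' ' splits on any run of whitespace and ignores leading whitespace,
# so it does not matter whether the codes are one per line or several per line.
my %matches = map { $_ => 1 } split ' ', $list_text;

for my $code (qw( 308 LEU HOH )) {
    printf "%-3s %s\n", $code, $matches{$code} ? 'listed' : 'not listed';
}
```

Testing `$matches{$code}` is a single hash lookup, which avoids rescanning the whole list for every line of every PDB file.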

Then it's just a matter of using tie on each .txt file, and testing each line to see whether the value in column 4 is represented in the hash

I use variable $i to index into the @file array which maps the on-disk file. If a match is found then the array element is deleted with splice @file, $i, 1. (This naturally leaves $i indexing the next element in sequence without incrementing $i.) If there is no match then $i is incremented to index the next array element, leaving the line in place

use strict;
use warnings 'all';

use File::Glob ':bsd_glob';
use Tie::File;

my %matches = do {
    open my $fh, '<', 'ions_solvents_cofactors.txt'
        or die qq{Cannot open "ions_solvents_cofactors.txt": $!};
    local $/;    # slurp the whole file in one read
    map { $_ => 1 } split ' ', <$fh>;
};

for my $pdb ( glob '*.txt' ) {

    tie my @file, 'Tie::File', $pdb or die $!;

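    for ( my $i = 0; $i < @file; ) {

        # Column 4 is undefined on lines with fewer than four fields.
        # Such lines must still reach the else branch: a bare "next" here
        # would repeat the same line forever, because the loop has no
        # increment expression
        my $col4 = ( split ' ', $file[$i] )[3];

        if ( defined $col4 and $matches{$col4} ) {
            printf qq{Removing line %d from "%s"\n},
                    $i+1,
                    $pdb;
            splice @file, $i, 1;
        }
        else {
            ++$i;
        }
    }

    untie @file;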
}
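
The index handling can be tried without touching any files. The sketch below runs the same splice-or-increment loop over a plain array holding a few lines from the question, plus a short `TER` line added here for illustration. The one-entry `%matches` hash is an assumption standing in for the real list. It tests `defined $col4` before the hash lookup so that a line with fewer than four fields is kept and the index still advances.

```perl
use strict;
use warnings;

my %matches = ( 308 => 1 );    # assumed list contents

my @file = (
    'ATOM   1595 HD23 LEU D  46      11.722  -8.718   3.534  1.00  0.00           H',
    'HETATM 1597  N1  308 A   1       0.339   6.314  -9.091  1.00  0.00           N',
    'HETATM 1598  C10 308 A   1      -0.195   5.226  -8.241  1.00  0.00           C',
    'TER',                     # short line: it has no fourth column
);

my $i = 0;
while ( $i < @file ) {
    my $col4 = ( split ' ', $file[$i] )[3];
    if ( defined $col4 and $matches{$col4} ) {
        splice @file, $i, 1;    # the next line slides into position $i
    }
    else {
        ++$i;                   # keep this line and move on
    }
}

print "$_\n" for @file;
```

Only the `ATOM` line and the `TER` line remain in `@file` after the loop.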
