Efficiently remove lines from fileA that contain a string from fileB

Views: 136 | Published: 2016/8/3 11:53:53 | Tags: python perl bash unix


Problem description

FileA contains lines.
FileB contains words.

How can I efficiently remove lines from FileA that contain words found in FileB?

I tried the following, but I'm not even sure whether either attempt works, because both take so long to run.

Tried grep:

grep -v -f <(awk '{print $1}' FileB.txt) FileA.txt > out

Also tried python:

import sys

f = open(sys.argv[1], 'r')    # file with the words to filter on (FileB)
out = open(sys.argv[2], 'w')  # output file
bad_words = f.read().splitlines()

with open('FileA') as master_lines:
  for line in master_lines:
    if not any(bad_word in line for bad_word in bad_words):
      out.write(line)

FileA:

abadan refinery is one of the largest in the world.
a bad apple spoils the barrel.
abaiara is a city in the south region of brazil.
a ban has been imposed on the use of faxes

FileB:

abadan
abaiara

DESIRED OUTPUT:

a bad apple spoils the barrel.
a ban has been imposed on the use of faxes

Solution

The commands you have look good, so maybe it's time to try a good scripting language. Try running the following Perl script and see whether it finishes any faster.

#!/usr/bin/perl

use strict;
use warnings;

# fileB holds the words to look up; fileA holds the lines to filter.
open my $LOOKUP, "<", "fileB" or die "Cannot open lookup file: $!";
open my $MASTER, "<", "fileA" or die "Cannot open Master file: $!";
open my $OUTPUT, ">", "out" or die "Cannot create Output file: $!";

my %words;

# Load every word into a hash so each lookup takes constant time.
while (my $word = <$LOOKUP>) {
    chomp($word);
    ++$words{$word};
}

# Skip any line that has a whitespace-separated token found in the hash.
# Note: this matches whole tokens only, not substrings as grep does.
LINE: while (my $line = <$MASTER>) {
    for my $token (split /\s+/, $line) {
        next LINE if exists $words{$token};
    }
    print $OUTPUT $line;
}
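
For anyone who would rather stay in Python, the same hash-lookup idea can be sketched roughly as below. This is only a sketch under a few assumptions: the function names `filter_lines` and `filter_file` are made up for illustration, words are matched as whole whitespace-separated tokens (as in the Perl script, not as substrings the way the grep and `any(...)` attempts do), and the word list is assumed to fit in memory.

```python
def filter_lines(lines, bad_words):
    """Yield the lines that have no whitespace-separated token in bad_words."""
    bad = set(bad_words)  # set membership is O(1) on average, like the Perl hash
    for line in lines:
        # isdisjoint() is True when no token of the line is in the set
        if bad.isdisjoint(line.split()):
            yield line


def filter_file(words_path, master_path, out_path):
    """Hypothetical helper: filter master_path by the words in words_path."""
    with open(words_path) as words_file:
        # one word per line; blank lines are ignored
        bad_words = [w.strip() for w in words_file if w.strip()]
    with open(master_path) as master, open(out_path, "w") as out:
        out.writelines(filter_lines(master, bad_words))
```

With the sample data, calling `filter_file("FileB", "FileA", "out")` should keep only the "a bad apple" and "a ban" lines. The cost is roughly one set lookup per token instead of one substring scan per (line, word) pair, which is where the original Python attempt spends its time. The trade-off is that a word followed directly by punctuation, such as `abadan,`, would not match under this token-based rule.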
