Fast alternative to grep -f

Problem description

file.contain.query.txt

ENST001
ENST002
ENST003

file.to.search.in.txt

ENST001  90
ENST002  80
ENST004  50

Because ENST003 has no entry in the second file and ENST004 has no entry in the first file, the expected output is:

ENST001 90
ENST002 80

To grep for multiple queries in a particular file, we usually do the following:

grep -f file.contain.query.txt < file.to.search.in.txt > output.file

Since I have about 10,000 queries and almost 100,000 rows in file.to.search.in.txt, it takes a very long time to finish (about 5 hours). Is there a fast alternative to grep -f?
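One commonly suggested tweak, which is not part of the recommended answer, is to keep grep but change how it interprets the query file. A minimal sketch, assuming GNU or BSD grep (the -w flag is a widespread extension rather than POSIX): it recreates the sample files from the question and adds -F, so every query line is treated as a fixed string instead of a regular expression, and -w, so an ID such as ENST001 does not also match a longer, hypothetical ID like ENST0010. Fixed-string matching is often much faster than regex matching for large pattern lists, but it is worth timing on your own data.

```shell
#!/bin/sh
# Recreate the two sample files from the question (columns separated by two spaces)
printf 'ENST001\nENST002\nENST003\n' > file.contain.query.txt
printf 'ENST001  90\nENST002  80\nENST004  50\n' > file.to.search.in.txt

# -F: treat each line of the query file as a fixed string, not a regex
# -w: only match whole words, so ENST001 cannot match inside ENST0010
# -f: read the patterns from the query file
grep -F -w -f file.contain.query.txt file.to.search.in.txt > output.file

cat output.file
```

Like the original command, this matches an ID anywhere on the line, not only in the first column.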

Recommended answer

If you want a pure Perl option, read your query file keys into a hash table, then check standard input against those keys:

#!/usr/bin/env perl
use strict;
use warnings;

# Build a hash table whose keys are the query IDs
my %keyring;
open my $keys_fh, '<', 'file.contain.query.txt'
    or die "Cannot open file.contain.query.txt: $!";
while (my $line = <$keys_fh>) {
    chomp $line;
    $keyring{$line} = 1;
}
close $keys_fh;

# Look up the key (first column) of each line of standard input
while (my $line = <STDIN>) {
    chomp $line;
    # Assumes the search file is tab-delimited; replace the delimiter as needed
    my ($key) = split /\t/, $line;
    next unless defined $key;    # skip empty lines
    print "$line\n" if exists $keyring{$key};
}

You would use it like this:

perl lookup.pl < file.to.search.in.txt > output.file

A hash table can use a fair amount of memory, but lookups are much faster (a hash lookup takes constant time on average), which is handy here since you have ten times more lines to look up than keys to store.
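The hash-table idea itself does not depend on Perl. As a hedged sketch, assuming a POSIX-compatible awk is available, the same lookup can be written with an awk associative array (awk's built-in hash table): the IDs from the first file become array keys, and each line of the second file is printed only if its first field is one of those keys. The file names are the ones from the question; the sample data here is tab-delimited to match the Perl script's assumption.

```shell
#!/bin/sh
# Sample data from the question, tab-delimited as the Perl answer assumes
printf 'ENST001\nENST002\nENST003\n' > file.contain.query.txt
printf 'ENST001\t90\nENST002\t80\nENST004\t50\n' > file.to.search.in.txt

# NR == FNR is only true while awk reads the first file:
#   store each query ID as a key of the array "wanted", then skip to the next line.
# For the second file, print the line only if its first field is a stored key.
awk 'NR == FNR { wanted[$1]; next } $1 in wanted' \
    file.contain.query.txt file.to.search.in.txt > output.file

cat output.file
```

awk splits fields on runs of blanks and tabs by default, so this version also works on the space-separated sample shown in the question; use `awk -F '\t'` instead if a column may itself contain spaces.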
