Fast alternative to grep -f

Published: 2016/7/28 14:50:00 Tags: perl, awk

Problem description

file.contain.query.txt

ENST001
ENST002
ENST003

file.to.search.in.txt

ENST001  90
ENST002  80
ENST004  50

Because ENST003 has no entry in the second file and ENST004 has no entry in the first file, the expected output is:

ENST001 90
ENST002 80

To grep for multiple queries in a particular file, we usually do the following:

grep -f file.contain.query.txt < file.to.search.in.txt > output.file

Since I have about 10,000 queries and almost 100,000 rows in file.to.search.in.txt, it takes a very long time to finish (around 5 hours). Is there a fast alternative to grep -f?
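One commonly suggested first step, added here as an illustration and not part of the original thread: because the queries are literal IDs rather than regular expressions, GNU grep's `-F` option (treat patterns as fixed strings) is often much faster than default regex matching for a large pattern list, and `-w` restricts matches to whole words. The sketch below recreates the sample files, assuming the search file is tab-delimited.

```shell
#!/bin/sh
# Recreate the two sample files from the question (tab-delimited search file assumed).
printf 'ENST001\nENST002\nENST003\n' > file.contain.query.txt
printf 'ENST001\t90\nENST002\t80\nENST004\t50\n' > file.to.search.in.txt

# -F: treat each query line as a fixed string, not a regular expression
# -w: match whole words only, so ENST001 does not also match ENST0010
# -f: read the patterns from the query file
grep -F -w -f file.contain.query.txt file.to.search.in.txt > output.file
cat output.file
```

Note that, unlike a hash lookup on the first column, this still matches an ID anywhere on the line.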

Recommended answer

If you want a pure Perl option, read your query file keys into a hash table, then check standard input against those keys:

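#!/usr/bin/env perl
use strict;
use warnings;

# build hash table of keys from the query file (one key per line)
my %keyring;
open my $keys_fh, '<', 'file.contain.query.txt'
    or die "Cannot open file.contain.query.txt: $!";
while (my $line = <$keys_fh>) {
    $line =~ s/\s+\z//;    # drop the newline and any trailing whitespace (including \r)
    next if $line eq '';   # skip blank lines
    $keyring{$line} = 1;
}
close $keys_fh;

# look up the first column of each line of standard input
while (my $line = <STDIN>) {
    chomp $line;
    # split on any run of whitespace (tabs or spaces); the first field is the key
    my ($key) = split ' ', $line;
    print "$line\n" if defined $key && exists $keyring{$key};
}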

You would use it like this:

perl lookup.pl < file.to.search.in.txt > output.file

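A hash table can take a fair amount of memory, but lookups are much faster: each hash lookup takes expected constant time, instead of testing every query against every line. That is handy here, since you have about ten times more lines to look up (about 100,000) than keys to store (about 10,000).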
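The same hash-table idea can also be written as an awk one-liner, because awk arrays are associative. This is a minimal sketch, not part of the original answer; it recreates the sample files (assuming a tab-delimited search file) and assumes the key is in the first column of both files. `NR == FNR` is true only while awk reads the first file, so that pass stores each query ID in the `keys` array and `next` skips the rest of the program; for the second file, `$1 in keys` prints a line only when its first field was stored.

```shell
#!/bin/sh
# Recreate the two sample files from the question (tab-delimited search file assumed).
printf 'ENST001\nENST002\nENST003\n' > file.contain.query.txt
printf 'ENST001\t90\nENST002\t80\nENST004\t50\n' > file.to.search.in.txt

# Pass 1 (first file): store each non-blank query ID as an array key.
# Pass 2 (second file): print lines whose first field is one of those keys.
awk 'NR == FNR { if (NF) keys[$1]; next } $1 in keys' \
    file.contain.query.txt file.to.search.in.txt > output.file
cat output.file
```

One caveat of the `NR == FNR` idiom: if the query file is empty, the condition stays true for the second file as well, so nothing is printed.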
