Obtain patterns in one file from another using ack, awk, or a better way than grep?


Question

Is there a way to pull out of one file the lines matching patterns listed in another file, using ack the way the -f option works in grep? I see that ack has an -f option, but it behaves differently from the -f in grep.

Perhaps an example will make this clearer. Suppose I have file1:

file1:
a
c
e

and file2:

file2:
a  1
b  2
c  3
d  4
e  5

I want to look up all the patterns from file1 in file2, giving:

a  1
c  3
e  5

Can ack do this? Otherwise, is there a better way to handle the job (such as awk, or using a hash)? I have millions of records in both files and really need an efficient way to do this. Thanks!

Recommended answer

Here's a Perl one-liner that uses a hash to hold the set of wanted keys from file1, giving O(1) (amortized) lookups on each iteration over the lines of file2. It therefore runs in O(m+n) time, where m is the number of lines in your key set and n is the number of lines in the file you're testing.

perl -ne 'BEGIN{open K,shift@ARGV;chomp(@a=<K>);@hash{@a}=()}m/^(\p{alpha}+)\s/&&exists$hash{$1}&&print' tkeys file2

The key set (read from tkeys, which plays the role of file1) is held in memory while file2 is tested line by line against the keys.

Here's the same thing using Perl's -a command-line option, which autosplits each line into the @F array:

perl -ane 'BEGIN{open G,shift@ARGV;chomp(@a=<G>);@h{@a}=();}exists$h{$F[0]}&&print' tkeys file2

The second version is probably a little easier on the eyes. ;)
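Since the question also mentions awk, the same hash-lookup idea can be sketched there. This is only an illustration, not part of the original answer: the function name filter_by_keys is hypothetical, and it assumes (like the -a version) that the key is the first whitespace-separated field of each line in file2 and that file1 holds one key per line.

```shell
# Hypothetical helper: print the lines of the data file ($2) whose first
# field appears as a key in the key file ($1).
filter_by_keys() {
    # NR == FNR is true only while awk reads the first file (the keys):
    # store each key as an array index, then skip to the next line.
    # For the second file, print the line if its first field is a stored key.
    awk 'NR == FNR { keys[$1]; next } $1 in keys' "$1" "$2"
}
```

With the sample files from the question it would be called as filter_by_keys file1 file2. Like the Perl versions, it reads the key file once and does one array lookup per line of file2.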

One thing to remember here is that you are more likely to be IO bound than processor bound, so the goal should be to minimize IO. Holding the entire lookup key set in a hash gives O(1) amortized lookups. The advantage this solution may have over others is that some (slower) solutions have to run through your key file (file1) once for each line of file2. That sort of solution is O(m*n), where m is the size of your key file and n is the size of file2, whereas the hash approach runs in O(m+n) time. With millions of records in both files, that is a difference of several orders of magnitude. The hash approach benefits by eliminating linear searches through the key set, and benefits further by reading the keys via IO only one time.
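For contrast, here is a minimal sketch of the slower O(m*n) pattern described above, in which the key file is rescanned for every line of the data file. It is shown only to illustrate what to avoid and is not from the original answer. The function name filter_by_keys_slow is hypothetical, and the sketch assumes the key is everything before the first whitespace character on each line.

```shell
# Hypothetical helper, for illustration only: for every line of the data
# file ($2), rescan the whole key file ($1) with grep. That is one pass
# over the key file per data line, hence O(m*n) work overall.
filter_by_keys_slow() {
    while IFS= read -r line; do
        # The key is the text before the first whitespace character.
        key=${line%%[[:space:]]*}
        # -q: quiet, -x: match the whole line, -F: treat the key as a fixed string.
        if grep -qxF -- "$key" "$1"; then
            printf '%s\n' "$line"
        fi
    done < "$2"
}
```

It produces the same lines as the hash-based versions on the sample data, but it starts a new grep process and rereads the key file for each line of file2, which is exactly the repeated IO the hash approach avoids.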
