grep patterns from a file: print the pattern instead of the matched string
Question
I want to grep using patterns from a file that contains regular expressions. When a pattern matches, grep prints the matched string but not the pattern. How can I get the pattern instead of the matched string?
pattern.txt
Apple (Ball|chocolate|fall) Donut
donut (apple|ball) Chocolate
Donut Gorilla Chocolate
Chocolate (English|Fall) apple gorilla
gorilla chocolate (apple|ball)
(ball|donut) apple
strings.txt
apple ball Donut
donut ball chocolate
donut Ball Chocolate
apple donut
chocolate ball Apple
This is the grep command:
grep -Eix -f pattern.txt strings.txt
This command prints the matched strings from strings.txt:
apple ball Donut
donut ball chocolate
donut Ball Chocolate
But I want to find which patterns from pattern.txt were used to match:
Apple (Ball|chocolate|fall) Donut
donut (apple|ball) Chocolate
pattern.txt can contain lowercase and uppercase text, lines with and without regex, and any number of words and regex elements. The only regex constructs used are brackets (parentheses) and the pipe.
I don't want to loop over pattern.txt and run grep once per line, because that is slow. Is there a way to make grep print which pattern, or which line number of the pattern file, matched? Or is there a command other than grep that can do the job reasonably fast?
Recommended answer
Using grep, I have no idea, but with GNU awk:
$ awk '
BEGIN { IGNORECASE = 1 }        # GNU awk only: case-insensitive matching
NR==FNR {                       # process the pattern file
    a[$0]                       # hash the entries to a
    next                        # process next line
}
{                               # process the strings file
    for(i in a)                 # loop over all pattern file entries
        if($0 ~ "^" i "$") {    # whole-line match, like grep -x
            print i             # output the matching pattern file entry
            # delete a[i]       # uncomment to delete matched patterns from a
            # next              # uncomment to stop searching after the first match
        }
}' pattern.txt strings.txt
Output:
D (A|B) C
For each line in strings.txt, the script loops over every line of pattern.txt, so it also reports the case where more than one pattern matches. The output above shows only one match because of case sensitivity. You can work around that, for example, with GNU awk's IGNORECASE, as the BEGIN block does.
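The question also asks for the line number of the matching pattern. The following is a minimal sketch of one way to get it, not part of the original answer. It assumes a POSIX awk without IGNORECASE, so it lowercases both the pattern and the string with tolower(); that is only safe because the patterns contain nothing but words, parentheses, and pipes. The array names pat and re and the temporary directory are illustrative choices.

```shell
#!/bin/sh
# Recreate the sample files from the question in a temporary directory.
dir=$(mktemp -d)
cat > "$dir/pattern.txt" <<'EOF'
Apple (Ball|chocolate|fall) Donut
donut (apple|ball) Chocolate
Donut Gorilla Chocolate
Chocolate (English|Fall) apple gorilla
gorilla chocolate (apple|ball)
(ball|donut) apple
EOF
cat > "$dir/strings.txt" <<'EOF'
apple ball Donut
donut ball chocolate
donut Ball Chocolate
apple donut
chocolate ball Apple
EOF

result=$(awk '
NR == FNR {                          # first file: the patterns
    pat[++n] = $0                    # original text, indexed by line number
    re[n] = "^(" tolower($0) ")$"    # lowercased and anchored, like grep -ix
    next
}
{                                    # second file: the strings
    s = tolower($0)
    for (i = 1; i <= n; i++)         # try the patterns in file order
        if (s ~ re[i])
            print i ": " pat[i]      # pattern line number, then the pattern
}' "$dir/pattern.txt" "$dir/strings.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

With the sample data this prints line 1 once and line 2 twice, because two strings match the second pattern. Storing the patterns in a numerically indexed array, rather than looping with for (i in a), keeps the output in pattern file order.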
Also, if you want each matched pattern file entry to be output only once, you could delete it from a after the first match: add delete a[i] after the print. That might also give you some performance advantage.
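The delete-after-first-match idea can be sketched as follows. This is an illustrative variant under the same assumptions as the answer's script (patterns made of words, parentheses, and pipes only), written for a POSIX awk with tolower() instead of GNU awk's IGNORECASE; the array names pat and re are hypothetical.

```shell
#!/bin/sh
# Recreate the sample files from the question in a temporary directory.
dir=$(mktemp -d)
cat > "$dir/pattern.txt" <<'EOF'
Apple (Ball|chocolate|fall) Donut
donut (apple|ball) Chocolate
Donut Gorilla Chocolate
Chocolate (English|Fall) apple gorilla
gorilla chocolate (apple|ball)
(ball|donut) apple
EOF
cat > "$dir/strings.txt" <<'EOF'
apple ball Donut
donut ball chocolate
donut Ball Chocolate
apple donut
chocolate ball Apple
EOF

result=$(awk '
NR == FNR {                          # first file: the patterns
    pat[++n] = $0                    # original text of pattern line n
    re[n] = "^(" tolower($0) ")$"    # lowercased and anchored regex
    next
}
{                                    # second file: the strings
    s = tolower($0)
    for (i = 1; i <= n; i++)
        if ((i in re) && s ~ re[i]) {
            print pat[i]             # report the pattern once
            delete re[i]             # never test this pattern again
        }
}' "$dir/pattern.txt" "$dir/strings.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

On the sample data this prints the two patterns the question asks for, each exactly once. Every deleted entry also shortens the work done for the remaining strings, which is where the possible performance gain comes from.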