grep patterns from a file, print the pattern instead of the matched string

Question

I want to grep with patterns from a file that contains regexes. When a pattern matches, grep prints the matched string but not the pattern. How can I get the pattern instead of the matched strings?

pattern.txt

Apple (Ball|chocolate|fall) Donut
donut (apple|ball) Chocolate
Donut Gorilla Chocolate
Chocolate (English|Fall) apple gorilla
gorilla chocolate (apple|ball)
(ball|donut) apple

strings.txt

apple ball Donut
donut ball chocolate
donut Ball Chocolate
apple donut
chocolate ball Apple

Here is the grep command:

grep -Eix -f pattern.txt strings.txt

This command prints the matched strings from strings.txt:

apple ball Donut
donut ball chocolate
donut Ball Chocolate

But I want to find which patterns from pattern.txt were used for the match:

Apple (Ball|chocolate|fall) Donut
donut (apple|ball) Chocolate

pattern.txt can contain lowercase and uppercase letters, lines with and without regex, and any number of words and regex elements. The only regex syntax used is parentheses (grouping) and the pipe (alternation).

I don't want to use a loop that reads pattern.txt line by line and runs grep for each line, because it's slow. Is there a way to make grep print which pattern, or which line number of the pattern file, matched? Or is there another command besides grep that can do the job without being too slow?

Answer

I have no idea how to do this with grep, but with GNU awk:

$ awk '
BEGIN { IGNORECASE = 1 }          # GNU awk only: case-insensitive matching, like grep -i
NR==FNR {                         # process pattern file
    a[$0]                         # hash the entries to a
    next                          # process next line
}
{                                 # process strings file
    for(i in a)                   # loop all pattern file entries
        if($0 ~ "^(" i ")$") {    # whole-line match, like grep -x
            print i               # output the matching pattern file entry
            # delete a[i]         # uncomment to delete matched patterns from a
            # next                # uncomment to stop searching after the first match
        }
}' pattern.txt strings.txt

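Output (for the sample files above):

Apple (Ball|chocolate|fall) Donut
donut (apple|ball) Chocolate
donut (apple|ball) Chocolate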

For each line of strings.txt, the script loops over every pattern line, so a line that matches several patterns prints all of them (in no particular order, because for (i in a) visits array elements in an unspecified order). The second pattern is printed twice because two lines of strings.txt match it. IGNORECASE = 1 is a GNU awk feature that makes the matching case-insensitive, like grep -i; without it, none of the sample lines would match.
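If GNU awk is not available, a portable variant can get case-insensitivity by lowercasing both the line and the pattern with the POSIX tolower() function instead of relying on IGNORECASE. The sketch below is an alternative, not part of the original answer: the helper name matching_patterns is hypothetical, and lowercasing a pattern is only assumed to be safe because, as stated in the question, the patterns contain nothing but letters, spaces, parentheses and pipes. It also loops over the patterns by line number, so the matches for one line come out in pattern-file order.

```shell
# matching_patterns PATTERN_FILE STRINGS_FILE   (hypothetical helper name)
# For every line of STRINGS_FILE, prints each pattern line of PATTERN_FILE
# that matches the whole line, ignoring case. Uses only POSIX awk features.
matching_patterns() {
    awk '
    NR == FNR {                       # first file: store patterns in file order
        pat[++n] = $0
        next
    }
    {                                 # second file: test the line against every pattern
        line = tolower($0)
        for (i = 1; i <= n; i++)
            if (line ~ ("^(" tolower(pat[i]) ")$"))   # whole-line match, like grep -x
                print pat[i]
    }' "$1" "$2"
}
```

Usage: matching_patterns pattern.txt strings.txt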

Also, if you want each matched pattern file entry to be output only once, you can delete it from a after its first match by uncommenting delete a[i] after the print. With the sample data, the output is then exactly the two patterns from the question. Because deleted patterns are no longer tested, this may also give some performance advantage.
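Since the question also asks for the line number in the pattern file, here is a sketch that combines the print-once idea with line numbers. It is not from the original answer: the helper name which_patterns is hypothetical, and it lowercases both sides with POSIX tolower() for case-insensitivity, which is only assumed safe for patterns made of letters, spaces, parentheses and pipes, as in the question. Each pattern is recorded when it first matches and is not tested again, and the results are printed at the end as N:pattern in pattern-file order, similar to grep -n.

```shell
# which_patterns PATTERN_FILE STRINGS_FILE   (hypothetical helper name)
# Prints "N:pattern" once for every pattern line N of PATTERN_FILE that matches
# at least one whole line of STRINGS_FILE (case-insensitive), in pattern-file order.
which_patterns() {
    awk '
    NR == FNR {                       # first file: keep original and lowercased pattern
        orig[++n] = $0
        pat[n] = tolower($0)
        next
    }
    {                                 # second file: test only patterns not yet matched
        line = tolower($0)
        for (i = 1; i <= n; i++)
            if (!(i in hit) && line ~ ("^(" pat[i] ")$"))
                hit[i] = 1            # remember the match so this pattern is skipped later
    }
    END {                             # report matched patterns with their line numbers
        for (i = 1; i <= n; i++)
            if (i in hit)
                print i ":" orig[i]
    }' "$1" "$2"
}
```

With the sample files, which_patterns pattern.txt strings.txt should report lines 1 and 2 of pattern.txt.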
