awk: keep a line if it contains "example"


Problem description

I want to keep only the lines that contain any of several keywords.

Example list:

Name:email:username #registered
Name2:email2:username2
Name3:email3:username3 #registered #subscribed #phonever
Name4:email4:username4 #unconfirmed

What I want to do is extract the lines that contain any of #registered, #subscribed, or #phonever.

Example of the output I want:

Name:email:username #registered
Name3:email3:username3 #registered #subscribed #phonever

Recommended answer

With awk (use the regex alternation operator, |, on a list of fixed strings):

awk '/#registered|#subscribed|#phonever/' file

The part between the slashes, /.../, is called an awk pattern; for each line that matches, awk executes the action that follows it (written as { ... }). Since the default action is { print $0 } (print the complete input record/line), there is no need to specify it here.
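To make the default-action point concrete, here is a minimal sketch that runs the pattern with and without an explicit action and compares the results. The temporary file and the variable names (tmp, implicit, explicit) are assumptions for illustration; the data is the example list from the question.

```shell
# Build a small sample file (hypothetical temp path) holding the question's data.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Name:email:username #registered
Name2:email2:username2
Name3:email3:username3 #registered #subscribed #phonever
Name4:email4:username4 #unconfirmed
EOF

# Pattern only: awk applies the default action, { print $0 }.
implicit=$(awk '/#registered|#subscribed|#phonever/' "$tmp")

# Same pattern with the action written out explicitly.
explicit=$(awk '/#registered|#subscribed|#phonever/ { print $0 }' "$tmp")

# Both variables hold the same two lines.
printf '%s\n' "$implicit"

rm -f "$tmp"
```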

Similarly, with sed you can say:

sed -nE '/#registered|#subscribed|#phonever/p' file

Here, however, we have to specify -n to suppress sed's default printing, and use the p command to print only the lines that match the pattern (called an address in sed). The -E option tells sed to use POSIX ERE (extended regular expressions), which we need here because the default, POSIX BRE (basic regular expressions), does not define the alternation operator.
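The effect of -n is easy to see by running the same command with and without it. The sketch below assumes a temporary copy of the example list; the variable names are made up for illustration.

```shell
# Build a small sample file (hypothetical temp path) holding the question's data.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Name:email:username #registered
Name2:email2:username2
Name3:email3:username3 #registered #subscribed #phonever
Name4:email4:username4 #unconfirmed
EOF

# With -n: only the lines printed by the p command appear.
with_n=$(sed -nE '/#registered|#subscribed|#phonever/p' "$tmp")

# Without -n: sed auto-prints every line, and p prints each match once more,
# so matching lines show up twice.
without_n=$(sed -E '/#registered|#subscribed|#phonever/p' "$tmp")

# Count the output lines of each run (2 with -n, 6 without).
count_with=$(printf '%s\n' "$with_n" | grep -c '')
count_without=$(printf '%s\n' "$without_n" | grep -c '')
echo "with -n: $count_with lines, without -n: $count_without lines"

rm -f "$tmp"
```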

For simple filtering (printing the lines that match some pattern), grep is also an option, and a very fast one at that:

grep '#registered\|#subscribed\|#phonever' file
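Note that \| inside a basic regular expression is a GNU extension rather than part of POSIX BRE, so the command above may not behave the same with every grep. As a sketch of two more portable spellings, assuming the same example data in a temporary file: -E switches grep to extended regular expressions, and repeated -e options supply several patterns without any alternation operator.

```shell
# Build a small sample file (hypothetical temp path) holding the question's data.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Name:email:username #registered
Name2:email2:username2
Name3:email3:username3 #registered #subscribed #phonever
Name4:email4:username4 #unconfirmed
EOF

# Extended regex: | is the alternation operator, no backslash needed.
ere=$(grep -E '#registered|#subscribed|#phonever' "$tmp")

# One -e per pattern: a line is printed if it matches any of them.
multi=$(grep -e '#registered' -e '#subscribed' -e '#phonever' "$tmp")

printf '%s\n' "$ere"

rm -f "$tmp"
```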


A slightly more general solution (awk with a patterns file)

For larger (and possibly dynamic) lists of patterns, one solution is to keep all patterns in a separate file, for example in patterns:

#registered
#subscribed
#phonever

and use this awk program:

awk 'NR==FNR { pat[$0]=1 } NR>FNR { for (p in pat) if ($0 ~ p) {print;next} }' patterns file

This first loads all patterns into the pat array, then tries each of those patterns against every line of file, printing the line and moving on to the next one at the first match found.

The result is the same:

Name:email:username #registered
Name3:email3:username3 #registered #subscribed #phonever

The difference is that the script no longer has to change for each new set of patterns. Note, however, that this carries a performance penalty (as general solutions usually do). For shorter lists of patterns and smaller files, this shouldn't be a problem.
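One caveat of matching with ~ is that each line of the patterns file is treated as a regular expression, so characters such as . are not literal. If the patterns are meant to be fixed strings, a possible variant (not part of the original answer) is to test with awk's index() function instead. The sketch below uses a hypothetical pattern, #reg.stered, to show the difference; file and variable names are assumptions.

```shell
# Sample data (hypothetical temp paths) plus a patterns file with one
# made-up pattern containing a regex metacharacter.
data=$(mktemp)
pats=$(mktemp)
cat > "$data" <<'EOF'
Name:email:username #registered
Name2:email2:username2
Name3:email3:username3 #registered #subscribed #phonever
Name4:email4:username4 #unconfirmed
EOF
printf '%s\n' '#reg.stered' > "$pats"

# Regex matching: the dot matches any character, so "#registered" lines match.
regex_hits=$(awk 'NR==FNR { pat[$0]=1; next }
                  { for (p in pat) if ($0 ~ p) { print; next } }' "$pats" "$data")

# Fixed-string matching: index() looks for the literal text "#reg.stered",
# which does not occur in the data, so nothing is printed.
literal_hits=$(awk 'NR==FNR { pat[$0]=1; next }
                    { for (p in pat) if (index($0, p)) { print; next } }' "$pats" "$data")

echo "regex hits:"
printf '%s\n' "$regex_hits"
echo "literal hits: ${literal_hits:-none}"

rm -f "$data" "$pats"
```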

Building on the approach above (keeping a list of fixed-string "patterns" in a file), we can actually use grep, which provides a dedicated option (-f FILE) for reading patterns from a file, one per line. To speed up the matching further, we should also use the -F/--fixed-strings option.

So, this:

grep -Ff patterns file

is very fast: it handles long lists of fixed-string patterns and huge files with minimal memory overhead.
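As an end-to-end check, the sketch below recreates the example list and the patterns file in temporary locations (the paths and variable names are assumptions) and runs the grep -Ff command on them.

```shell
# Recreate the example list and the patterns file in hypothetical temp paths.
data=$(mktemp)
pats=$(mktemp)
cat > "$data" <<'EOF'
Name:email:username #registered
Name2:email2:username2
Name3:email3:username3 #registered #subscribed #phonever
Name4:email4:username4 #unconfirmed
EOF
cat > "$pats" <<'EOF'
#registered
#subscribed
#phonever
EOF

# -f reads one pattern per line from the file; -F treats each as a fixed string.
hits=$(grep -Ff "$pats" "$data")
printf '%s\n' "$hits"

rm -f "$data" "$pats"
```

One thing to watch for: an empty line in the patterns file acts as an empty pattern, which matches every input line, so it is worth keeping the file free of blank lines.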
