awk keep if line contains "example"
Question
I want to keep only the lines that contain any of several keywords.
Example list:
Name:email:username #registered
Name2:email2:username2
Name3:email3:username3 #registered #subscribed #phonever
Name4:email4:username4 #unconfirmed
What I want to do is extract lines if they contain #registered, #subscribed, or #phonever.
Example of the output I want:
Name:email:username #registered
Name3:email3:username3 #registered #subscribed #phonever
Recommended answer
With awk (use the regex alternation operator, |, on a list of fixed strings):
awk '/#registered|#subscribed|#phonever/' file
The part between the slashes, /.../, is called an awk pattern, and for the matching lines awk executes the action that follows it (written as { ... }). But since the default action is { print $0 } (printing the complete input record/line), there's no need to specify it here.
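To make the pattern/action relationship concrete, here is a small sketch that runs the same pattern with the action omitted, with the default action spelled out, and with a custom action. The temporary directory and the variable names (tmpdir, implicit, explicit, names) are only illustrative, and the input is the sample list from the question.

```shell
# Build the sample input from the question in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file" <<'EOF'
Name:email:username #registered
Name2:email2:username2
Name3:email3:username3 #registered #subscribed #phonever
Name4:email4:username4 #unconfirmed
EOF

# Pattern only: awk falls back to the default action, { print $0 }.
implicit=$(awk '/#registered|#subscribed|#phonever/' "$tmpdir/file")

# Same pattern with the default action written out explicitly.
explicit=$(awk '/#registered|#subscribed|#phonever/ { print $0 }' "$tmpdir/file")

# A custom action: print only the first colon-separated field of matching lines.
names=$(awk -F: '/#registered|#subscribed|#phonever/ { print $1 }' "$tmpdir/file")

printf '%s\n' "$explicit"
printf '%s\n' "$names"

rm -rf "$tmpdir"
```

The first two commands should produce identical output; the third shows how replacing the action changes what is printed for the same set of matching lines.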
Similarly, with sed you can say:
sed -nE '/#registered|#subscribed|#phonever/p' file
but now we have to specify -n to skip printing by default, and print with the p command only those lines that match the pattern (called a sed address). The -E option tells sed to use POSIX ERE (extended regex), and we need it here because the default, POSIX BRE (basic regex), does not define the alternation operator.
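As one possible variant of the same idea, sed can also be told to delete every line that does not match, which leaves the default printing in place and makes -n and p unnecessary. This is only a sketch on the sample data from the question; the names tmpdir, with_p and with_d are illustrative.

```shell
# Build the sample input from the question in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file" <<'EOF'
Name:email:username #registered
Name2:email2:username2
Name3:email3:username3 #registered #subscribed #phonever
Name4:email4:username4 #unconfirmed
EOF

# The form from the answer: suppress default output, print matching lines.
with_p=$(sed -nE '/#registered|#subscribed|#phonever/p' "$tmpdir/file")

# Inverted form: delete lines that do NOT match; the rest are printed by default.
with_d=$(sed -E '/#registered|#subscribed|#phonever/!d' "$tmpdir/file")

printf '%s\n' "$with_d"

rm -rf "$tmpdir"
```

Both commands should select the same two lines; which one reads better is largely a matter of taste.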
For simple filtering (and printing the lines that match some pattern), grep is also an option (and a very fast option at that):
grep '#registered\|#subscribed\|#phonever' file
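Note that \| inside a basic regex is a GNU extension, since POSIX BRE does not define alternation. Where portability matters, two spellings that stay within POSIX grep are the -E option (extended regex) and repeated -e options. The sketch below runs both on the sample data from the question; the names tmpdir, ere and multi are illustrative.

```shell
# Build the sample input from the question in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file" <<'EOF'
Name:email:username #registered
Name2:email2:username2
Name3:email3:username3 #registered #subscribed #phonever
Name4:email4:username4 #unconfirmed
EOF

# Extended regex: | is the alternation operator, no backslash needed.
ere=$(grep -E '#registered|#subscribed|#phonever' "$tmpdir/file")

# Several patterns, one per -e option; a line is printed if any of them matches.
multi=$(grep -e '#registered' -e '#subscribed' -e '#phonever' "$tmpdir/file")

printf '%s\n' "$ere"

rm -rf "$tmpdir"
```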
A more general solution (awk with a patterns file)
A solution for larger (and possibly dynamic) lists of patterns could be to keep all patterns in a separate file, for example in patterns:
#registered
#subscribed
#phonever
and use this awk program:
awk 'NR==FNR { pat[$0]=1 } NR>FNR { for (p in pat) if ($0 ~ p) {print;next} }' patterns file
which will first load all patterns into the pat array, and then try to match any of those patterns on each of the lines in file, printing and advancing to the next line on the first match found.
The result is the same:
Name:email:username #registered
Name3:email3:username3 #registered #subscribed #phonever
but the script now doesn't change for each new set of patterns. Note, however, that this carries a performance penalty (as general solutions usually do). For shorter lists of patterns and smaller files, this shouldn't be a problem.
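One detail worth keeping in mind: $0 ~ p treats every line of patterns as a regular expression, so characters such as . or + in a pattern would act as regex operators. If the patterns are meant as fixed strings, a possible variant is to test with awk's index() function instead. The sketch below is an adaptation, not the program from the answer; it also skips empty pattern lines, and the names tmpdir and result are illustrative.

```shell
# Build the patterns file and the sample input in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/patterns" <<'EOF'
#registered
#subscribed
#phonever
EOF
cat > "$tmpdir/file" <<'EOF'
Name:email:username #registered
Name2:email2:username2
Name3:email3:username3 #registered #subscribed #phonever
Name4:email4:username4 #unconfirmed
EOF

# First file: remember each non-empty line as a fixed string.
# Second file: print a line as soon as any stored string occurs in it.
result=$(awk '
    NR == FNR { if (length($0)) pat[$0] = 1; next }
    { for (p in pat) if (index($0, p)) { print; next } }
' "$tmpdir/patterns" "$tmpdir/file")

printf '%s\n' "$result"

rm -rf "$tmpdir"
```

index() returns the position of the substring or 0 when it is absent, so it works directly as the condition.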
Building on the approach above (keeping a list of fixed-string "patterns" in a file), we can actually use grep, which provides a specialized option (-f FILE) for obtaining patterns from a file, one per line. To further speed up the matching, we should also use the -F/--fixed-strings option.
So, this:
grep -Ff patterns file
will be incredibly fast, handling long lists of fixed-string patterns and huge files with minimal memory overhead.
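Besides speed, -F also changes what the patterns mean: without it, each line of the patterns file is a basic regex. The following sketch illustrates the difference with a hypothetical tag, #tier.1, that is not part of the question's data; the two input lines and the names tmpdir, as_regex and as_fixed are likewise made up for the demonstration.

```shell
# Hypothetical data: one pattern containing a dot, and two input lines.
tmpdir=$(mktemp -d)
printf '%s\n' '#tier.1' > "$tmpdir/patterns"
printf '%s\n' \
    'Name5:email5:username5 #tier.1' \
    'Name6:email6:username6 #tierX1' > "$tmpdir/file"

# Without -F the dot is a regex wildcard, so both lines match.
as_regex=$(grep -f "$tmpdir/patterns" "$tmpdir/file")

# With -F the pattern is a literal string, so only the first line matches.
as_fixed=$(grep -Ff "$tmpdir/patterns" "$tmpdir/file")

printf 'regex:\n%s\n' "$as_regex"
printf 'fixed:\n%s\n' "$as_fixed"

rm -rf "$tmpdir"
```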