sed, awk, perl or lex: find strings by prefix+regex, ignoring rest of input
Problem description
I need to find strings with a certain prefix, followed by a regexp, in a bunch of files, but ignore the rest of the input (including the content of the line before the prefix, and after the end of the matching regexp).
What's the best tool for the job? grep finds complete lines; sed is usually used just for editing and select-and-replace; awk? perl?
I also thought about lex, but am I really after a compiler compiler?!
Edit: the input is several thousand HTML files. The prefix + regular expression would be https://([-.0-9A-Za-z]+\.[A-Za-z]{2,}) (of which I want $1), and the rest of the input should be ignored.
Recommended answer
If you won't have more than one of the pattern on a single line, I'd probably use sed:
sed -n -e 's%.*https://\([-.0-9A-Za-z]\{1,\}\.[A-Za-z]\{2,\}\).*%\1%p'
Given the data file:
Nothing here
Before https://example.com after
https://example.com and after
Before you get to https://www.example.com
And double your https://example.com for fun and happiness https://www.example.com in triplicate https://a.bb
and nothing here
The sed script produces one entry per line, showing the last entry when there's more than one on the line:
example.com
example.com
www.example.com
a.bb
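Only the last entry survives because the leading .* in the sed pattern is greedy and swallows any earlier matches on the line. One possible way to keep every match with line-oriented tools is sketched below. It assumes a grep that supports the -o (print only the matched text) and -h (suppress file names) options, which are extensions found in GNU and BSD grep, and the function name extract_hosts_grep is made up for illustration.

```shell
# Hypothetical helper: print every host name that follows "https://",
# one per line, from the named files (or from standard input if none).
extract_hosts_grep() {
  # -o: print each match on its own line; -h: no file-name prefixes;
  # -E: extended regular expressions, so + and {2,} need no backslashes.
  grep -ohE 'https://[-.0-9A-Za-z]+\.[A-Za-z]{2,}' "$@" |
    # grep has no capture groups, so strip the fixed prefix afterwards.
    sed 's%^https://%%'
}
```

For example, extract_hosts_grep data would read the sample file shown above.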
A Perl script can be used for multiple entries per line:
$ perl -nle 'print $1 while (m%https://([-.0-9A-Za-z]+\.[A-Za-z]{2,})%g);' data
example.com
example.com
www.example.com
example.com
www.example.com
a.bb
$
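Since the question also asks about awk, here is a rough equivalent of the multiple-entries-per-line approach using awk's match() function with RSTART and RLENGTH. It is only a sketch: the function name extract_hosts_awk is invented for illustration, and [A-Za-z][A-Za-z]+ stands in for [A-Za-z]{2,} on the assumption that some awk implementations do not support interval expressions.

```shell
# Hypothetical helper: print every host name that follows "https://",
# one per line, from the named files (or from standard input if none).
extract_hosts_awk() {
  awk '{
    line = $0
    # match() sets RSTART and RLENGTH for the leftmost, longest match.
    while (match(line, /https:\/\/[-.0-9A-Za-z]+\.[A-Za-z][A-Za-z]+/)) {
      # Skip the 8 characters of "https://" to keep only the host name.
      print substr(line, RSTART + 8, RLENGTH - 8)
      # Continue scanning after the end of this match.
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}
```

With several thousand HTML files, something like find . -name '*.html' -exec awk ... {} + avoids overlong argument lists; the '*.html' pattern here is an assumption about how the files are named.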