sed, awk, perl or lex: find strings by prefix+regex, ignoring rest of input


Question

I need to find strings with a certain prefix, followed by a regexp, in a bunch of files, while ignoring the rest of the input (including whatever precedes the prefix on the line and whatever follows the end of the regexp match).

What's the best tool for the job? grep finds complete lines; sed is usually used just for editing and select-and-replace; awk? perl?

I also thought about lex, but am I really after a compiler compiler?!

Edit: the input is several thousand HTML files, the prefix + regular expression would be https://([-.0-9A-Za-z]+\.[A-Za-z]{2,}) (of which I want $1), and the rest of the input should be ignored.

Recommended answer

If you won't have more than one occurrence of the pattern on a single line, I'd probably use sed:

sed -n -e 's%.*https://\([-.0-9A-Za-z]\{1,\}\.[A-Za-z]\{2,\}\).*%\1%p'

Given the data file:

Nothing here
Before https://example.com after
https://example.com and after
Before you get to https://www.example.com
And double your https://example.com for fun and happiness https://www.example.com in triplicate https://a.bb
and nothing here

The sed script produces at most one entry per input line: lines without a match print nothing, and when a line holds more than one match only the last is shown, because the leading .* is greedy:

example.com
example.com
www.example.com
a.bb
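
The question also asks about awk. As a rough sketch rather than part of the original answer, a loop around awk's match() can report every match on a line instead of only the last one. The function name extract_hosts_awk is hypothetical, and {2,} is spelled out as [A-Za-z][A-Za-z]+ on the assumption that some awk implementations do not support interval expressions:

```shell
# Hypothetical helper: print every host that follows "https://", one per line.
# Reads the named files, or standard input when no file is given.
extract_hosts_awk() {
    awk '{
        rest = $0
        # match() sets RSTART and RLENGTH for the leftmost match in rest.
        while (match(rest, /https:\/\/[-.0-9A-Za-z]+\.[A-Za-z][A-Za-z]+/)) {
            # Skip the 8 characters of "https://" and print only the host.
            print substr(rest, RSTART + 8, RLENGTH - 8)
            # Continue searching after the end of this match.
            rest = substr(rest, RSTART + RLENGTH)
        }
    }' "$@"
}
```

It would be called as extract_hosts_awk data, or with several file names at once.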

A Perl script can be used for multiple entries per line:

$ perl -nle 'print $1 while (m%https://([-.0-9A-Za-z]+\.[A-Za-z]{2,})%g);' data
example.com
example.com
www.example.com
example.com
www.example.com
a.bb
$
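
The question notes that grep finds complete lines. Where grep supports the -o option (it is not in POSIX, but GNU and BSD grep both provide it), only the matched text is printed, so a grep plus sed pipeline is another possible sketch. The function name extract_hosts is hypothetical, and the -h option (suppress file names when several files are given) is likewise assumed to be available:

```shell
# Hypothetical helper: print every host that follows "https://", one per line.
# Assumes a grep with -o (print only the match) and -h (no file-name prefix).
extract_hosts() {
    # grep emits each full match on its own line; sed then strips the prefix.
    grep -ohE 'https://[-.0-9A-Za-z]+\.[A-Za-z]{2,}' "$@" |
        sed 's%^https://%%'
}
```

For several thousand HTML files, something like find . -name '*.html' -exec cat {} + | extract_hosts | sort -u would list each distinct host once; the '*.html' pattern is an assumption about how the files are named.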
