How to extract multiple string occurrences between two matching patterns using sed or grep commands
Problem description
I am a newbie to Unix and playing around with the sed and awk commands. My sample snort rule has multiple occurrences of the keyword "content". I need to extract all data between content:" and "; to a file.
This sample contains one rule on a single line. My actual file contains 30k such rules.
The file 1rule contains:
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"APP-DETECT Absolute Software Computrace outbound connection - search.namequery.com"; flow:to_server,established; content:"Host|3A| search.namequery.com|0D 0A|"; fast_pattern:only; http_header; content:"TagId: "; http_header; metadata:policy security-ips drop, ruleset community, service http; reference:url,absolute.com/support/consumer/technology_computrace; reference:url,www.blackhat.com/presentations/bh-usa-09/ORTEGA/BHUSA09-Ortega-DeactivateRootkit-PAPER.pdf; classtype:misc-activity; sid:26287; rev:4;)
Expected output:
Host|3A| search.namequery.com|0D 0A|
TagId
I tried the following sed and grep commands:
grep -Po '(?<=content:").*(?=";)' 1rule
sed 's/.*content:"\([^";]*\).*/\1/' 1rule
The output I got is not as expected:
Using grep, I could see all the contents, but with the intermediate data between them. sed gives me only the last occurrence in a line, along with the non-matching lines that follow it.
Please tell me how I can solve this problem.
Recommended answer
With GNU grep (as in your question, taking advantage of the -P option for Perl-compatible regular expressions):
grep -Po 'content:"\K[^"]+' 1rule
- \K drops what's been matched so far: the field label and the opening ".
- [^"]+ then matches the content of the string up to, but excluding, the closing ".
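To see the \K approach in action, here is a minimal sketch run against a shortened, hypothetical version of the rule from the question. The variable rule and the helper name extract_contents are illustrative assumptions, and the sketch assumes a GNU grep built with -P support. Because [^"]+ stops at the first closing quote, each content string is reported separately, whereas the greedy .* in the question's attempt runs on to the last "; on the line.

```shell
# Hypothetical sample: a shortened version of the rule from the question.
rule='alert tcp any any -> any any (msg:"demo"; content:"Host|3A| search.namequery.com|0D 0A|"; fast_pattern:only; http_header; content:"TagId: "; http_header; sid:26287; rev:4;)'

# extract_contents is an illustrative helper name, not part of grep.
extract_contents() {
  # -P: Perl-compatible regular expressions (GNU grep).
  # -o: print each match on its own line instead of the whole input line.
  # \K discards the already-matched prefix content:" from the reported match.
  grep -Po 'content:"\K[^"]+'
}

# Prints one line per content:"..." occurrence found in the rule.
printf '%s\n' "$rule" | extract_contents
```

For the real file, the same pattern can be pointed at the file and redirected, for example grep -Po 'content:"\K[^"]+' 1rule > contents.txt, where contents.txt is an arbitrary output name. Note that this simple pattern assumes the content strings contain no embedded double quotes.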
Alternatively, try the following awk command:
awk -F'content:' '{
for (i=2;i<=NF;++i) {
split($i, a, /"/); print a[2]
}
}' 1rule
- Splits the input line(s) into fields by the separator content:
- Loops over the fields starting with index 2 (because field 1 is the string preceding the first content: substring).
- Splits each field into tokens by " and prints the 2nd token, which is the string enclosed in "..." at the start of the field.
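The three steps above can be traced on a shortened, hypothetical version of the rule. The variable rule and the helper name extract_contents_awk are illustrative assumptions; the awk program itself is the one from the answer, only wrapped in a function so it can read from a pipe.

```shell
# Hypothetical sample: a shortened version of the rule from the question.
rule='alert tcp any any -> any any (msg:"demo"; content:"Host|3A| search.namequery.com|0D 0A|"; fast_pattern:only; http_header; content:"TagId: "; http_header; sid:26287; rev:4;)'

# extract_contents_awk is an illustrative helper name.
extract_contents_awk() {
  # -F'content:' makes every occurrence of content: a field separator,
  # so fields 2..NF each start with a quoted content string.
  awk -F'content:' '{
    for (i = 2; i <= NF; ++i) {
      # Splitting on the double quote leaves a[1] empty (text before the
      # opening quote) and puts the quoted string in a[2].
      split($i, a, /"/)
      print a[2]
    }
  }'
}

# Prints one line per content:"..." occurrence found in the rule.
printf '%s\n' "$rule" | extract_contents_awk
```

One caveat of this sketch: any other field whose text happens to contain content: (for example inside msg) would also act as a separator, so the grep variant may be the safer choice when that can occur.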