How to extract multiple string occurrences between two matching patterns using sed or grep commands


Problem description

I am new to Unix and playing around with sed and awk commands. My sample Snort rule has multiple occurrences of the keyword "content". I need to extract all data between content:" and "; to a file.

This sample contains one rule on a single line. My actual file contains 30k such rules.

The file 1rule contains:

alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"APP-DETECT Absolute Software Computrace outbound connection - search.namequery.com"; flow:to_server,established; content:"Host|3A| search.namequery.com|0D 0A|"; fast_pattern:only; http_header; content:"TagId: "; http_header; metadata:policy security-ips drop, ruleset community, service http; reference:url,absolute.com/support/consumer/technology_computrace; reference:url,www.blackhat.com/presentations/bh-usa-09/ORTEGA/BHUSA09-Ortega-DeactivateRootkit-PAPER.pdf; classtype:misc-activity; sid:26287; rev:4;)

Expected output:

Host|3A| search.namequery.com|0D 0A|

TagId: 

I tried the following sed and grep commands:

grep -Po '(?<=content:").*(?=";)' 1rule
sed  's/.*content:"\([^";]*\).*/\1/' 1rule

The output I got is not as expected:

Using grep, I can see all the contents, but the intermediate data between them is included as well. sed gives me only the last occurrence in a line, along with the non-matching lines after the occurrence.

Please tell me how I can solve this problem.

Recommended answer

With GNU grep (as in your question, taking advantage of the -P option for Perl-compatible regular expressions):

grep -Po 'content:"\K[^"]+' 1rule

  • \K drops what's been matched so far: the field label and the opening ".
  • [^"]+ then matches the content of the string up to, but excluding, the closing ".
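To make the difference from the lookaround attempt in the question concrete, here is a small sketch that runs both patterns on the same line. It assumes GNU grep with PCRE support (-P); the shortened rule and the variable names rule, greedy and fixed are made up for illustration only.

```shell
# Hypothetical, shortened rule used only for this demonstration.
rule='alert tcp any any -> any any (msg:"demo"; content:"Host|3A| example.com|0D 0A|"; http_header; content:"TagId: "; sid:1;)'

# Greedy .* extends to the LAST "; on the line, so the text between
# the two content fields is swallowed into a single match.
greedy=$(printf '%s\n' "$rule" | grep -Po '(?<=content:").*(?=";)')

# [^"]+ cannot cross a double quote, so every content string is
# reported as a separate match, one per output line.
fixed=$(printf '%s\n' "$rule" | grep -Po 'content:"\K[^"]+')

printf 'greedy:\n%s\n\nfixed:\n%s\n' "$greedy" "$fixed"
```

The greedy variant prints one long line that still contains http_header; content:", while the \K variant prints the two content strings on separate lines. Appending > somefile to the second grep command would send them to a file instead.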
Alternatively, try awk:

awk -F'content:' '{
  for (i=2;i<=NF;++i) {
    split($i, a, /"/); print a[2]
  }
}' 1rule

  • Splits the input line(s) into fields by the separator content:
  • Loops over the fields starting at index 2 (because field 1 is the text preceding the first content: substring).
  • Splits each field into tokens by " and prints the 2nd token, which is the string enclosed in "..." at the start of the field.
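As a quick check of the awk approach on more than one rule, the sketch below runs it over a two-line file and writes the result to a file, as the question asks. The file names sample.rules and contents.txt, the temporary directory, and the shortened rules are assumptions for illustration, not part of the original question.

```shell
# Work in a temporary directory so nothing in the current directory is touched.
tmpdir=$(mktemp -d)

# Two hypothetical, shortened rules, one per line.
cat > "$tmpdir/sample.rules" <<'EOF'
alert tcp any any -> any any (msg:"demo 1"; content:"Host|3A| example.com|0D 0A|"; http_header; content:"TagId: "; sid:1;)
alert tcp any any -> any any (msg:"demo 2"; content:"User-Agent|3A| test"; sid:2;)
EOF

# Split each line on content:, then print the first double-quoted
# string found in every field after the first one.
awk -F'content:' '{
  for (i = 2; i <= NF; ++i) {
    split($i, a, /"/)
    print a[2]
  }
}' "$tmpdir/sample.rules" > "$tmpdir/contents.txt"

# Show what was extracted: one content string per line.
cat "$tmpdir/contents.txt"
```

Because awk processes the file line by line, the same command should scale to a file with many rules; a rule without any content: field simply produces no output.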