How to extract multiple string occurrences between two matching patterns using sed or grep commands

Views: 828 Published: 2020/11/12 21:44:59 unix awk sed grep


Problem description

I am new to Unix and am experimenting with the sed and awk commands. My sample Snort rule has multiple occurrences of the keyword "content". I need to extract all data between content:" and "; to a file.

This sample contains one rule on a single line. My actual file contains 30k such rules.

The 1rule file contains:

alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"APP-DETECT Absolute Software Computrace outbound connection - search.namequery.com"; flow:to_server,established; content:"Host|3A| search.namequery.com|0D 0A|"; fast_pattern:only; http_header; content:"TagId: "; http_header; metadata:policy security-ips drop, ruleset community, service http; reference:url,absolute.com/support/consumer/technology_computrace; reference:url,www.blackhat.com/presentations/bh-usa-09/ORTEGA/BHUSA09-Ortega-DeactivateRootkit-PAPER.pdf; classtype:misc-activity; sid:26287; rev:4;)

Expected output:

Host|3A| search.namequery.com|0D 0A|

TagId


I tried the following grep and sed commands:

grep -Po '(?<=content:").*(?=";)' 1rule
sed  's/.*content:"\([^";]*\).*/\1/' 1rule

The output I got is not as expected:

Using grep, I can see all the contents, but the intermediate data between them is included as well. sed gives me only the last occurrence in a line, along with the non-matching lines after the occurrence.

Please let me know how I can solve this problem.

Recommended answer

With GNU grep (as in your question, taking advantage of the -P option for Perl-compatible regular expressions):

grep -Po 'content:"\K[^"]+' 1rule

\K drops what has been matched so far (the field label and the opening "), so it is not part of the printed match.
[^"]+ then matches the content of the string up to, but excluding, the closing ".
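A minimal sketch of the difference, assuming GNU grep with -P support. The shortened two-content rule and the temporary file are assumptions for illustration, not the full rule from the question. The greedy .* from the question runs to the last "; on the line, while [^"]+ stops at the first closing quote:

```shell
#!/bin/sh
# Hypothetical shortened rule with two content fields (an assumption for
# illustration, not the full rule from the question).
sample=$(mktemp)
cat > "$sample" <<'EOF'
alert tcp any any -> any any (msg:"demo"; content:"Host|3A| example.com|0D 0A|"; http_header; content:"TagId: "; sid:1;)
EOF

# Greedy attempt from the question: .* runs to the last "; on the line,
# so everything between the first and last content field is included.
greedy=$(grep -Po '(?<=content:").*(?=";)' "$sample")

# Negated class: [^"]+ cannot cross a double quote, so each match is
# exactly one quoted value.
fields=$(grep -Po 'content:"\K[^"]+' "$sample")

printf 'greedy:\n%s\n\nper field:\n%s\n' "$greedy" "$fields"
rm -f "$sample"
```

The greedy pattern yields a single long line, whereas the \K pattern prints one line per content field. Redirecting the second command (for example with > contents.txt, a hypothetical output name) would write the values to a file, as the question asks.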

Alternatively, try the following awk command:

awk -F'content:' '{ 
    for (i=2;i<=NF;++i) {
      split($i, a, /"/); print a[2]
    }
  }' 1rule

Splits each input line into fields, using content: as the separator.
Loops over the fields starting at index 2 (because field 1 is the text preceding the first content: substring).
Splits each field into tokens on " and prints the 2nd token, which is the string enclosed in "..." at the start of the field.
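The three steps can be traced on a shortened rule. This is only a sketch: the rule text is a hypothetical stand-in for the one in the question, and the printf label is added purely to make the field index visible.

```shell
#!/bin/sh
# Hypothetical shortened rule (an assumption for illustration).
rule='alert tcp any any -> any any (msg:"demo"; content:"Host|3A| example.com|0D 0A|"; http_header; content:"TagId: "; sid:1;)'

# Same logic as the awk command in the answer, reading from stdin and
# labelling each result with the index of the field it came from.
trace=$(printf '%s\n' "$rule" | awk -F'content:' '{
    for (i = 2; i <= NF; ++i) {
      split($i, a, /"/)   # a[1] is empty, a[2] is the quoted string
      printf "field %d -> [%s]\n", i, a[2]
    }
  }')

printf '%s\n' "$trace"
```

Field 1 (everything before the first content:) is skipped, and each later field begins with the quoted value. One caveat that applies to this sketch: any content: substring appearing elsewhere in a rule, for instance inside a msg value, would also be treated as a separator.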
