sed/awk - print text between patterns spanned across multiple lines
Problem description
I am new to scripting and am trying to learn how to extract any text that exists between two different patterns. However, I still cannot figure out how to extract the text between two patterns in the following scenario:
Suppose my input file reads:
Hi I would like
to print text
between these
patterns
and my expected output is:
I would like
to print text
between these
That is, my first search pattern is "Hi": I want to skip the pattern itself but print everything that follows it on the same line. My second search pattern is "patterns": I want to avoid printing that line entirely, along with any lines after it.
I tried the following:
sed -n '/Hi/,/patterns/p' test.txt
[Output]
Hi I would like
to print text
between these
patterns
Next, I tried:
awk ' /'"Hi"'/ {flag=1;next} /'"pattern"'/{flag=0} flag { print }' test.txt
[Output]
to print text
between these
Can someone help me work out how to achieve this? Thanks in advance.
Recommended answer
You have the right idea, a mini state machine in awk, but you need some slight modifications, as per the following transcript:
pax> echo 'Hi I would like
to print text
between these
patterns ' | awk '
/patterns/ { echo = 0 }
/Hi / { gsub("^.*Hi ", "", $0); echo = 1 }
{ if (echo == 1) { print } }'
Or, in compressed form:
awk '/patterns/{e=0}/Hi /{gsub("^.*Hi ","",$0);e=1}{if(e==1){print}}'
Its output is:
I would like
to print text
between these
as requested.
The way this works is as follows. The echo variable is initially 0 (an unset awk variable compares equal to 0), meaning that no echoing will take place.
Each line is checked in turn. If it contains "patterns", echoing is disabled.
If it contains "Hi" followed by a space, echoing is turned on and gsub is used to modify the line, removing everything up to and including that "Hi" and the space after it.
Then, regardless, the line (possibly modified) is echoed when the echo flag is on.
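To make the flag's behaviour easier to see, here is a minimal sketch that wraps the same awk program in a shell function and feeds it input with extra lines before "Hi" and after "patterns". The function name extract_between and the surrounding lines "before" and "after" are made up for illustration; they are not part of the original question.

```shell
# Hypothetical wrapper around the awk program from the answer above.
extract_between() {
    awk '
        /patterns/ { echo = 0 }                              # closing pattern: stop echoing
        /Hi /      { gsub("^.*Hi ", "", $0); echo = 1 }      # opening pattern: trim the line, start echoing
        { if (echo == 1) { print } }                         # print only while the flag is set
    '
}

# Lines outside the Hi ... patterns range are dropped.
printf '%s\n' 'before' 'Hi I would like' 'to print text' 'between these' 'patterns' 'after' |
    extract_between
```

Note that the "patterns" rule comes first, so a line containing "patterns" clears the flag before the print rule runs and is therefore never printed.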
Now, there are going to be edge cases such as:
- lines containing two occurrences of "Hi"; or
- lines containing something before the "patterns".
You haven't specified how they should be handled, so I didn't address them, but the basic concept should be the same.
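As one possible way to handle those two edge cases, the sketch below picks a policy that the question does not specify, so treat it as an assumption: output starts after the first "Hi " on the opening line, any text that precedes "patterns" on the closing line is still printed, and processing stops there. The function name extract_between_strict and the variable active are hypothetical.

```shell
# Hypothetical variant with an explicit (assumed) policy for the edge cases.
extract_between_strict() {
    awk '
        !active {
            i = index($0, "Hi ")            # position of the first "Hi " on the line, 0 if absent
            if (i == 0) next                # not started yet: skip this line
            $0 = substr($0, i + 3)          # keep only what follows the first "Hi "
            active = 1
        }
        {
            j = index($0, "patterns")       # position of the closing pattern, 0 if absent
            if (j > 0) {
                if (j > 1) print substr($0, 1, j - 1)   # text before "patterns", if any
                exit                        # nothing after the closing pattern is printed
            }
            print
        }
    '
}

printf '%s\n' 'x Hi a Hi b' 'mid' 'tail patterns rest' 'after' | extract_between_strict
```

Using index and substr instead of a greedy regular expression is what makes the first occurrence win; the gsub("^.*Hi ", ...) form in the answer would instead cut up to the last "Hi " on the line.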