sed one-liner - Find delimiter pair surrounding keyword
Question
I typically work with large XML files, and generally do word counts via grep to confirm certain statistics.
For example, I want to make sure I have at least five instances of widget in a single XML file via:
cat test.xml | grep -ic widget
Additionally, I like to be able to log the lines that widget appears on, i.e.:
cat test.xml | grep -i widget > ~/log.txt
However, the key information I really need is the block of XML code that widget appears in. An example file may look like:
<test> blah blah
  blah blah blah
  widget
  blah blah blah
</test>
<formula>
  blah
  <details>
    widget
  </details>
</formula>
I am trying to get the following output from the sample text above:
<test>widget</test>
<formula>widget</formula>
In effect, I am trying to get a single line for each pair of highest-level markup tags that enclose a block of XML text/code containing an arbitrary string, widget.
Does anyone have any suggestions for implementing this via a command-line one-liner?
Thank you.
Recommended answer
A non-elegant way using both sed and awk:
sed -ne '/[Ww][Ii][Dd][Gg][Ee][Tt]/,/^<\// {//p}' file.txt | awk 'NR%2==1 { sub(/^[ \t]+/, ""); search = $0 } NR%2==0 { end = $0; sub(/^<\//, "<"); printf "%s%s%s\n", $0, search, end }'
Results:
<test>widget</test>
<formula>widget</formula>
Explanation:
## The sed pipe:
sed -ne '/[Ww][Ii][Dd][Gg][Ee][Tt]/,/^<\// {//p}'
## This finds the widget pattern, ignoring case, then finds the last,
## highest level markup tag (these must match the start of the line)
## Ultimately, this prints two lines for each pattern match
## Now the awk pipe:
NR%2==1 { sub(/^[ \t]+/, ""); search = $0 }
## This takes the first line (the widget pattern) and removes leading
## whitespace, saving the pattern in 'search'
NR%2==0 { end = $0; sub(/^<\//, "<"); printf "%s%s%s\n", $0, search, end }
## This finds the next line (which is even), and stores the markup tag in 'end'
## We then remove the slash from this tag and print it, the widget pattern, and
## the saved markup tag
HTH