Grep whole paragraphs of a text containing a specific keyword

Problem description

My goal is to extract the paragraphs of a text that contain a specific keyword: not just the lines that contain the keyword, but each whole paragraph. The rule imposed on my text files is that every paragraph starts with a certain pattern (e.g. Pa0), which appears only at the start of a paragraph and nowhere else in the text. Each paragraph ends with a newline character.

For example, imagine I have the following text:

Pa0 
This is the first paragraph bla bla bla
This is another line in the same paragraph bla bla 
This is a third line bla bla 

Pa0
This is the second paragraph bla bla bla
Second line bla bla My keyword is here!
bla bla bla 
bla 

Pa0
Hey, third paragraph bla bla bla!
bla bla 

Pa0
keyword keyword
keyword
Another line! bla 

I want to extract the paragraphs that contain the word "keyword". For example:

Pa0
This is the second paragraph bla bla bla
Second line bla bla My keyword is here!
bla bla bla 
bla 

Pa0
keyword keyword
keyword
Another line! bla 

I can use e.g. grep for the keyword with the -A, -B or -C option to get a constant number of lines before and/or after the line where the keyword is located, but this does not seem to be enough, since the beginning and end of each text block depend on the delimiters "Pa0" and "\n".

Any suggestion for grep or another tool (e.g. awk, sed, perl) would be helpful.

Recommended answer

It's simple with awk:

awk '/keyword/' RS="\n\n" ORS="\n\n" input.txt

Explanation:

Usually awk operates on a per-line basis, because the default value of the record separator RS is "\n" (a single newline). By changing RS to two newlines in sequence (an empty line), we can easily operate on a per-paragraph basis.
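To see the record separator at work, here is a minimal sketch that writes a shortened copy of the sample text to a temporary file and filters it. The names `sample` and `result` are made up for illustration. It assumes a slightly different setting than the answer: an empty RS, which POSIX awk defines as paragraph mode (records separated by blank lines). POSIX leaves an RS longer than one character unspecified, although gawk and mawk accept it as a regular expression, so the empty form may be the more portable choice.

```shell
# Write a shortened copy of the sample text to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Pa0
This is the first paragraph bla bla bla
This is a third line bla bla

Pa0
This is the second paragraph bla bla bla
Second line bla bla My keyword is here!

Pa0
Hey, third paragraph bla bla bla!
EOF

# RS= (empty value) puts awk in paragraph mode: blank lines separate records.
# ORS='\n\n' prints an empty line after every matching paragraph.
result=$(awk '/keyword/' RS= ORS='\n\n' "$sample")

# Command substitution drops the trailing newlines, so print one back.
printf '%s\n' "$result"
rm -f "$sample"
```

Only the second paragraph should be printed, including its Pa0 line, because it is the only record in which the regex matches.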

/keyword/ is a condition, a regex. Since there is no action after the condition, awk simply prints the unchanged record (the paragraph) if it contains keyword.
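The pattern-without-action rule can be checked with a small sketch. The sample text and the variable names (`text`, `implicit`, `explicit`, `anycase`, `kw`) are invented for illustration, and the separators are set in a BEGIN block with an empty RS (POSIX paragraph mode) rather than on the command line, which is an assumption and not the answer's exact invocation. The last command is a hypothetical variation that passes the search term in a variable and ignores case.

```shell
text='Pa0
first paragraph

Pa0
has the keyword

Pa0
has the Keyword capitalized'

# Pattern with no action: awk prints every record the regex matches.
implicit=$(printf '%s\n' "$text" |
  awk 'BEGIN { RS = ""; ORS = "\n\n" } /keyword/')

# The same filter with the default action written out.
explicit=$(printf '%s\n' "$text" |
  awk 'BEGIN { RS = ""; ORS = "\n\n" } /keyword/ { print $0 }')

# Hypothetical variation: take the term from the variable kw and compare
# lowercased copies, so "Keyword" matches as well. index() does a plain
# substring search, not a regex match.
anycase=$(printf '%s\n' "$text" |
  awk -v kw="keyword" 'BEGIN { RS = ""; ORS = "\n\n" }
       index(tolower($0), tolower(kw))')

printf '%s\n' "$implicit"
```

The first two commands should give identical output, since `{ print $0 }` is what awk does when a pattern has no action attached.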

Setting the output record separator ORS to "\n\n" separates the paragraphs of output with an empty line, just like in the input.
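A short comparison may make the effect of ORS easier to see. This is only a sketch: the sample text and the names `text`, `joined` and `spaced` are invented, and the separators are set in a BEGIN block with an empty RS (POSIX paragraph mode) instead of on the command line.

```shell
text='Pa0
keyword one

Pa0
no match here

Pa0
keyword two'

# Default ORS ("\n"): matching paragraphs are printed back to back,
# so the boundary between them is lost.
joined=$(printf '%s\n' "$text" | awk 'BEGIN { RS = "" } /keyword/')

# ORS = "\n\n": every printed paragraph is followed by an empty line.
spaced=$(printf '%s\n' "$text" |
  awk 'BEGIN { RS = ""; ORS = "\n\n" } /keyword/')

printf 'default ORS:\n%s\n\n' "$joined"
printf 'ORS set to two newlines:\n%s\n' "$spaced"
```

With the default ORS the two matches come out as four consecutive lines, while the second form keeps the empty line between them, so the output can be fed through the same filter again.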
