How to extract lines between multiline patterns?

Question

I have a file which looks like this:

  blah blah blah blah blah blah blah blah 
  blah blah blah blah blah blah blah blah 
  blah blah blah blah blah blah blah blah 
<empty line here>
     Total DOS and NOS and partial (IT) DOSDOWN   
<empty line here>
     E     Total     1
<empty line here>
-1.5000    0.004    0.000    0.004
-1.4953    0.004    0.000    0.004
-1.4906    0.004    0.000    0.004
-1.4859    0.004    0.000    0.004
-1.4812    0.004    0.000    0.004
 0.3563    0.708    5.510    0.708
 0.3609    0.562    5.513    0.562
 0.3656    0.381    5.515    0.381
 0.3703    0.149    5.517    0.149
<empty line here>
     Sublattice  1 Atom Fe   spin DOWN   

What I want is to extract all lines between (first pattern)

     Total DOS and NOS and partial (IT) DOSUP     
<empty line here>    
     E     Total     1
<empty line here>

and (second pattern)

<empty line here>
     Sublattice  1 Atom Fe   spin DOWN   

i.e. I want to get

-1.5000    0.004    0.000    0.004
-1.4953    0.004    0.000    0.004
-1.4906    0.004    0.000    0.004
-1.4859    0.004    0.000    0.004
-1.4812    0.004    0.000    0.004
 0.3563    0.708    5.510    0.708
 0.3609    0.562    5.513    0.562
 0.3656    0.381    5.515    0.381
 0.3703    0.149    5.517    0.149

So, at the end of the day, I want the lines between two multiline patterns. As I understand it, awk can detect multiline patterns via a state machine (see here), but I failed to make it work in my case.

Any suggestions on how to resolve this problem would be very much appreciated.

Recommended answer

Here's a solution based on Ed Morton's trick.

awk -v RS= 'n==2; /Total DOS/ || n {n++;next} {n=0}' input.txt

Here's how it works.

  • RS= (an empty record separator) puts awk into paragraph mode, so each record is a block of lines delimited by blank lines.
  • n==2; is a pattern with no action, so awk prints any record processed while this condition holds.
  • /RE/ || n is a condition that evaluates to true if either the RE (pattern) matches within the current record or the variable n is non-zero.
  • {n++;next} increments n and skips to the next record.
  • {n=0} resets n if we haven't already skipped to the next record.
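
To make the bullets above easier to follow, here is the same one-liner spread over several lines with comments. This is only a sketch: the sample file is made-up data in the layout from the question, and the temporary file and the `result` variable are names I introduced for illustration.

```shell
# Made-up sample in the same layout as the question (blank lines are truly empty).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
  blah blah blah blah

     Total DOS and NOS and partial (IT) DOSUP

     E     Total     1

-1.5000    0.004    0.000    0.004
 0.3703    0.149    5.517    0.149

     Sublattice  1 Atom Fe   spin DOWN
EOF

result=$(awk -v RS= '
  n == 2                 # pattern with no action: print this record
  /Total DOS/ || n {     # header matched now, or counting already started
      n++                # 1 = header record, 2 = column-title record, ...
      next               # skip the reset rule below
  }
  { n = 0 }              # only reached while no header has been seen yet
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Note that once the counter has started, every later record takes the `next` branch, so `n` keeps growing and is never reset. As far as I can tell, that means only the block after the first matching header is printed, which happens to be what the question asks for.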

The effect of all this is that we print the record that is two records after the one with the matched pattern. You could of course adjust the condition that begins the counter to whatever you like, $2=="Total" for example (that condition matches the "E Total 1" record itself, so the data block is then one record later and the test becomes n==1). Salt to taste.
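
As a rough illustration of adjusting the trigger, the sketch below starts the counter on the record whose second field is `Total`. It assumes the column-title record looks like `E  Total  1` and that no earlier record has `Total` as its second field; the sample data and the `result` variable are made up for the example.

```shell
# Made-up sample in the same layout as the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
  blah blah blah blah

     Total DOS and NOS and partial (IT) DOSUP

     E     Total     1

-1.5000    0.004    0.000    0.004
 0.3703    0.149    5.517    0.149

   blah      blah     blah     blah
EOF

# In paragraph mode newlines also separate fields, so $2 is the second
# whitespace-separated word of the whole record.
result=$(awk -v RS= '
  n == 1                      # data block is one record after the trigger
  $2 == "Total" || n { n++; next }
  { n = 0 }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

The header record itself has `DOS` as its second field, so it does not trigger the counter here; only the column-title record does.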

sh-3.2$ cat input.txt
  blah blah blah blah blah blah blah blah
  blah blah blah blah blah blah blah blah
  blah blah blah blah blah blah blah blah

     Total DOS and NOS and partial (IT) DOSUP

     E     Total     1

  -1.5000    0.004    0.000    0.004
  -1.4953    0.004    0.000    0.004
  -1.4906    0.004    0.000    0.004
  .......    .....    .....    .....
   0.3609    0.562    5.513    0.562
   0.3656    0.381    5.515    0.381
   0.3703    0.149    5.517    0.149

   blah      blah     blah     blah

sh-3.2$ awk -v RS=  'n==2; /Total DOS and NOS/||n{n++;next} {n=0}' input.txt
  -1.5000    0.004    0.000    0.004
  -1.4953    0.004    0.000    0.004
  -1.4906    0.004    0.000    0.004
  .......    .....    .....    .....
   0.3609    0.562    5.513    0.562
   0.3656    0.381    5.515    0.381
   0.3703    0.149    5.517    0.149
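
Since the question mentions a state machine, here is a line-by-line alternative, offered as a sketch rather than as part of the answer above. It does not rely on RS= at all, and it stops at the second pattern explicitly. The `state` variable, the regexes, and the sample data (including the trailing 9.9999 line) are my own assumptions about the file layout.

```shell
# Made-up sample; the last data line stands in for whatever follows the
# "Sublattice" header in the real file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
  blah blah blah blah

     Total DOS and NOS and partial (IT) DOSUP

     E     Total     1

-1.5000    0.004    0.000    0.004
 0.3703    0.149    5.517    0.149

     Sublattice  1 Atom Fe   spin DOWN

 9.9999    9.999    9.999    9.999
EOF

result=$(awk '
  /Total DOS.*DOSUP/                       { state = 1; next }  # first header line
  state == 1 && $1 == "E" && $2 == "Total" { state = 2; next }  # column-title line
  state == 2 && /Sublattice/               { exit }             # closing pattern
  state == 2 && NF                         { print }            # data lines only
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because the `NF` test skips any line without fields, this variant should also cope with separator lines that contain stray spaces, which paragraph mode may not treat as blank.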
