Combine split lines with awk / gawk


Question

A system wraps lines in a log file if they exceed X characters. I am trying to extract various data from the log, but first I need to combine all the split lines so gawk can parse the fields as a single record.

For example:

2012/11/01 field1 field2 field3 field4 fi
eld5 field6 field7
2012/11/03 field1 field2 field3
2012/12/31 field1 field2 field3 field4 fi
eld5 field6 field7 field8 field9 field10 
field11 field12 field13
2013/01/10 field1 field2 field3
2013/01/11 field1 field2 field3 field4

I want to get back:

2012/11/01 field1 field2 field3 field4 field5 field6 field7
2012/11/03 field1 field2 field3
2012/12/31 field1 field2 field3 field4 field5 field6 field7 field8 field9 field10 field11 field12 field13
2013/01/10 field1 field2 field3
2013/01/11 field1 field2 field3 field4

The actual maximum line length in my case is 130. I'm reluctant to test for that length and use getline to join the next line, in case an entry is exactly 130 characters long.

Once I've cleaned up the log file, I'm also going to want to extract all the relevant events, where "relevant" may involve criteria like:

  • 'foo' is anywhere in any field in the record
  • field2 ~ /bar|dtn/
  • if field1 ~ /xyz|abc/ && field98 == "0001"

I'm wondering if I will need to run two successive gawk programs, or if I can combine all of this into one.

I'm a gawk newbie and come from a non-Unix

Answer

$ awk '{printf "%s%s",($1 ~ "/" ? rs : ""),$0; rs=RS} END{print ""}' file
2012/11/01 field1 field2 field3 field4 field5 field6 field7
2012/11/03 field1 field2 field3
2012/12/31 field1 field2 field3 field4 field5 field6 field7 field8 field9 field10 field11 field12 field13
2013/01/10 field1 field2 field3
2013/01/11 field1 field2 field3 field4
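The `$1 ~ "/"` test treats any line whose first field contains a slash as the start of a new record, so a continuation line that happens to begin with something like `x/y` would be split off by mistake. A slightly stricter sketch, assuming every real record begins with a `YYYY/MM/DD` date followed by a space (as in the sample data), anchors the test to the start of the line instead. `join_records` is a hypothetical wrapper name, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: join wrapped log lines. ASSUMPTION: a line starts a new
# record only if it begins with a YYYY/MM/DD date followed by a space.
join_records() {
    awk '
    /^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9] / {
        if (NR > 1) print s   # flush the previous record
        s = $0                # start a new record
        next
    }
    { s = s $0 }              # continuation line: glue on with no separator
    END { if (NR > 0) print s }
    ' "$@"
}

# Usage with part of the sample input from the question (reads stdin here;
# a file name can be passed instead):
printf '%s\n' \
    '2012/11/01 field1 field2 field3 field4 fi' \
    'eld5 field6 field7' \
    '2012/11/03 field1 field2 field3' | join_records
```

Continuation lines are appended with no separator because the wrap can fall in the middle of a field (`fi` + `eld5`), exactly as in the answer's one-liner.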

Now that I've noticed you don't actually want to just print the recombined records, here's an alternative way to do it that is more amenable to running tests on each recombined record ("s" in this script):

$ awk 'NR>1 && $1~"/"{print s; s=""} {s=s $0} END{print s}' file

With that structure, instead of just printing s you can perform tests on s, for example (note "foo" in the 3rd record):

$ cat file
2012/11/01 field1 field2 field3 field4 fi
eld5 field6 field7
2012/11/03 field1 field2 field3
2012/12/31 field1 field2 foo field4 fi
eld5 field6 field7 field8 field9 field10
field11 field12 field13
2013/01/10 field1 field2 field3
2013/01/11 field1 field2 field3 field4

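$ awk '
# tst: run the checks on one recombined record
function tst(rec,     flds,nf,i) {   # flds, nf and i are locals
   nf=split(rec,flds)                # split the record into its fields
   if (rec ~ "foo") {                # "foo" anywhere in the record
      print rec
      for (i=1;i<=nf;i++)
         print "\t",i,flds[i]        # list each field with its number
   }
}
NR>1 && $1~"/" { tst(s); s="" }      # a new record starts: test the previous one
{ s=s $0 }                           # append this line to the current record
END { tst(s) }                       # test the last record
' file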
2012/12/31 field1 field2 foo field4 field5 field6 field7 field8 field9 field10 field11 field12 field13
         1 2012/12/31
         2 field1
         3 field2
         4 foo
         5 field4
         6 field5
         7 field6
         8 field7
         9 field8
         10 field9
         11 field10
         12 field11
         13 field12
         14 field13
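On the "one program or two" question: the joining and the filtering can run in a single awk pass by replacing `tst()` with a predicate built from the question's criteria. This sketch assumes a record is relevant if ANY of the criteria holds, and that "fieldN" means the Nth whitespace-separated field of the recombined record (so the date counts as field 1, as in the numbered listing above); adjust the indexes if that is not what was meant. `relevant_records`, `relevant` and `flush` are hypothetical names.

```shell
#!/bin/sh
# Sketch: join wrapped lines AND filter them in one awk pass.
# ASSUMPTIONS: any single criterion makes a record relevant, and fieldN is
# the Nth whitespace-separated field of the recombined record.
relevant_records() {
    awk '
    function relevant(rec,    f, nf) {    # f and nf are locals
        nf = split(rec, f)                # f[1]..f[nf] are the fields
        if (rec ~ /foo/) return 1         # "foo" anywhere in the record
        if (f[2] ~ /bar|dtn/) return 1
        if (f[1] ~ /xyz|abc/ && nf >= 98 && f[98] == "0001") return 1
        return 0
    }
    function flush() { if (s != "" && relevant(s)) print s }
    NR > 1 && $1 ~ "/" { flush(); s = "" }   # new record: check the previous one
    { s = s $0 }                             # build up the current record
    END { flush() }                          # check the last record
    ' "$@"
}

# Usage (hypothetical log file name): relevant_records file
```

Because each test only ever sees a complete record, there is no need for a separate cleanup pass; the same pattern works for printing, counting, or extracting fields from the matching records.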
