Merge all lines that are identical aside from a key field and make the key field a range

Question

I've been looking at a lot of posts and haven't quite found what I'm looking for. I'm not sure how to go about taking the following sample data:

host1   input   nic1    ip1 ip2 PROT    30000   10
host1   input   nic1    ip1 ip2 PROT    40000   10
host1   input   nic1    ip1 ip2 PROT    50000   10
host1   input   nic1    ip1 ip2 PROT    60000   10
host1   input   nic1    ip3 ip2 PROT    10      30000
host1   input   nic1    ip3 ip2 PROT    10      40000
host1   input   nic1    ip3 ip2 PROT    10      50000
host1   input   nic1    ip3 ip2 PROT    10      60000
host1   output  nic1    ip2 ip1 PROT    10      30000
host1   output  nic1    ip2 ip1 PROT    10      40000
host1   output  nic1    ip2 ip1 PROT    10      50000
host1   output  nic1    ip2 ip1 PROT    10      60000
host1   output  nic1    ip2 ip3 PROT    30000   10
host1   output  nic1    ip2 ip3 PROT    40000   10
host1   output  nic1    ip2 ip3 PROT    50000   10
host1   output  nic1    ip2 ip3 PROT    60000   10
host1   output  loc     ip2 ip2 PROT    10      30000
host1   output  loc     ip2 ip2 PROT    10      50000

and merging it into:

host1   input   nic1    ip1 ip2 PROT    30000:60000 10
host1   input   nic1    ip3 ip2 PROT    10          30000:60000
host1   output  nic1    ip2 ip1 PROT    10          30000:60000
host1   output  nic1    ip2 ip3 PROT    30000:60000 10
host1   output  loc     ip2 ip2 PROT    10          30000:50000

I have a large amount of data like this with the need to make ranges for multiple fields of a given line but I think if somebody can show me how to do it for one field as I have above, I should be able to figure the rest out. And if not I'll follow up :). Thanks in advance for any help.

Recommended answer

Update

I have refactored the code in the answer below to make it more readable. The main body should read almost like English prose.

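#!/usr/bin/awk -f
# main body
NR == 1 {
  copyRecordTo(veryold)
  copyRecordTo(old)        # a group may consist of a single line
  next
}
{
  if (inSameGroup()) {
    copyRecordTo(old)
  } else {
    makeRangeForField(NF - 1)
    makeRangeForField(NF)
    nicePrint()
    copyRecordTo(veryold)
    copyRecordTo(old)      # start the new group with first == last
  }
}
END {
  if (NR > 0) {            # nothing to print for empty input
    makeRangeForField(NF - 1)
    makeRangeForField(NF)
    nicePrint()
  }
}

# functions
# Copy the fields of the current record into the array `line`.
function copyRecordTo(line,    i) {
  for (i = 1; i <= NF; ++i) line[i] = $i
}
# Print the saved line tab-separated, with one extra tab after field NF - 1.
function nicePrint(    i, fmt) {
  for (i = 1; i <= NF; ++i) {
    fmt = (i == NF - 1) ? "%s\t\t" : "%s\t"
    printf(fmt, old[i])
  }
  printf("\n")
}
# Turn field f of the saved last line into "first:last" if the two differ.
function makeRangeForField(f) {
  if (old[f] != veryold[f])
    old[f] = veryold[f] ":" old[f]
}
# True if the current record equals the group's first line in all but the last two fields.
function inSameGroup(    i) {
  for (i = 1; i <= NF - 2; ++i)
    if ($i != veryold[i]) return 0
  return 1
}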
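
The script joins the first and last line of each run of adjacent matching lines, so it relies on the input already being grouped and in ascending order, as the sample data is. If that is not guaranteed, a pre-sorting step along the following lines may help. This is only a sketch: the function name presort is made up, and it assumes the eight-field layout from the question with numeric values in fields 7 and 8.

```shell
#!/bin/sh
# presort [FILE...]: hypothetical helper, not part of the original answer.
# Normalizes runs of whitespace to single tabs, then sorts so that lines
# sharing fields 1-6 become adjacent and ascend numerically on fields 7 and 8.
presort() {
  tab=$(printf '\t')
  awk -v OFS='\t' '{ $1 = $1; print }' "$@" |
    LC_ALL=C sort -t "$tab" -k1,6 -k7,7n -k8,8n
}
```

The sorted stream can then be piped into the awk script. Note that sorting also reorders the groups themselves (for example, input lines come before output lines), which may or may not matter for your data.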

Original answer

The following awk script generates almost what you are looking for.

Essentially the script does the following:

  • stores in veryold the first line of each set of lines that differ only in the 7th and/or 8th field
  • stores in old the most recently read line of that set
  • uses the "boolean" b to detect when the current line no longer belongs to that set
  • when this happens, joins each of the last two fields of veryold with the corresponding field of old, with a : in between, if the two values differ, and prints old
  • puts one extra tab \t between the last two fields to improve readability

Two other points:

  • NR == 1 is a special case: the first line only initializes the stored state and is not compared with anything
  • after the last line is read, END handles the special case of the final set, whose last line is still stored in old

#!/usr/bin/awk -f
NR == 1 {
  # remember the first line; it is also the last line seen so far
  for (i = 1; i <= NF; ++i) {
    veryold[i] = $i
    old[i] = $i
  }
  next
}
{
  # b stays 1 only if all fields except the last two match the group's first line
  b = 1
  for (i = 1; i <= NF - 2; ++i) {
    b *= $i == veryold[i]
  }
  if (b == 1) {
    for (i = 1; i <= NF; ++i) {
      old[i] = $i
    }
  } else {
    if (old[NF - 1] != veryold[NF - 1]) {
      old[NF - 1] = veryold[NF - 1]":"old[NF - 1]
    }
    if (old[NF] != veryold[NF]) {
      old[NF] = veryold[NF]":"old[NF]
    }
    for (i = 1; i <= NF; ++i) {
      if (i == NF - 1) {
        fmt = "%s\t\t"
      } else {
        fmt = "%s\t"
      }
      printf(fmt, old[i])
    }
    printf("\n")
    # the current line starts a new group
    for (i = 1; i <= NF; ++i) {
      veryold[i] = $i
      old[i] = $i
    }
  }
}
END {
  if (old[NF - 1] != veryold[NF - 1]) {
    old[NF - 1] = veryold[NF - 1]":"old[NF - 1]
  }
  if (old[NF] != veryold[NF]) {
    old[NF] = veryold[NF]":"old[NF]
  }
  for (i = 1; i <= NF; ++i) {
    if (i == NF - 1) {
      fmt = "%s\t\t"
    } else {
      fmt = "%s\t"
    }
    printf(fmt, old[i])
  }
  printf("\n")
}
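
The question also mentions needing ranges for several fields. One way the same first/last bookkeeping might be generalized is sketched below. It is not the answer's script: the helper name merge_fields and its comma-separated list argument are assumptions made for illustration. Every field not in the list forms the grouping key, and each listed field is printed as first:last when the two values differ. As with the scripts above, the input is assumed to be grouped and in ascending order.

```shell
#!/bin/sh
# merge_fields LIST [FILE...]: hypothetical helper, not from the original answer.
# LIST is a comma-separated list of field numbers (no spaces), e.g. 7,8.
merge_fields() {
  list=$1
  shift
  awk -v list="$list" '
    BEGIN {
      # remember which field numbers are to be collapsed into ranges
      n = split(list, tmp, ",")
      for (j = 1; j <= n; j++) isrange[tmp[j]] = 1
    }
    # print the finished group as one tab-separated line
    function flush(   i, out) {
      if (!seen) return
      out = ""
      for (i = 1; i <= nf; i++) {
        out = out (i > 1 ? "\t" : "")
        out = out ((i in isrange) && first[i] != last[i] ? first[i] ":" last[i] : first[i])
      }
      print out
    }
    {
      # the grouping key is built from every field that is not a range field
      key = ""
      for (i = 1; i <= NF; i++) {
        if (!(i in isrange)) key = key $i SUBSEP
      }
      if (!seen || key != prevkey) {
        flush()
        for (i = 1; i <= NF; i++) first[i] = $i
        prevkey = key
        nf = NF
        seen = 1
      }
      for (i = 1; i <= NF; i++) last[i] = $i
    }
    END { flush() }
  ' "$@"
}
```

For the sample data in the question this would be called as merge_fields 7,8 data.txt, where data.txt is a placeholder file name. Unlike the scripts above, this sketch separates all fields with a single tab.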
