awk sort multidimensional array


Problem description

GNU awk supports multidimensional arrays:

q[1][1] = "dog"
q[1][2] = 999
q[2][1] = "mouse"
q[2][2] = 777
q[3][1] = "bird"
q[3][2] = 888

I would like to sort q on its "second column" so that I am left with:

q[1][1] = "mouse"
q[1][2] = 777
q[2][1] = "bird"
q[2][2] = 888
q[3][1] = "dog"
q[3][2] = 999

As you can see, the "first column" values moved to stay with the second. I see that GNU Awk offers an asort function, but it does not appear to support multidimensional arrays. If it helps, this is a working Ruby example:

q = [["dog", 999], ["mouse", 777], ["bird", 888]]
q.sort_by{|z|z[1]}
=> [["mouse", 777], ["bird", 888], ["dog", 999]]

I ended up using a regular array, then separating duplicates with newlines:

q[777] = "mouse"
q[999] = "dog" RS "fish"
q[888] = "bird"
for (z in q) {
  print q[z]
}

Recommended answer

FWIW, here's a workaround "sort_by()" function:

$ cat tst.awk
BEGIN {
    a[1][1] = "dog"
    a[1][2] = 999
    a[2][1] = "mouse"
    a[2][2] = 777
    a[3][1] = "bird"
    a[3][2] = 888

    print "\n############################\nBefore:"
    for (i=1; i in a; i++)
        for (j=1; j in a[i]; j++)
            printf "a[%d][%d] = %s\n",i,j,a[i][j]
    print "############################"

    sort_by(a,2)

    print "\n############################\nAfter:"
    for (i=1; i in a; i++)
        for (j=1; j in a[i]; j++)
            printf "a[%d][%d] = %s\n",i,j,a[i][j]
    print "############################"

}

function sort_by(arr,key,       keys,vals,i,j)
{
    for (i=1; i in arr; i++) {
        keys[i] = arr[i][key]
        for (j=1; j in arr[i]; j++)
            vals[keys[i]] = vals[keys[i]] (j==1?"":SUBSEP) arr[i][j]
    }

    asort(keys)

    for (i=1; i in keys; i++)
        split(vals[keys[i]],arr[i],SUBSEP)

    return (i - 1)
}

$ gawk -f tst.awk

############################
Before:
a[1][1] = dog
a[1][2] = 999
a[2][1] = mouse
a[2][2] = 777
a[3][1] = bird
a[3][2] = 888
############################

############################
After:
a[1][1] = mouse
a[1][2] = 777
a[2][1] = bird
a[2][2] = 888
a[3][1] = dog
a[3][2] = 999
############################

It works by first converting this:

    a[1][1] = "dog"
    a[1][2] = 999
    a[2][1] = "mouse"
    a[2][2] = 777
    a[3][1] = "bird"
    a[3][2] = 888

to this:

    keys[1]   = 999
    vals[999] = dog SUBSEP 999

    keys[2]   = 777
    vals[777] = mouse SUBSEP 777

    keys[3]   = 888
    vals[888] = bird SUBSEP 888

then asort()ing keys[] to get:

    keys[1] = 777
    keys[2] = 888
    keys[3] = 999

and then looping through the keys array, using its elements as the indices into the vals array to repopulate the original array.

In case anyone's wondering why I didn't just use the values we want to sort on as indices and then call asorti(), which would have resulted in slightly briefer code, here's why:

$ cat tst.awk
BEGIN {
   a[1] = 888
   a[2] = 9
   a[3] = 777

   b[888]
   b[9]
   b[777]

   print "\n\"a[]\" sorted by content:"
   asort(a,A)
   for (i=1; i in A; i++)
      print "\t" A[i]

   print "\n\"b[]\" sorted by index:"
   asorti(b,B)
   for (i=1; i in B; i++)
      print "\t" B[i]

}
$ awk -f tst.awk

"a[]" sorted by content:
        9
        777
        888

"b[]" sorted by index:
        777
        888
        9

Notice that asorti() treats "9" as a higher value than "888". That's because asorti() sorts on array indices, and all array indices are strings (even if they look like numbers); compared as strings, the first character of "9" IS higher than the first character of "888". asort(), on the other hand, sorts on the contents of the array. Array contents can be strings OR numbers, so normal awk comparison rules apply: anything that looks like a number is treated as a number, and the number 9 is less than the number 888, which in this case IMHO is the desired result.
