bash command for group by count

Problem description

I have a file in the following format

abc|1
def|2
abc|8
def|3
abc|5
xyz|3

I need to group by the words in the first column and sum the corresponding values in the second column. For instance, the output for this file should be:

abc|14
def|5
xyz|3

Explanation: the values for the word "abc" are 1, 8 and 5, which add up to 14, so its output line is "abc|14". Similarly, the values for "def" are 2 and 3, which add up to 5, giving "def|5".

Thank you very much for the help :)

I tried the following command:

awk -F "|" '{arr[$1]+=$2} END {for (i in arr) {print i"|"arr[i]}}' filename
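Since the question also asks how these commands work, here is the first command again with comments, wrapped in a shell function. The name `sum_by_key` is hypothetical and only there to make the sketch reusable. As far as I can tell the totals it produces are already correct; what it does not control is the order of the output lines, because `for (i in arr)` visits keys in an unspecified order.

```shell
#!/bin/sh
# Hypothetical helper: reads "word|number" lines on stdin, prints "word|total".
sum_by_key() {
    awk -F '|' '
        # For every input line, $1 is the word and $2 is the number.
        # arr[$1] starts out empty (treated as 0), so += keeps a running total per word.
        { arr[$1] += $2 }
        # After the last line, print one "word|total" line per distinct word.
        # "for (i in arr)" visits the keys in an unspecified order.
        END { for (i in arr) print i "|" arr[i] }
    '
}

printf 'abc|1\ndef|2\nabc|8\ndef|3\nabc|5\nxyz|3\n' | sum_by_key
```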

Another command I found is:

awk -F "," 'BEGIN { FS=OFS=SUBSEP=","}{arr[$1]+=$2 }END {for (i in arr) print i,arr[i]}' filename
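The second command cannot work on this file because it sets the field separator to "," while the data is "|"-delimited. Each line then forms a single field: $1 is the whole line (for example "abc|1"), and $2 is empty, which awk treats as 0. The sketch below, with hypothetical function names `broken` and `fixed`, contrasts it with the same logic using "|". SUBSEP is dropped because it only joins multi-subscript keys such as arr[a, b], so it plays no role here.

```shell
#!/bin/sh
sample='abc|1
def|2
abc|8
def|3
abc|5
xyz|3'

# Original second command: every distinct line is printed unchanged, followed by ",0".
broken() {
    awk -F "," 'BEGIN { FS=OFS=SUBSEP=","}{arr[$1]+=$2 }END {for (i in arr) print i,arr[i]}'
}

# Same logic with "|" as both the input and output separator.
fixed() {
    awk 'BEGIN { FS = OFS = "|" } { arr[$1] += $2 } END { for (i in arr) print i, arr[i] }'
}

printf '%s\n' "$sample" | broken   # one "<line>,0" per input line, unspecified order
printf '%s\n' "$sample" | fixed    # per-word totals, unspecified order
```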

Neither of them gave me the intended result, and I'm also unsure how these commands actually work.

Recommended answer

I'll just add an answer to fix the sorting issue you had: you don't need to pipe Awk's output through sort/uniq, because the ordering can be done in Awk itself.

As described in the GNU Awk manual section "Using Predefined Array Scanning Orders with gawk", you can set the PROCINFO["sorted_in"] variable (gawk-specific) to control the order in which a for (i in array) loop visits the array, and therefore the order of your final output.

See the relevant entry below:

@ind_str_asc Order by indices in ascending order compared as strings; this is the most basic sort. (Internally, array indices are always strings, so with a[2*5] = 1 the index is "10" rather than numeric 10.)
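The remark in parentheses can be checked in any POSIX awk, no gawk needed. The sketch below (the function name `show_keys` is hypothetical) stores a value under a[2*5] and shows that the key is the string "10": numeric 10 converts to that same string and matches, while the different string "010" does not. Because keys are compared as strings, @ind_str_asc would place a key "10" before "9".

```shell
#!/bin/sh
show_keys() {
    awk 'BEGIN {
        a[2*5] = 1                               # 2*5 is numeric, but the stored key is the string "10"
        for (k in a) print "key:", k
        print "numeric 10 in a:", (10 in a)      # 10 is converted to "10", so it matches
        print "string 010 in a:", ("010" in a)   # "010" is a different string, so no match
    }'
}

show_keys
```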

So, for your requirement, set it in the END block:

END{PROCINFO["sorted_in"]="@ind_str_asc"; for (i in unique) print i,unique[i]}

The full command:

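gawk '
    BEGIN { FS = OFS = "|" }                     # "|" separates fields on input and output
    { unique[$1] += $2 }                         # running total of column 2 per word in column 1
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"   # gawk-only: visit keys in ascending string order
        for (i in unique)
            print i, unique[i]
    }' file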
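PROCINFO["sorted_in"] only has an effect in gawk. In other awk implementations (for example mawk, which some Linux distributions ship as the default awk), PROCINFO is just an ordinary array, so the output order stays unspecified. If gawk is not available, a portable fallback is to sum in plain awk and let sort(1) do the ordering. This is a sketch; the function name `sum_sorted` is hypothetical. LC_ALL=C makes sort compare bytes, so the order does not depend on the locale.

```shell
#!/bin/sh
# Hypothetical helper: sums column 2 per word in any POSIX awk, then sorts by the word.
# Pass a file name as an argument, or feed the data on stdin.
sum_sorted() {
    awk 'BEGIN { FS = OFS = "|" } { sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' "$@" |
        LC_ALL=C sort -t '|' -k1,1    # sort on the word field only, comparing bytes
}

printf 'abc|1\ndef|2\nabc|8\ndef|3\nabc|5\nxyz|3\n' | sum_sorted
```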
