bash command for group by count
Question
I have a file in the following format:
abc|1
def|2
abc|8
def|3
abc|5
xyz|3
I need to group by the words in the first column and sum the values in the second column. For instance, the output for this file should be:
abc|14
def|5
xyz|3
Explanation: the values for the word "abc" are 1, 8, and 5. Adding them gives 14, so the output line is "abc|14". Similarly, the values for "def" are 2 and 3, which sum to 5, so the output line is "def|5".
Thank you very much for the help :)
I tried the following command:
awk -F "|" '{arr[$1]+=$2} END {for (i in arr) {print i"|"arr[i]}}' filename
Another command I found is:
awk -F "," 'BEGIN { FS=OFS=SUBSEP=","}{arr[$1]+=$2 }END {for (i in arr) print i,arr[i]}' filename
Neither gave me the intended result. I am also unsure how these commands work.
Recommended answer
I will just add an answer to fix the sorting issue you had. With your Awk logic, you don't need to pipe the output of Awk to sort/uniq; you can do the ordering in Awk itself.
As described in the GNU Awk manual section "Using Predefined Array Scanning Orders with gawk", you can use the PROCINFO["sorted_in"] variable (gawk-specific) to control the order in which Awk emits your final output.
See the relevant entry from that section:
@ind_str_asc
Order by indices in ascending order compared as strings; this is the most basic sort. (Internally, array indices are always strings, so with a[2*5] = 1 the index is "10" rather than numeric 10.)
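To make the string-versus-numeric point concrete, here is a small sketch, assuming gawk is installed (PROCINFO["sorted_in"] has no effect in other awks). The function name demo_order is made up for illustration; @ind_num_asc is another of gawk's predefined scanning orders, shown only for contrast.

```shell
# Hypothetical helper: print the indices 10, 9 and 2 in the requested scan order.
demo_order() {
    gawk -v order="$1" 'BEGIN {
        a[10] = 1; a[9] = 1; a[2] = 1
        PROCINFO["sorted_in"] = order    # gawk-only: controls "for (i in a)" order
        for (i in a) print i
    }'
}

demo_order "@ind_str_asc"   # indices compared as strings: 10, 2, 9
demo_order "@ind_num_asc"   # indices compared as numbers: 2, 9, 10
```

For word keys such as abc, def and xyz the string order is the natural choice, which is why @ind_str_asc fits the question.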
So for your requirement, set it in the END clause:
END{PROCINFO["sorted_in"]="@ind_str_asc"; for (i in unique) print i,unique[i]}
The complete command:
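awk '
BEGIN { FS = OFS = "|" }    # split input and join output on "|"
{ unique[$1] += $2 }        # sum column 2 for each distinct column-1 key
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"   # gawk only: scan keys in string order
    for (i in unique)
        print i, unique[i]
}' file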
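Since PROCINFO["sorted_in"] is gawk-specific, the command above will not produce sorted output under other awk implementations. As a fallback sketch, not part of the original answer, the summing can stay in awk and the ordering can be handed to sort. The function name sum_by_key is hypothetical, and the input is read from standard input.

```shell
# Hypothetical helper: sum column 2 per column-1 key, then sort by key.
sum_by_key() {
    awk 'BEGIN { FS = OFS = "|" }
         { sum[$1] += $2 }                      # accumulate per key
         END { for (k in sum) print k, sum[k] } # arbitrary order here
    ' | LC_ALL=C sort -t '|' -k1,1              # order by the key field
}

# The sample data from the question
printf '%s\n' 'abc|1' 'def|2' 'abc|8' 'def|3' 'abc|5' 'xyz|3' | sum_by_key
# prints:
# abc|14
# def|5
# xyz|3
```

LC_ALL=C is set only so that the sort order is byte-wise and does not vary with the locale.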