Sum duplicate row values with awk
Question
I have a file with the following structure:
1486113768 3656
1486113768 6280
1486113769 530912
1486113769 5629824
1486113770 5122176
1486113772 3565920
1486113772 530912
1486113773 9229920
1486113774 4020960
1486113774 4547928
My goal is to get rid of duplicate values in the first column, sum the values in the second column, and update each row with the new value; a working output, from the input above, would be:
1486113768 9936 # 3656 + 6280
1486113769 6160736 # 530912 + 5629824
1486113770 5122176 # ...
1486113772 4096832
1486113773 9229920
1486113774 8568888
I know cut and uniq: until now I managed to find the duplicate values in the first column with:
cut -d " " -f 1 file.log | uniq -d
1486113768
1486113769
1486113772
1486113774
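Note that uniq -d only lists the keys that repeat; to actually sum the values you would still need an extra pass over the file per key. A clunky sketch of that route (a hypothetical illustration, assuming the data is in file.log and already sorted on the first column, as uniq requires):

```shell
# Build the sample data, then sum $2 for each duplicated key found by cut|uniq -d.
printf '%s\n' '1486113768 3656' '1486113768 6280' \
              '1486113769 530912' '1486113769 5629824' > file.log
for key in $(cut -d ' ' -f 1 file.log | uniq -d); do
    # One grep pass per duplicate key, summing the second column with awk.
    grep "^$key " file.log | awk -v k="$key" '{ s += $2 } END { print k, s }'
done
# prints:
# 1486113768 9936
# 1486113769 6160736
```

This re-reads the file once per duplicate key, so it scales poorly compared to a single awk pass over the whole file.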
Is there an "awk way" to achieve my goal? I know awk is a very powerful and terse tool: I used it earlier with

awk '{print $2 " " $3 >> $1".log"}' log.txt

to scan all rows in log.txt and create a .log file with $1 as its name, filling it with the $2 and $3 values, all in one bash line (to hell with the read loop!); is there a way to find first-column duplicates, sum their second-column values, and rewrite the rows, removing the duplicates and printing the resulting sums?
Answer
Use Awk as below:

awk '{ seen[$1] += $2 } END { for (i in seen) print i, seen[i] }' file1
1486113768 9936
1486113769 6160736
1486113770 5122176
1486113772 4096832
1486113773 9229920
1486113774 8568888
seen[$1] += $2 builds an associative array (a hash map) with $1 treated as the index value; each time a key repeats, its $2 is added to the running total, so one sum accumulates per unique $1 in the file. The END block then prints every key together with its total.
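One caveat: for (i in seen) visits keys in an unspecified order in POSIX awk, so the summed rows may come out shuffled. If the timestamps must stay in first-seen order, either pipe the result through sort -n or track the order explicitly; a minimal one-pass sketch of the latter:

```shell
# Sum duplicate keys while preserving the order in which each key first appears.
printf '%s\n' '1486113768 3656' '1486113768 6280' '1486113769 530912' |
awk '!($1 in seen) { order[++n] = $1 }   # record first appearance of each key
     { seen[$1] += $2 }                  # accumulate the second column per key
     END { for (i = 1; i <= n; i++) print order[i], seen[order[i]] }'
# prints:
# 1486113768 9936
# 1486113769 530912
```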