Use awk to sum or average for each unique ID

Question

Can anyone tell me how to use awk to calculate the sum of two individual columns, or the average of one column, for each unique ID?

Input

chr1    3661532 3661533 0.0 5   0   chr1    3661529 3662079 NM_01011874     
chr1    3661534 3661535 0.2 5   1   chr1    3661529 3662079 NM_01011874     
chr1    3661537 3661538 0.0 5   0   chr1    3661529 3662079 NM_01011874
chr1    3661559 3661560 0.0 6   0   chr1    3661529 3662079 NM_01011874
chr2    4661532 4661533 0.0 8   0   chr1    4661532 4661533 NM_00175642     
chr2    6661534 6661535 0.2 5   2   chr1    6661534 6661535 NM_00175642     
chr2    2661537 2661538 0.0 5   0   chr1    2661537 2661538 NM_00175642
chr2    9661559 9661560 0.0 7   0   chr1    9661559 9661560 NM_00175642

Output (sum of $5 and $6) for each unique ID

NM_01011874 21 1 
NM_00175642 25 2

or the average of $4 for each unique ID

NM_01011874 0.0476
NM_00175642 0.08

Also, if you could break down the components of the solution I would be grateful. I'm a molecular biologist with minimal bioinformatics training.

Recommended answer

Sum of columns 5 and 6 per ID:

awk '{sum5[$10] += $5; sum6[$10] += $6}; END{ for (id in sum5) { print id, sum5[id], sum6[id] } }' < /tmp/input 
NM_00175642 25 2
NM_01011874 21 1

Explained: $10 is the ID field, and $5 and $6 are columns 5 and 6. We build two arrays, one for each column being summed. awk arrays are indexed by strings, so the ID field can be used directly as the key. Once all the lines (records) have been processed, the END block iterates over the array keys (the ID strings) and prints the value stored under each key.
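To make the pieces easier to follow, here is a rough sketch of the same summing script spread over several lines with comments. It assumes the eight rows from the question are written to a temporary file (the `input` and `sums` variable names are only for illustration). Because `for (id in sum5)` visits keys in an unspecified order, the sketch pipes the result through `sort` so the output order is stable.

```shell
# Write the sample rows from the question to a temporary file (illustrative setup).
input=$(mktemp)
cat > "$input" <<'EOF'
chr1    3661532 3661533 0.0 5   0   chr1    3661529 3662079 NM_01011874
chr1    3661534 3661535 0.2 5   1   chr1    3661529 3662079 NM_01011874
chr1    3661537 3661538 0.0 5   0   chr1    3661529 3662079 NM_01011874
chr1    3661559 3661560 0.0 6   0   chr1    3661529 3662079 NM_01011874
chr2    4661532 4661533 0.0 8   0   chr1    4661532 4661533 NM_00175642
chr2    6661534 6661535 0.2 5   2   chr1    6661534 6661535 NM_00175642
chr2    2661537 2661538 0.0 5   0   chr1    2661537 2661538 NM_00175642
chr2    9661559 9661560 0.0 7   0   chr1    9661559 9661560 NM_00175642
EOF

sums=$(awk '
    # Runs once per input line: add columns 5 and 6 to the totals kept for this ID.
    { sum5[$10] += $5; sum6[$10] += $6 }

    # Runs once after the last line: print one row per ID.
    END { for (id in sum5) print id, sum5[id], sum6[id] }
' "$input" | sort)   # sort only to get a stable order

printf '%s\n' "$sums"
rm -f "$input"
```

With the sample rows this should print `NM_00175642 25 2` followed by `NM_01011874 21 1`.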

Average of column 4 per ID:

awk '{sum4[$10] += $4; count4[$10]++}; END{ for (id in sum4) { print id, sum4[id]/count4[id] } }' < /tmp/input 
NM_00175642 0.05
NM_01011874 0.05

Explained: Very similar to the summing example. We keep a running sum of column 4 per ID, plus a count of the records seen for each ID. At the end, we iterate over the IDs and print sum/count.
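Note that the plain mean of $4 is 0.05 for both IDs, while the question's expected output shows 0.0476 and 0.08. Those two figures happen to equal sum($6)/sum($5) for each ID (1/21 and 2/25), so it may be that a pooled ratio is what was wanted; that is only a guess. The sketch below prints both values side by side with `printf` for fixed formatting. The variable names are illustrative, and it assumes the column 5 total is never zero for any ID.

```shell
# Write the sample rows from the question to a temporary file (illustrative setup).
input=$(mktemp)
cat > "$input" <<'EOF'
chr1    3661532 3661533 0.0 5   0   chr1    3661529 3662079 NM_01011874
chr1    3661534 3661535 0.2 5   1   chr1    3661529 3662079 NM_01011874
chr1    3661537 3661538 0.0 5   0   chr1    3661529 3662079 NM_01011874
chr1    3661559 3661560 0.0 6   0   chr1    3661529 3662079 NM_01011874
chr2    4661532 4661533 0.0 8   0   chr1    4661532 4661533 NM_00175642
chr2    6661534 6661535 0.2 5   2   chr1    6661534 6661535 NM_00175642
chr2    2661537 2661538 0.0 5   0   chr1    2661537 2661538 NM_00175642
chr2    9661559 9661560 0.0 7   0   chr1    9661559 9661560 NM_00175642
EOF

stats=$(awk '
    # Per line: accumulate the column 4 total and row count, plus column 5 and 6 totals.
    { sum4[$10] += $4; n[$10]++; sum5[$10] += $5; sum6[$10] += $6 }

    # After the last line: print ID, mean of column 4, and sum6/sum5.
    # Assumes sum5 is nonzero for every ID; awk aborts on division by zero.
    END {
        for (id in sum4)
            printf "%s %.4f %.4f\n", id, sum4[id] / n[id], sum6[id] / sum5[id]
    }
' "$input" | sort)   # sort only to get a stable order

printf '%s\n' "$stats"
rm -f "$input"
```

With the sample rows this should print `NM_00175642 0.0500 0.0800` and `NM_01011874 0.0500 0.0476`; the third column matches the figures in the question.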

I don't do much with awk; I find Perl much better for small scripts. But this looks like a good starting point. There are links to more pages with example scripts.
