Combine multiple files with different count values


Problem description


I would like to combine 96 files by taking the second column from each file and keeping the first column, which is similar across all files. I tried to do this in R, but figured it would be better in the terminal. Can this be done with awk?

Sample data:

DMED7013:Rfam robinm$ head Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput402R.sam
Seq_../trimmed/402R.tally.fasta __not_aligned
__too_low_aQual 3
mir-10 5
Y_RNA 4
__too_low_aQual 0
__too_low_aQual 0
__not_aligned 1
mir-8 2
mir-671 3
mir-671 16

The files:

DMED7013:Rfam robinm$ ls -l  
-rw-r--r--   1 robinm  staff  1711388 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput100G.sam
-rw-r--r--   1 robinm  staff  1712778 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput100R.sam
-rw-r--r--   1 robinm  staff  1709703 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput106G.sam
-rw-r--r--   1 robinm  staff  1707486 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput106R.sam
-rw-r--r--   1 robinm  staff  1704757 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput122G.sam
-rw-r--r--   1 robinm  staff  1705471 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput122R.sam
.....

Solution

You can try this (if I understood correctly):

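awk '!($1 in d){d[$1]=$2; next}   # first time a name is seen: store its count
     {d[$1]+=$2}                   # name seen before: add this count to its total
     END{for(key in d) print key, d[key]}' *.sam   # print each name with its summed count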

You get (awk does not guarantee the order of a for-in loop, so the lines may appear in a different order):

__too_low_aQual 3
mir-671 19
mir-8 2
__not_aligned 1
Y_RNA 4
mir-10 5
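
The command above sums every count for a name into one total across all files. If the goal is instead one count column per file, a variation along these lines might work. This is only a sketch under assumptions: the two small files a.sam and b.sam are hypothetical stand-ins for the real inputs, names missing from a file are shown as 0, and a line whose second field is not a number (such as the first line of the sample file) would contribute 0.

```shell
#!/bin/sh
# Hypothetical demo data: two tiny count files standing in for the real *.sam files.
tmp=$(mktemp -d)
printf 'mir-10 5\nY_RNA 4\nmir-671 3\nmir-671 16\n' > "$tmp/a.sam"
printf 'mir-10 2\nmir-8 7\n' > "$tmp/b.sam"

# One row per name, one column per input file. Repeated names within a file
# are summed; names absent from a file get 0. The result is sorted with
# LC_ALL=C because the awk for-in loop has no guaranteed order.
merged=$(awk '
    FNR == 1 { nfiles++ }                    # a new input file starts
    { names[$1]; count[$1, nfiles] += $2 }   # remember the name, add its count for this file
    END {
        for (name in names) {
            line = name
            for (i = 1; i <= nfiles; i++)
                line = line " " (count[name, i] + 0)
            print line
        }
    }' "$tmp/a.sam" "$tmp/b.sam" | LC_ALL=C sort)

printf '%s\n' "$merged"
rm -rf "$tmp"
```

With the real data, the two demo file names would be replaced by *.sam, giving 96 count columns in the order the shell expands the glob.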
