How to transpose a repeating set of rows to columns using awk

Views: 131  Published: 2020/9/15 7:57:09  Tag: awk


Question

I have a text file with data in 7 columns in this format:

18030   AAJ51   FTO rs9939609   C__30090620_10  A   T
18030   AAJ51   CAT rs1001179   C__11468118_10  C   C
18030   AAJ51   CCL2    rs1024611   C___2590362_10  G   G
18030   AAJ51   TAS2R38 rs10246939  C___9506826_10  C   C
20287   AAJ51   FTO rs9939609   C__30090620_10  A   T
20287   AAJ51   CAT rs1001179   C__11468118_10  C   C
20287   AAJ51   CCL2    rs1024611   C___2590362_10  A   G
20287   AAJ51   TAS2R38 rs10246939  C___9506826_10  T   T

The 2nd, 3rd, 4th and 5th columns are constant and repeat for every ID.

The variable columns are the 1st, 6th and 7th.
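Any approach that transposes by position relies on that repetition holding exactly. A minimal sketch for checking it first, wrapped in a hypothetical shell function check_blocks; it assumes the rows of one ID are contiguous, prints each ID with its row count, and exits with status 1 if any ID differs from the first ID in length or in columns 2 to 5:

```shell
# Hypothetical helper: checks that every ID repeats the same sequence of
# constant columns (2-5). Prints "ID row_count" per ID and exits with
# status 1 if any ID differs from the first ID in length or content.
# Assumes all rows of one ID are contiguous in the input.
check_blocks() {
    awk '
        function finish() {                  # called when an ID block ends
            nids++
            if (nids == 1) nref = pos        # the first ID sets the expected length
            else if (pos != nref) bad = 1
            print cur, pos
        }
        {
            key = $2 SUBSEP $3 SUBSEP $4 SUBSEP $5    # the constant part of the row
            if (NR == 1 || $1 != cur) {               # a new ID block starts
                if (NR > 1) finish()
                cur = $1
                pos = 0
            }
            pos++
            if (nids == 0) ref[pos] = key             # the first ID defines the pattern
            else if (ref[pos] != key) bad = 1         # later IDs must match it row by row
        }
        END {
            if (NR > 0) finish()
            exit bad
        }' "$@"
}
```

Usage: check_blocks file, then inspect the counts (each should be 128 for the real file).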

I would like to transpose the data like this:

        FTO       CAT       CCL2        TAS2R38
        rs9939609 rs1001179 rs1024611   rs10246939
18030   AT        CC        GG          CC
20287   AT        CC        AG          TT

Although the example shows 4 rows per ID (the 5-digit number in the first column is the ID), the actual file has 128 rows per ID, so matching each gene with a pattern or regex would not be practical. I would prefer a method that iterates over a fixed number of rows.
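If every ID really does occupy exactly n consecutive rows, always in the same gene order, a row's position inside its block is (NR - 1) % n + 1, so the rows can be transposed by counting alone. A minimal sketch under that assumption, wrapped in a hypothetical shell function transpose_blocks whose first argument is n (4 for the sample, 128 for the real file). This is not the method used in the answer below, and it silently produces misaligned columns if any ID is missing a row:

```shell
# Hypothetical helper: transpose fixed-size blocks of n rows (one block per ID)
# into one output line per ID. Assumes every ID has exactly n consecutive rows,
# always in the same gene order; the first block supplies the two header rows.
transpose_blocks() {
    n=$1; shift                        # rows per ID; remaining args are input files
    awk -v n="$n" '
        BEGIN { OFS = "\t" }
        {
            i = (NR - 1) % n + 1       # position of this row inside its block (1..n)
            if (NR <= n) {             # first block: collect the header rows
                h1 = h1 OFS $3         # gene names (column 3)
                h2 = h2 OFS $4         # rs IDs (column 4)
            }
            if (i == 1) out = $1       # a new block starts with its ID (column 1)
            out = out OFS $6 $7        # append the genotype (columns 6 and 7 joined)
            if (i == n) {              # block complete: emit it
                if (NR == n) print h1 ORS h2
                print out
            }
        }' "$@"
}
```

Usage on the real file: transpose_blocks 128 file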

I saw this example on converting n rows to columns but am unsure how to modify it for this application.

Update: CRLF (Windows) line endings can cause formatting problems; they can be fixed with a tool such as dos2unix.
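As an alternative to dos2unix, the carriage return can be stripped inside awk itself. A minimal sketch (the function name id_genotype is hypothetical): default field splitting does not treat \r as whitespace, so on CRLF input $7 ends in \r; modifying $0 with sub() makes awk re-split the fields, so $7 is clean afterwards:

```shell
# Hypothetical helper: print "ID<TAB>genotype" for each row, removing any
# trailing CR first so CRLF input behaves like LF input.
id_genotype() {
    awk 'BEGIN { OFS = "\t" }
         {
             sub(/\r$/, "")        # drop the CR left over from a CRLF line ending
             print $1, $6 $7       # modifying $0 re-splits the fields, so $7 is now clean
         }' "$@"
}
```

The same sub(/\r$/, "") rule can be placed first in any of the awk programs on this page.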

Recommended answer

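GNU Awk solution (arrays of arrays such as groups[$1][$3] are a gawk extension, available since gawk 4.0). The final column -txn stage only aligns the output for display; its -n flag is not handled the same way by every column implementation, so drop that stage if your column rejects it: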

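awk '
     { sub(/\r$/, "") }    # drop a trailing CR so CRLF input does not corrupt $7
     {
         # first time a gene (column 3) is seen: remember its position, extend both header rows
         if (!keys[$3]++) { b[++c] = $3; row1 = row1 OFS $3; row2 = row2 OFS $4 }
         # first time an ID (column 1) is seen: remember it so IDs print in input order
         if (!ids_seen[$1]++) ids[++n] = $1
         line = groups[$1][$3]
         groups[$1][$3] = (line == "" ? $6 $7 : line OFS $6 $7)
     }
     END {
         print row1 ORS row2
         # numeric loops, because "for (x in arr)" order is unspecified and could misalign columns
         for (k = 1; k <= n; k++) {
             r = ids[k]
             for (j = 1; j <= c; j++) r = r OFS groups[ids[k]][b[j]]
             print r
         }
     }' OFS='\t' file | column -txn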

Output:

       FTO        CAT        CCL2       TAS2R38
       rs9939609  rs1001179  rs1024611  rs10246939
18030  AT         CC         GG         CC
20287  AT         CC         AG         TT
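The answer relies on gawk's arrays of arrays. A minimal sketch of the same grouping idea in plain POSIX awk (which should also run under mawk), using composite keys gt[id, gene] joined with SUBSEP instead of nested arrays, and numeric loops so genes and IDs come out in the order they first appear. The shell function name transpose_genotypes is hypothetical:

```shell
# Hypothetical helper: portable (POSIX awk) grouping by ID and gene.
# Rows may appear in any order; genes and IDs are printed in first-seen order.
transpose_genotypes() {
    awk '
        BEGIN { OFS = "\t" }
        { sub(/\r$/, "") }                      # tolerate CRLF input
        {
            if (!($3 in seen)) {                # first time this gene (column 3) appears
                seen[$3] = 1
                genes[++ng] = $3
                h1 = h1 OFS $3                  # header row 1: gene names
                h2 = h2 OFS $4                  # header row 2: rs IDs (column 4)
            }
            if (!($1 in known)) {               # first time this ID (column 1) appears
                known[$1] = 1
                ids[++ni] = $1
            }
            gt[$1, $3] = $6 $7                  # genotype = columns 6 and 7 joined
        }
        END {
            print h1 ORS h2
            for (i = 1; i <= ni; i++) {
                row = ids[i]
                for (j = 1; j <= ng; j++)
                    row = row OFS gt[ids[i], genes[j]]   # empty cell if a gene is missing
                print row
            }
        }' "$@"
}
```

Unlike the answer, which joins repeated entries for the same ID and gene with OFS, this sketch keeps only the last genotype seen for each pair; since it does not depend on block size, it also tolerates IDs with missing or reordered rows.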

