How to transpose a repeating set of rows to columns using awk
Question
I have a text file with data in 7 columns in this format:
18030 AAJ51 FTO rs9939609 C__30090620_10 A T
18030 AAJ51 CAT rs1001179 C__11468118_10 C C
18030 AAJ51 CCL2 rs1024611 C___2590362_10 G G
18030 AAJ51 TAS2R38 rs10246939 C___9506826_10 C C
20287 AAJ51 FTO rs9939609 C__30090620_10 A T
20287 AAJ51 CAT rs1001179 C__11468118_10 C C
20287 AAJ51 CCL2 rs1024611 C___2590362_10 A G
20287 AAJ51 TAS2R38 rs10246939 C___9506826_10 T T
The 2nd, 3rd, 4th and 5th columns are constant and repeat for every ID.
The variables are the 1st, 6th and 7th columns.
I would like to transpose the data in this way:
       FTO        CAT        CCL2       TAS2R38
       rs9939609  rs1001179  rs1024611  rs10246939
18030  AT         CC         GG         CC
20287  AT         CC         AG         TT
Whilst the example shows 4 rows per ID (the 5-digit number in the first column is the ID), the actual file has 128 rows per ID, so matching each row by name or regex would not be practical. I would prefer a method that iterates over a number of rows.
I saw this example on converting n number of rows but am unsure how to modify it for this application.
Update: CRLF line endings can cause formatting problems in the output; a tool such as dos2unix can be used to fix them.
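If dos2unix is not available, the trailing carriage return can also be removed inside awk itself. The following is only a minimal sketch: the file name sample_crlf.txt and the variable name cleaned are made up for illustration, and the single record is copied from the sample data above.

```shell
# Hypothetical one-line sample written with a Windows (CRLF) line ending.
printf '18030 AAJ51 FTO rs9939609 C__30090620_10 A T\r\n' > sample_crlf.txt

# Delete a carriage return at the end of each record, then print the record.
# Without this step the "\r" stays attached to the last field ($7).
cleaned=$(awk '{ sub(/\r$/, "") } 1' sample_crlf.txt)

printf '%s\n' "$cleaned"
```

The same sub() call can be placed as the first statement of any awk program that reads the file, so that column 7 is clean before it is joined with column 6.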
Recommended answer
GNU Awk solution:
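awk '{
    # On first sight of a gene, record its order and extend both header rows
    if (!keys[$3]++) { b[++c] = $3; row1 = row1 OFS $3; row2 = row2 OFS $4 }
    line = groups[$1][$3]
    # Store the genotype (columns 6 and 7 joined) per ID and gene
    groups[$1][$3] = (line == "" ? $6 $7 : line OFS $6 $7)
}
END {
    print row1 ORS row2
    PROCINFO["sorted_in"] = "@ind_num_asc"   # gawk only: visit IDs in ascending numeric order
    for (i in groups) {
        r = i
        # Counted loop keeps columns in header order; "for (j in b)" has no guaranteed order
        for (j = 1; j <= c; j++) r = r OFS groups[i][b[j]]
        print r
    }
}' OFS='\t' file | column -txn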
Output:
       FTO        CAT        CCL2       TAS2R38
       rs9939609  rs1001179  rs1024611  rs10246939
18030  AT         CC         GG         CC
20287  AT         CC         AG         TT
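The solution above relies on GNU Awk arrays of arrays. For other awk implementations, the same idea can be sketched with ordinary arrays and a combined (ID, gene) key. This is an illustrative variant, not part of the original answer: the file name sample.txt and the array names gene, rs, id, seen and geno are assumptions, and the output is left tab-separated rather than piped through column.

```shell
# Sample input copied from the question; the file name is an assumption.
cat > sample.txt <<'EOF'
18030 AAJ51 FTO rs9939609 C__30090620_10 A T
18030 AAJ51 CAT rs1001179 C__11468118_10 C C
18030 AAJ51 CCL2 rs1024611 C___2590362_10 G G
18030 AAJ51 TAS2R38 rs10246939 C___9506826_10 C C
20287 AAJ51 FTO rs9939609 C__30090620_10 A T
20287 AAJ51 CAT rs1001179 C__11468118_10 C C
20287 AAJ51 CCL2 rs1024611 C___2590362_10 A G
20287 AAJ51 TAS2R38 rs10246939 C___9506826_10 T T
EOF

transposed=$(awk '
# First time a gene (column 3) appears: remember its position and rs number.
!($3 in rs)   { gene[++ng] = $3; rs[$3] = $4 }
# First time an ID (column 1) appears: remember the input order of IDs.
!($1 in seen) { seen[$1]; id[++ni] = $1 }
# Store the genotype (columns 6 and 7 joined) under the key (ID, gene).
{ geno[$1, $3] = $6 $7 }
END {
    # Two header rows; each starts with OFS so the ID column stays empty.
    line = ""; for (j = 1; j <= ng; j++) line = line OFS gene[j];     print line
    line = ""; for (j = 1; j <= ng; j++) line = line OFS rs[gene[j]]; print line
    # One row per ID, genes in the same order as the header.
    for (i = 1; i <= ni; i++) {
        line = id[i]
        for (j = 1; j <= ng; j++) line = line OFS geno[id[i], gene[j]]
        print line
    }
}' OFS='\t' sample.txt)

printf '%s\n' "$transposed"
```

Because genes and IDs are collected as they are first seen, the script does not depend on the number of rows per ID, so it should handle 128 rows per ID the same way as the 4 rows shown here. An ID that lacks a gene simply gets an empty cell.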