Get the unique columns and their frequencies in Linux
Question
I have a data.txt file with a matrix structure (4 x 9):
101000110
000000010
001010010
100101101
I want to count the frequencies of the unique columns. The expected result is:
1001 2
0000 1
1010 1
0001 3
0010 1
1110 1
On the Internet I could only find how to get "unique lines according to specific columns" with awk. Do I need to transpose my data first to solve this problem, or is there a more direct way? Thank you.
Recommended answer
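This awk will help. It sets FS to the empty string so that each character becomes its own field (GNU awk and mawk support this; POSIX leaves an empty FS unspecified):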
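awk '{
  # FS is empty, so each character is a field: append it to its column string
  for (i=1;i<=NF;i++){
    a[i]=a[i]""$i
  }
}
END{
  # this matrix has 9 columns: count how many times each column string occurs
  for (i=1;i<=9;i++) {
    res[a[i]]++
  }
  # print every distinct column string with its count (order is unspecified)
  for (r in res){
    print r, res[r]
  }
}' FS= yourfile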
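The loop bound 9 above only fits this particular matrix, and for (r in res) prints in an arbitrary order. A minimal sketch that takes the column count from the input and prints the columns in order of first appearance (which reproduces the order of the expected result in the question) could look like this. The function name count_unique_columns is hypothetical, and the sketch still assumes an awk, such as GNU awk or mawk, that splits a line into characters when FS is empty.

```shell
# Hypothetical helper: count_unique_columns FILE
# Prints each distinct column (read top to bottom) with its frequency,
# in order of first appearance. Assumes an empty FS splits a line into
# single characters (GNU awk and mawk do this; POSIX leaves it unspecified).
count_unique_columns() {
  awk '
  {
    # Remember the widest row seen so the column count is not hard-coded
    if (NF > ncols) ncols = NF
    for (i = 1; i <= NF; i++)
      col[i] = col[i] $i
  }
  END {
    for (i = 1; i <= ncols; i++) {
      # Record the first time each column string is seen, to keep output ordered
      if (!(col[i] in count))
        order[++n] = col[i]
      count[col[i]]++
    }
    for (k = 1; k <= n; k++)
      print order[k], count[order[k]]
  }' FS= "$1"
}
```

For example, count_unique_columns data.txt prints the six lines of the expected result, starting with 1001 2 and ending with 1110 1.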
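Result (for (r in res) visits keys in an unspecified order, so the lines may come out in a different order than expected):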
1110 1
0000 1
0010 1
0001 3
1010 1
1001 2
Explanation
for (i=1;i<=NF;i++){
a[i]=a[i]""$i
}
Builds one string per column in the array a (nine entries here). Because the input is a regular matrix, each character of a row is appended at its column position, so after the last row a[i] holds column i read from top to bottom.
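To see how this append step builds the column strings, here is a small tracing sketch (the function name show_column_build is hypothetical) that prints the partial contents of a[1] through a[NF] after each row is read:

```shell
# Hypothetical demo: show_column_build FILE
# After each row, prints the partial column strings built so far by
# a[i] = a[i] $i (one field per character, since FS is empty).
show_column_build() {
  awk '
  {
    for (i = 1; i <= NF; i++)
      a[i] = a[i] $i
    # Join the current partial columns with spaces, prefixed by the row number
    line = "row " NR ":"
    for (i = 1; i <= NF; i++)
      line = line " " a[i]
    print line
  }' FS= "$1"
}
```

On the data.txt from the question, the second line is row 2: 10 00 10 00 00 00 10 11 00 and the last line is row 4: 1001 0000 1010 0001 0010 0001 1001 1110 0001, that is, the nine columns read top to bottom.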
for (i=1;i<=9;i++) {
res[a[i]]++
}
Uses each column string as a key in the associative array res and counts its occurrences.
for (r in res){
print r, res[r]
}
Prints each distinct column string with its count.
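As for the transposing approach mentioned in the question: it also works, at the cost of a longer pipeline. A sketch with hypothetical function names transpose_columns and count_via_transpose writes each column as its own line and then lets sort and uniq -c do the counting:

```shell
# Hypothetical helper: transpose_columns FILE
# Prints each column of the character matrix as one line (top to bottom).
transpose_columns() {
  awk '
  {
    if (NF > ncols) ncols = NF
    for (i = 1; i <= NF; i++)
      col[i] = col[i] $i
  }
  END {
    for (i = 1; i <= ncols; i++)
      print col[i]
  }' FS= "$1"
}

# Count identical column lines, then swap uniq -c output to "column count"
count_via_transpose() {
  transpose_columns "$1" | LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}
```

Because of sort, the output is in lexical order, starting with 0000 1 and ending with 1110 1, rather than in order of first appearance.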