Add frequency (number of occurrences) to my table of text through awk

Problem description

Given this input table:

pac1 xxx 
pac1 yyy
pac1 zzz
pac2 xxx
pac2 uuu
pac3 zzz
pac3 uuu
pac4 zzz

I need to add frequencies as a third column, like this:

pac1 xxx 2/3
pac1 yyy 1/3
pac1 zzz 3/3
pac2 xxx 2/2
pac2 uuu 2/2
pac3 zzz 3/2
pac3 uuu 2/2
pac4 zzz 3/1

The first number is the number of times the second-column value occurs in the table:

awk '{print $2}' input | sort | uniq -c

And the number after the slash is the number of times the first-column value occurs:

awk '{print $1}' input | sort | uniq -c

I would like to implement this in awk.

Please modify the output: the first column contains names, and I need to count how many unique names occur in the first column, like this:

pac1 xxx 2/4
pac1 yyy 1/4
pac1 zzz 3/4
pac2 xxx 2/4
pac2 uuu 2/4
pac3 zzz 3/4
pac3 uuu 2/4
pac4 zzz 3/4

So the unique names are only pac1, pac2, pac3, pac4 => 4

Something like this:

occur=$(awk '{print $1}' input | sort -u | wc -l)

awk -v occur=$occur '{col2[$2]++} {print $0, col2[$2] "/" occur}' file

I would like to avoid the variable $occur.

Recommended answer

Just read the file twice: first to count the values and store them in arrays, then to print each line together with its counts:

$ awk 'FNR==NR {col1[$1]++; col2[$2]++; next} {print $0, col2[$2] "/" col1[$1]}' file file
pac1 xxx 2/3
pac1 yyy 1/3
pac1 zzz 3/3
pac2 xxx 2/2
pac2 uuu 2/2
pac3 zzz 3/2
pac3 uuu 2/2
pac4 zzz 3/1

FNR==NR {things; next} is a trick to do things only while reading the first file. It relies on two built-in variables: FNR is the record (line) number within the current file, and NR is the total number of records read so far across all files. The two are equal only while the first file is being read, so FNR==NR is true only then. Adding next skips the rest of the program for the current line and moves on to the next one.
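
To see the two counters side by side, here is a small sketch that prints FILENAME, NR and FNR for every line of two files. The file names a.txt and b.txt are hypothetical and are created in a temporary directory only for this demonstration.

```shell
# Create two small throwaway files (hypothetical names) in a temp directory.
tmpdir=$(mktemp -d)
printf 'x\ny\n' > "$tmpdir/a.txt"
printf 'z\n'    > "$tmpdir/b.txt"

# NR keeps counting across files; FNR restarts at 1 for each new file.
fnr_demo=$(cd "$tmpdir" && awk '{print FILENAME, "NR=" NR, "FNR=" FNR}' a.txt b.txt)
printf '%s\n' "$fnr_demo"

rm -rf "$tmpdir"
```

This should print NR=1 FNR=1 and NR=2 FNR=2 for a.txt, then NR=3 FNR=1 for b.txt, which is why the FNR==NR condition stops matching as soon as the second file starts.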

More information can be found in Idiomatic awk.

Regarding your update: if you want the last item to contain the count of distinct values in the first column, just check the length of the array that was created. This tells you how many different indexes it contains, and hence the value you want:

$ awk 'FNR==NR {col1[$1]++; col2[$2]++; next} {print $0, col2[$2] "/" length(col1)}' file file
pac1 xxx 2/4
pac1 yyy 1/4
pac1 zzz 3/4
pac2 xxx 2/4
pac2 uuu 2/4
pac3 zzz 3/4
pac3 uuu 2/4
pac4 zzz 3/4
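
If the input cannot be read twice (for example, when it arrives on a pipe), one possible alternative is to buffer the lines in memory and print them from an END block. The sketch below is not part of the original answer. It assumes the whole table fits in memory, uses illustrative array names (line, key, names), and counts distinct names with an explicit counter instead of length(col1), since not every awk implementation has accepted an array argument to length.

```shell
# Sample table from the question, fed through a pipe so it is read only once.
result=$(printf '%s\n' 'pac1 xxx' 'pac1 yyy' 'pac1 zzz' 'pac2 xxx' \
                       'pac2 uuu' 'pac3 zzz' 'pac3 uuu' 'pac4 zzz' |
awk '
{
    line[NR] = $0                   # remember each line in input order
    key[NR]  = $2                   # and its second-column value
    col2[$2]++                      # occurrences of the second-column value
    if (!names[$1]++) distinct++    # count each first-column name only once
}
END {
    # All counts are final here, so every line gets the full totals.
    for (i = 1; i <= NR; i++)
        print line[i], col2[key[i]] "/" distinct
}')
printf '%s\n' "$result"
```

For the sample table this should give the same eight lines as the two-pass command above. The trade-off is memory: the two-pass version stores only the counts, while this one also stores every line.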
