How to Add Column with Percentage
Question
I would like to calculate each line's value as a percentage of the total over all lines and add it as another column. Input (delimiter is \t):
1 10
2 10
3 20
4 40
Desired output, with an added third column showing the percentage calculated from the values in the second column:
1 10 12.50
2 10 12.50
3 20 25.00
4 40 50.00
I have tried to do it myself, but once I had calculated the total for all lines I didn't know how to keep the rest of each line unchanged. Thanks a lot for the help!
Recommended answer
Here you go, an awk one-liner that reads the file twice (once to compute the total, once to print the percentages):
awk 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file
[jaypal:~/Temp] cat file
1 10
2 10
3 20
4 40
[jaypal:~/Temp] awk 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file
1 10 12.5
2 10 12.5
3 20 25
4 40 50
Update: If tabs are required in the output, just set the OFS variable to "\t".
[jaypal:~/Temp] awk -v OFS="\t" 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file
1 10 12.5
2 10 12.5
3 20 25
4 40 50
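The desired output in the question shows two decimal places (12.50, 25.00), while plain print gives 12.5 and 25. A minimal sketch of one way to match it, assuming printf with a "%.2f" format is acceptable: it prints $0 so the rest of the line stays untouched, whatever it contains. The temporary file and the variable names (file, total, out) are only for illustration.

```shell
# Sample input from the question, written to a temporary tab-delimited file.
file=$(mktemp)
printf '1\t10\n2\t10\n3\t20\n4\t40\n' > "$file"

# First read of the file: accumulate the column-2 total.
# Second read: print the original line followed by the percentage, 2 decimals.
out=$(awk -F'\t' 'NR==FNR { total += $2; next }
                  { printf "%s\t%.2f\n", $0, ($2 / total) * 100 }' "$file" "$file")

printf '%s\n' "$out"
rm -f "$file"
```

Printing $0 instead of $1,$2 means extra columns, if any, pass through unchanged.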
Breakdown of the pattern {action} statements:
The first pattern is NR==FNR. FNR is awk's built-in variable that keeps track of the number of records (by default separated by a newline) read from the current file, so in our case it counts from 1 to 4 on each read of the file. NR is similar to FNR, but it is not reset when a new file starts; it keeps growing, so in our case it reaches 8 by the end.
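To see the two counters side by side, here is a small sketch that prints NR and FNR for every record while the same four-line file is read twice. The temporary file and the variable name counters are only for illustration.

```shell
# Sample input from the question, written to a temporary file.
file=$(mktemp)
printf '1\t10\n2\t10\n3\t20\n4\t40\n' > "$file"

# Print both counters for every record across the two reads of the same file.
counters=$(awk '{ print "NR=" NR, "FNR=" FNR }' "$file" "$file")

printf '%s\n' "$counters"
rm -f "$file"
```

The two counters agree on the first four records; from the fifth record on, FNR starts over at 1 while NR carries on to 8, which is why NR==FNR holds only during the first read.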
This pattern is true only for the first 4 records (the first read of the file), and that's exactly what we want. While going through those 4 records, we accumulate the total in the variable a. Notice that we did not initialize it; in awk we don't have to. However, this would break if the entire column 2 is 0. You can handle that by putting an if statement in the second action, i.e. do the division only if a > 0, and otherwise print a division-by-zero message or something similar.
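The guard described above could look like the following sketch. The "n/a" placeholder printed when the total is 0 is an assumption made for illustration; print whatever suits your data. The input here is a hypothetical file whose second column is all zeros.

```shell
# Hypothetical input whose second column sums to 0.
file=$(mktemp)
printf '1\t0\n2\t0\n' > "$file"

# Same two-read approach, but divide only when the total is positive.
guarded=$(awk -v OFS='\t' 'NR==FNR { a += $2; next }
{
    if (a > 0)
        print $1, $2, ($2 / a) * 100
    else
        print $1, $2, "n/a"   # placeholder chosen for this sketch
}' "$file" "$file")

printf '%s\n' "$guarded"
rm -f "$file"
```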
next is needed because we don't want the second pattern {action} statement to execute for these records. next tells awk to skip any further actions and move on to the next record.
Once the first four records have been processed, the second pattern {action} takes over, which is pretty straightforward: it computes the percentage and prints columns 1 and 2 with the percentage next to them.
Note: As @lhf mentioned in the comments, this one-liner works only if the data is in a file. It will not work if you pass the data in through a pipe.
In the comments, there is a discussion about ways to make this awk one-liner take input from a pipe instead of a file. The only way I could think of was to store the column values in an array and then use a for loop to print each value along with its percentage.
Now, arrays in awk are associative, and a for (i in b) loop visits the keys in an unspecified order, i.e. the values may not come out in the same order as they went in. If that is OK, then the following one-liner should work.
[jaypal:~/Temp] cat file
1 10
2 10
3 20
4 40
[jaypal:~/Temp] cat file | awk '{b[$1]=$2;sum=sum+$2} END{for (i in b) print i,b[i],(b[i]/sum)*100}'
2 10 12.5
3 20 25
4 40 50
1 10 12.5
To get them in order, you can pipe the result to sort.
[jaypal:~/Temp] cat file | awk '{b[$1]=$2;sum=sum+$2} END{for (i in b) print i,b[i],(b[i]/sum)*100}' | sort -n
1 10 12.5
2 10 12.5
3 20 25
4 40 50
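If sorting is not an option (for example, when the first column is not a sortable key or contains duplicates), one possible alternative is to index the stored lines by record number and loop from 1 to NR in the END block. This is a sketch rather than part of the original answer; the array names line and value are illustrative, and it does not guard against a total of 0.

```shell
# Input arrives on a pipe, so it can be read only once.
ordered=$(printf '1\t10\n2\t10\n3\t20\n4\t40\n' |
  awk -v OFS='\t' '{ line[NR] = $0; value[NR] = $2; sum += $2 }
       END { for (i = 1; i <= NR; i++) print line[i], (value[i] / sum) * 100 }')

printf '%s\n' "$ordered"
```

Because the loop counts through the record numbers instead of using for (i in b), the lines come out in input order, at the cost of holding the whole input in memory.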