awk运行总计数和总和 [英] awk running total count and sum

查看:81
本文介绍了awk运行总计数和总和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

想知道如何每天计算供应商的80%-20%规则贡献.

Would like to know how to calculate 80%-20% rule contribution of vendors on daily basis.

Input.csv

Input.csv

Date,Region,Vendor,Amount
5-Apr-15,east,cc,50
5-Apr-15,east,dd,15
5-Apr-15,south,bb,15
5-Apr-15,south,aa,10
7-Apr-15,east,cc,123
7-Apr-15,south,bb,88
7-Apr-15,south,aa,40
7-Apr-15,west,ss,30
7-Apr-15,west,rr,20

在上面的输入中,需要基于$ 1字段填充金额的运行总和",然后计算当天金额的运行总和"的百分比

In the above input, based on $1 field need to populate Running Sum of Amount then calculate percentage of Running Sum of Amount for the day

Date,Region,Vendor,Amount,RunningSum,%RunningSum
5-Apr-15,east,cc,50,50,56%   (RunningSum=50 , %RunningSum=50/90(Total Amount for the day) 
5-Apr-15,east,dd,15,65,72%   (RunningSum=50+15, %RunningSum=65/90)
5-Apr-15,south,bb,15,80,89%  (RunningSum=65+15, %RunningSum=80/90)
5-Apr-15,south,aa,10,90,100% (RunningSum=80+10, %RunningSum=90/90)

一旦获得80%或高于80%的首次匹配,则需要将其视为80%贡献,其余订单项则应视为20%贡献.

Once it is derived 80% or first hit of 80%above need to consider as 80% contribution remaining line items need to be consider as 20% contribution.

预期输出:

Date,Countof80%Vendor, SumOf80%Vendor, Countof20%Vendor, SumOf20%Vendor
5-Apr-15,3,80,1,10
7-Apr-15,3,251,2,50

任何建议...

推荐答案

awk中执行此操作相对较复杂,但是可以.

This is relatively complicated to do in awk, but here goes.

awk -F , '{vals[$1,++nums[$1]]=$4}END{
    for(d in nums){
        tot=0
        for(i=1;i<=nums[d];i++)tot+=vals[d,i]
        n=0;s=0
        for(i=1;i<=nums[d];i++){
            n++;s+=vals[d,i]
            if(s>=tot*0.8){s80=s;n80=n;s=0;n=0}
        }
        printf("%s,%d,%d,%d,%d\n",d,n80,s80,n,s)
    }
}' vendors.txt

这将为每个日期创建独立的值列表,并在通过这些列表收集所有数据循环后找出80%的位置.

This creates independent lists of values for each date, and after collecting all the data loops trough those lists and figures out where the 80% point is.

这假定文件仅包含数据(不包含"Date,Region,Vender .."头).如果您希望使用单行版本进行复制和粘贴,则为:

This assumes that the file contains only data (no "Date,Region,Vender.." header). If you want a one-line version for copy and paste purposes, here it is:

awk -F , '{vals[$1,++nums[$1]]=$4}END{for(d in nums){tot=0;for(i=1;i<=nums[d];i++)tot+=vals[d,i];n=0;s=0;for(i=1;i<=nums[d];i++){n++;s+=vals[d,i];if(s>=tot*0.8){s80=s;n80=n;s=0;n=0}};printf("%s,%d,%d,%d,%d\n",d,n80,s80,n,s)}}' vendors.txt

如果要分别计算每个区域,则只需将$ 1部分更改为$ 1," $ 2,以连接日期和区域:

If you want counts separately for each region, you just need to change the $1 part into $1","$2, to concatenate date and region:

awk -F , '{a=$1","$2;vals[a,++nums[a]]=$4}END{for(d in nums){tot=0;for(i=1;i<=nums[d];i++)tot+=vals[d,i];n=0;s=0;for(i=1;i<=nums[d];i++){n++;s+=vals[d,i];if(s>=tot*0.8){s80=s;n80=n;s=0;n=0}};printf("%s,%d,%d,%d,%d\n",d,n80,s80,n,s)}}' vendors.txt

不过,老实说,我不清楚您到底想以此来衡量什么.例如,构成销售额的80%的供应商数量不是唯一定义的-取决于您对它们进行排序的顺序.如果首先对所有小额销售进行排序,则比对小批量销售进行排序会得到更大的数量.大量销售第一.如果不按任何特定顺序对它们进行排序,您将获得一些中间的东西.如果您想要的是最大的供应商中有多少占80%,那么您需要在对阵列进行排序之前进行计数.

To be honest, though, I'm a bit unclear about exactly what you're trying to measure with this. For example, the number of vendors that make up 80% of the sales is not uniquely defined - it depends on the order you sort them in. If you sort all the small sales first, you will get a larger number than if you sort the large sales first. If you don't sort them in any particular order, you will get something intermediate. If what you want is how many of the biggest vendors make up 80% of the sales, then you will need to sort the array before counting.

这篇关于awk运行总计数和总和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆