确定占总数 90% 的记录 [英] identify records that make up 90% of total

查看:42
本文介绍了确定占总数 90% 的记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一份报告,其中确定了总体数字/趋势的关键驱动因素.我想自动化该功能,以便能够根据该数字的百分比列出/识别基础记录.例如,如果南部(地区)小部件销售的净变化为 -5,000.00,但有正面和负面影响 - 我想确定至少约 90% (-4,500.00) 的所有潜在驱动因素 -从最大到最小总共 5,000.00.

I have a report that identifies key drivers of an overall number/trend. I would like to automate the functionality to be able to list/identify the underlying records based on a percentage of that number. For example if the net change for sales of widgets in the south(region) is -5,000.00, but there are positives and negatives- I would like to identify at least ~90% (-4,500.00) of all underlying drivers that make up that -5,000.00 total from largest to smallest.

data

region    OfficeLocation  sales
South     1                -500
South     2                300
South     3                -1000
South     4                -2000
South     5                 300
South     6                -700
South     7                -400
South     8                 800
North     11                300
North     22               -400
North     33                1000
North     44                800
North     55                900
North     66                -800

对于 South,总销售额为 -3200.我想确定/列出构成此移动至少 90% 的驱动因素(按降序排列)-因此 -3200 的 90% 将是 2880.以及 South office 3 & 的定向移动/销售;4 = -3000 将是此请求的输出:

for South, the total sales is -3200. I would like to identify/list the drivers that make up at least 90% of this move(in descending order)- so 90% of -3200 would be 2880. And the directional moves/sales for South office 3 & 4 = -3000 would be the output for this request:

region    OfficeLocation  sales
South     3                -1000
South     4                -2000

对于 North,总销售额为 +1800.我想确定/列出构成此移动至少 90% 的驱动因素(按降序排列)-因此 1800 的至少 90% 将是 1620.以及 South office 3 & 的定向移动/销售;4 = -3000 将是此请求的输出:

for North, the total sales is +1800. I would like to identify/list the drivers that make up at least 90% of this move(in descending order)- so at least 90% of 1800 would be 1620. And the directional moves/sales for South office 3 & 4 = -3000 would be the output for this request:

区域办事处位置销售北 33 1000北 44 800

region OfficeLocation sales North 33 1000 North 44 800

上面的数据集有南/北的积极和消极趋势.您能提供的任何帮助将不胜感激!

Dataset above has both positive and negative trends for south/north. Any help you can provide would be greatly appreciated!

推荐答案

正如评论中提到的,不清楚在 'North' 情况下该怎么做,因为那里的总和为正,但忽略这一点,您可以执行以下操作:

As mentioned in the comment, it isn't clear what to do in the 'North' case as the sum is positive there, but ignoring that, you could do something like the following:

In [200]: df[df.groupby('region').sales.apply(lambda g: g <= g.loc[(g.sort_values().cumsum() > 0.9*g.sum()).idxmin()])]
Out[200]:
   region  OfficeLocation  sales
2   South               3  -1000
3   South               4  -2000
13  North              66   -800

如果在积极的情况下,您想找到尽可能少的元素,它们共同具有占销售额总和的 90% 的特性,则可以采用上述解决方案:

If, in the positive case, you want to find as few elements as possible that together have the property that they make up 90% of the sum of the sales, the above solution can be adopted as follows:

def is_driver(group):
    s = group.sum()
    if s > 0:
        group *= -1
        s *= -1
    a = group.sort_values().cumsum() > 0.9*s
    return group <= group.loc[a.idxmin()]

In [168]: df[df.groupby('region').sales.apply(is_driver)]
Out[168]:
   region  OfficeLocation  sales
2   South               3  -1000
3   South               4  -2000
10  North              33   1000
12  North              55    900

注意,在平局的情况下,只挑出一个元素.

Note that in the case of a tie, only one element is picked out.

这篇关于确定占总数 90% 的记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆