Apply a function to a pandas DataFrame whose returned value is based on other rows


Question

I have a DataFrame that looks like this:

>>> import pandas
>>> df = pandas.DataFrame({
...     'region': ['east', 'west', 'south', 'west', 'east', 'west', 'east', 'west'],
...     'item': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
...     'quantity': [3, 3, 4, 5, 12, 14, 3, 8],
...     'price': [50, 50, 12, 35, 10, 10, 12, 12],
... })
>>> df
    item  price  quantity region
0    one     50         3   east
1    one     50         3   west
2    two     12         4  south
3  three     35         5   west
4    two     10        12   east
5    two     10        14   west
6    one     12         3   east
7  three     12         8   west

What I want to do is modify the values in the quantity column. Each new quantity value is calculated from the set of different regions that exist for the row's combination of item and price. More concretely, I want to take each quantity and multiply it by the weight of its region, as returned by a function I wrote that takes a region and the list of regions composing the pool:

region_weight(region, list_of_regions). For this imaginary situation, let's say:

  • the east region is worth 1
  • the west region is worth 2
  • the south region is worth 3

Then the weight returned for east in the pool (east, west) is 0.3333333333333333 (1/3), and the weight of south in the pool (east, west, south) is 0.5 (3/6 = 1/2).
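The question does not show the body of region_weight, so the following is only a minimal sketch of what such a function might look like. The REGION_VALUES name is hypothetical, and counting each region in the pool once (even if it is listed several times) is an assumption based on the phrase "different regions".

```python
# Hypothetical per-region values, taken from the list above.
REGION_VALUES = {"east": 1, "west": 2, "south": 3}


def region_weight(region, list_of_regions):
    """Return the value of `region` as a share of the pool's total value."""
    # Assumption: each distinct region counts once, even if it is repeated.
    pool = set(list_of_regions)
    total = sum(REGION_VALUES[r] for r in pool)
    return REGION_VALUES[region] / total


print(region_weight("east", ["east", "west"]))            # 1 / (1 + 2)
print(region_weight("south", ["east", "west", "south"]))  # 3 / (1 + 2 + 3)
```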

So for the first row, we look at which rows have item one and price 50. There are two: one in the east region and one in the west region. The new quantity in the first row would be: 3 * region_weight("east", ["east", "west"]), or 3 * 0.3333333333333333.

I want to apply the same process to the whole quantity column. I don't know how to approach this problem with the pandas library other than by looping through the DataFrame row by row.
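For reference, the row-by-row approach the question wants to avoid might look roughly like the sketch below. The values dictionary and the new_quantity list are illustrative names, not part of the original post, and each distinct region in a pool is assumed to count once.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "west", "south", "west", "east", "west", "east", "west"],
    "item": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "quantity": [3, 3, 4, 5, 12, 14, 3, 8],
    "price": [50, 50, 12, 35, 10, 10, 12, 12],
})
values = {"east": 1, "west": 2, "south": 3}  # hypothetical region values

new_quantity = []
for _, row in df.iterrows():
    # All rows sharing this row's item and price form the pool.
    same_pool = df[(df["item"] == row["item"]) & (df["price"] == row["price"])]
    pool_total = sum(values[r] for r in same_pool["region"].unique())
    new_quantity.append(row["quantity"] * values[row["region"]] / pool_total)

print(new_quantity)
```

This rescans the whole frame once per row, so the work grows quadratically with the number of rows, which is the main reason to look for a group-based solution.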

Recommended answer

OK, I think this does what you want:

Make a dictionary of your regional weights:

In [1]: weights = {'east':1,'west':2,'south':3}

The following function maps the values of a Series to the values found in the weights dictionary. x is the Series of region values for one group, and w is that Series after it has been mapped through the weights dict. The function returns each row's weight divided by the total weight of the group.

In [2]: def f(x):
   ...:     w = x.map(weights)
   ...:     return w / w.sum().astype(float)
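To see what f does in isolation, it can be called on a small Series by hand. The sketch below uses the two regions of the (item one, price 50) group. The group variable is illustrative, and the astype(float) call is left out on the assumption that the code runs on Python 3, where / is already true division.

```python
import pandas as pd

weights = {"east": 1, "west": 2, "south": 3}


def f(x):
    # Map each region name to its numeric weight ...
    w = x.map(weights)
    # ... then express each weight as a share of the group's total.
    return w / w.sum()


# Regions of the rows with item 'one' and price 50 (rows 0 and 1).
group = pd.Series(["east", "west"], index=[0, 1])
print(f(group))
```

A region missing from the weights dictionary would map to NaN, so it is worth checking that every region in the data has an entry.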

Here, we group by ['item', 'price'] and apply the function above. The output is a Series of relative weights, one per row, computed within each unique combination of item and price.

In [3]: df.groupby(['item','price']).region.apply(f)
Out[3]:
0    0.333333
1    0.666667
2    1.000000
3    1.000000
4    0.333333
5    0.666667
6    1.000000
7    1.000000
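The output above keeps the original row index. In more recent pandas releases (2.0 and later, as far as I can tell), apply on a group-by may prepend the group keys to the index, in which case the result no longer lines up with df. A hedged alternative is transform, which is documented to return a result indexed like the input. The rel name below is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "west", "south", "west", "east", "west", "east", "west"],
    "item": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "quantity": [3, 3, 4, 5, 12, 14, 3, 8],
    "price": [50, 50, 12, 35, 10, 10, 12, 12],
})
weights = {"east": 1, "west": 2, "south": 3}


def f(x):
    w = x.map(weights)
    return w / w.sum()


# transform calls f once per group and keeps the original row index.
rel = df.groupby(["item", "price"])["region"].transform(f)
print(rel)
```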

Finally, you can multiply df.quantity by the above Series to calculate your weight-adjusted quantities. The multiplication aligns on the row index, so each quantity is paired with the weight computed for the same row.

In [4]: df['wt_quant'] = df.groupby(['item','price']).region.apply(f) * df.quantity

In [5]: df
Out[5]:
    item  price  quantity region  wt_quant
0    one     50         3   east  1.000000
1    one     50         3   west  2.000000
2    two     12         4  south  4.000000
3  three     35         5   west  5.000000
4    two     10        12   east  4.000000
5    two     10        14   west  9.333333
6    one     12         3   east  3.000000
7  three     12         8   west  8.000000
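The same result can probably be obtained without a Python-level function per group, by mapping regions to weights once and dividing by a group-wise sum. This is a sketch of that variation rather than the answer's own method. The w and group_total names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "west", "south", "west", "east", "west", "east", "west"],
    "item": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "quantity": [3, 3, 4, 5, 12, 14, 3, 8],
    "price": [50, 50, 12, 35, 10, 10, 12, 12],
})
weights = {"east": 1, "west": 2, "south": 3}

# Numeric weight of each row's region.
w = df["region"].map(weights)
# Total weight of each (item, price) group, broadcast back to every row.
group_total = w.groupby([df["item"], df["price"]]).transform("sum")

df["wt_quant"] = df["quantity"] * w / group_total
print(df)
```

Like the answer's f, this sums the weights of all rows in a group, so a region that appeared twice in the same group would be counted twice. That does not occur in the sample data.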
