python pandas: applying a function to combinations of the elements in one column, based on a condition on another column

Question

There seem to be similar questions, but I couldn't find a proper answer. Let's say this is my dataframe, which holds observations for cars of different brands:

import pandas

df = pandas.DataFrame({'Car' : ['BMW_1', 'BMW_2', 'BMW_3', 'WW_1','WW_2','Fiat_1', 'Fiat_2'],
                       'distance'   : [10,25,22,24,37,33,49]})

For simplicity, let's assume I have a function that multiplies its first argument by two and its second by three:

def my_func(x,y):
    z = 2*x + 3*y
    return z

I want to get pairwise combinations of the distances covered by the cars and use them in my_func. There are two conditions: x and y cannot be of the same brand, and combinations should not be duplicated. The desired output is something like this:

   Car     Distance  Combinations
0  BMW_1   10        (BMW_1,WW_1),(BMW_1,WW_2),(BMW_1,Fiat_1),(BMW_1,Fiat_2)
1  BMW_2   25        (BMW_2,WW_1),(BMW_2,WW_2),(BMW_2,Fiat_1),(BMW_2,Fiat_2)
2  BMW_3   22        (BMW_3,WW_1),(BMW_3,WW_2),(BMW_3,Fiat_1),(BMW_3,Fiat_2)
3  WW_1    24        (WW_1,Fiat_1),(WW_1,Fiat_2)
4  WW_2    37        (WW_2,Fiat_1),(WW_2,Fiat_2)
5  Fiat_1  33        None
6  Fiat_2  49        None

//Output
[120, 134, 156, 178]
[113, 145, 134, 132]
[114, 123, 145, 182]
[153, 123] 
[120, 134] 
None 
None 

Note: I made up the numbers in the output.

As a next step, I want to get the maximum of each brand's arrays in the 'Output' column. The final data should look like this:

  Car  Max_Distance
0 BMW  178
1 WW   153
2 Fiat None

I would be grateful if someone could help me.

Recommended answer

UPDATE:

In [49]: x = pd.DataFrame(np.triu(squareform(pdist(df[['distance']], my_func))),
    ...:                  columns=df.Car.str.split('_').str[0],
    ...:                  index=df.Car.str.split('_').str[0]).replace(0, np.nan)
    ...:

In [50]: x[x.apply(lambda col: col.index != col.name)].max(1).max(level=0)
Out[50]:
Car
BMW     197.0
Fiat      NaN
WW      221.0
dtype: float64
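
For readers who would rather not depend on scipy, the same idea can be sketched with NumPy broadcasting. This is only an illustration under stated assumptions, not the answer's own method: the names `brand`, `keep`, `values` and `brand_max` are made up here, and the data and `my_func` are copied from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Car': ['BMW_1', 'BMW_2', 'BMW_3', 'WW_1', 'WW_2', 'Fiat_1', 'Fiat_2'],
                   'distance': [10, 25, 22, 24, 37, 33, 49]})

def my_func(x, y):
    return 2 * x + 3 * y

# Brand is the part of the car name before the underscore
brand = df['Car'].str.split('_').str[0].to_numpy()
d = df['distance'].to_numpy()

# values[i, j] = my_func(d[i], d[j]) for every ordered pair, via broadcasting
values = my_func(d[:, None], d[None, :]).astype(float)

# Keep only pairs with i < j (no duplicated combinations) and different brands
n = len(d)
keep = np.triu(np.ones((n, n), dtype=bool), k=1) & (brand[:, None] != brand[None, :])
values[~keep] = np.nan

# Maximum per row (NaN entries are skipped), then maximum per brand;
# sort=False keeps the brands in order of first appearance
brand_max = pd.DataFrame(values).max(axis=1).groupby(brand, sort=False).max()
print(brand_max)
```

With the question's data this should give 197.0 for BMW, 221.0 for WW and NaN for Fiat, matching the output above.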

Old answer:

IIUC you can do something like the following:

import pandas as pd
from scipy.spatial.distance import pdist, squareform

def my_func(x,y):
    return 2*x + 3*y

# pairwise my_func values, labelled by brand (the part of 'Car' before '_')
x = pd.DataFrame(
    squareform(pdist(df[['distance']], my_func)),
    columns=df.Car.str.split('_').str[0],
    index=df.Car.str.split('_').str[0])

This yields:

In [269]: x
Out[269]:
Car     BMW    BMW    BMW     WW     WW   Fiat   Fiat
Car
BMW     0.0   95.0   86.0   92.0  131.0  119.0  167.0
BMW    95.0    0.0  116.0  122.0  161.0  149.0  197.0
BMW    86.0  116.0    0.0  116.0  155.0  143.0  191.0
WW     92.0  122.0  116.0    0.0  159.0  147.0  195.0
WW    131.0  161.0  155.0  159.0    0.0  173.0  221.0
Fiat  119.0  149.0  143.0  147.0  173.0    0.0  213.0
Fiat  167.0  197.0  191.0  195.0  221.0  213.0    0.0
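
Note that the matrix is symmetric even though my_func is not: pdist evaluates the function only for pairs with i < j, and squareform mirrors those values into the lower triangle. As a rough cross-check, the cross-brand entries of the upper triangle can be enumerated explicitly with itertools. This is a sketch only; the helper `brand_of` and the dictionary `pairs` are illustrative names, not part of the answer.

```python
from itertools import combinations

cars = ['BMW_1', 'BMW_2', 'BMW_3', 'WW_1', 'WW_2', 'Fiat_1', 'Fiat_2']
distances = [10, 25, 22, 24, 37, 33, 49]

def my_func(x, y):
    return 2 * x + 3 * y

def brand_of(name):
    # hypothetical helper: brand is the text before the underscore
    return name.split('_')[0]

# combinations() yields each unordered pair once, in input order (i < j)
pairs = {}
for (car_a, dist_a), (car_b, dist_b) in combinations(zip(cars, distances), 2):
    if brand_of(car_a) != brand_of(car_b):
        pairs[(car_a, car_b)] = my_func(dist_a, dist_b)

print(pairs[('BMW_1', 'WW_1')])  # 2*10 + 3*24 = 92
```

Each key of `pairs` corresponds to one entry of the "Combinations" column in the question, and the reversed pair, e.g. ('WW_1', 'BMW_1'), is never generated.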

Excluding pairs of the same brand:

In [270]: x.apply(lambda col: col.index != col.name)
Out[270]:
Car     BMW    BMW    BMW     WW     WW   Fiat   Fiat
Car
BMW   False  False  False   True   True   True   True
BMW   False  False  False   True   True   True   True
BMW   False  False  False   True   True   True   True
WW     True   True   True  False  False   True   True
WW     True   True   True  False  False   True   True
Fiat   True   True   True   True   True  False  False
Fiat   True   True   True   True   True  False  False

In [273]: x[x.apply(lambda col: col.index != col.name)]
Out[273]:
Car     BMW    BMW    BMW     WW     WW   Fiat   Fiat
Car
BMW     NaN    NaN    NaN   92.0  131.0  119.0  167.0
BMW     NaN    NaN    NaN  122.0  161.0  149.0  197.0
BMW     NaN    NaN    NaN  116.0  155.0  143.0  191.0
WW     92.0  122.0  116.0    NaN    NaN  147.0  195.0
WW    131.0  161.0  155.0    NaN    NaN  173.0  221.0
Fiat  119.0  149.0  143.0  147.0  173.0    NaN    NaN
Fiat  167.0  197.0  191.0  195.0  221.0    NaN    NaN

Selecting the maximum per row:

In [271]: x[x.apply(lambda col: col.index != col.name)].max(1)
Out[271]:
Car
BMW     167.0
BMW     197.0
BMW     191.0
WW      195.0
WW      221.0
Fiat    173.0
Fiat    221.0
dtype: float64

Maximum per brand:

In [276]: x[x.apply(lambda col: col.index != col.name)].max(1).max(level=0)
Out[276]:
Car
BMW     197.0
Fiat    221.0
WW      221.0
dtype: float64
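
A caveat for recent pandas versions: the `level` argument of `max` was deprecated in pandas 1.3 and removed in 2.0, so `.max(level=0)` may fail on a current installation. A possible replacement is `groupby(level=0).max()`. The sketch below rebuilds the per-row maxima shown above by hand (the variable names `row_max` and `brand_max` are illustrative) and reduces them per brand.

```python
import pandas as pd

# Per-row maxima copied from the transcript above, indexed by brand
row_max = pd.Series(
    [167.0, 197.0, 191.0, 195.0, 221.0, 173.0, 221.0],
    index=pd.Index(['BMW', 'BMW', 'BMW', 'WW', 'WW', 'Fiat', 'Fiat'], name='Car'))

# Group rows sharing the same index label and take the maximum of each group
brand_max = row_max.groupby(level=0).max()
print(brand_max)
```

By default groupby sorts the group labels, which gives the same BMW, Fiat, WW ordering as the output above.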
