python pandas: applying a function to combinations of elements in one column, based on a condition on another column
Question
There seem to be similar questions, but I couldn't find a proper answer. Let's say this is my dataframe, which holds several observations for different brands of cars:
import pandas as pd
df = pd.DataFrame({'Car': ['BMW_1', 'BMW_2', 'BMW_3', 'WW_1', 'WW_2', 'Fiat_1', 'Fiat_2'],
                   'distance': [10, 25, 22, 24, 37, 33, 49]})
For simplicity, let's assume I have a function that multiplies its first argument by two and its second by three:
def my_func(x, y):
    z = 2*x + 3*y
    return z
I want to get pairwise combinations of the distances covered by the cars and pass them to my_func. There are two conditions: x and y cannot come from the same brand, and combinations should not be duplicated. The desired output is something like this:
   Car     Distance  Combinations
0  BMW_1   10        (BMW_1,WW_1),(BMW_1,WW_2),(BMW_1,Fiat_1),(BMW_1,Fiat_2)
1  BMW_2   25        (BMW_2,WW_1),(BMW_2,WW_2),(BMW_2,Fiat_1),(BMW_2,Fiat_2)
2  BMW_3   22        (BMW_3,WW_1),(BMW_3,WW_2),(BMW_3,Fiat_1),(BMW_3,Fiat_2)
3  WW_1    24        (WW_1,Fiat_1),(WW_1,Fiat_2)
4  WW_2    37        (WW_2,Fiat_1),(WW_2,Fiat_2)
5  Fiat_1  33        None
6  Fiat_2  49        None
//Output
[120, 134, 156, 178]
[113, 145, 134, 132]
[114, 123, 145, 182]
[153, 123]
[120, 134]
None
None
Note: I made up the numbers for the output.
As a next step, I want to get the maximum value from the arrays in the 'Output' column for each brand. The final data should look like this:
Car Max_Distance
0 BMW 178
1 WW 153
2 Fiat None
I would be grateful if someone could help me.
Recommended answer
UPDATE:
In [49]: x = pd.DataFrame(np.triu(squareform(pdist(df[['distance']], my_func))),
...: columns=df.Car.str.split('_').str[0],
...: index=df.Car.str.split('_').str[0]).replace(0, np.nan)
...:
In [50]: x[x.apply(lambda col: col.index != col.name)].max(axis=1).groupby(level=0).max()
Out[50]:
Car
BMW 197.0
Fiat NaN
WW 221.0
dtype: float64
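For readers who prefer not to depend on SciPy, the same upper-triangle idea can be sketched with plain NumPy broadcasting. This is an illustrative alternative rather than the answerer's code: the names `brand`, `vals`, `pair`, `keep` and `masked` are made up here, and `groupby(level=0).max()` is used because `max(level=0)` is not available in pandas 2.x.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Car': ['BMW_1', 'BMW_2', 'BMW_3', 'WW_1', 'WW_2', 'Fiat_1', 'Fiat_2'],
                   'distance': [10, 25, 22, 24, 37, 33, 49]})

def my_func(x, y):
    return 2*x + 3*y

# Brand label of each car, e.g. 'BMW_1' -> 'BMW' (hypothetical helper array)
brand = np.array([car.split('_')[0] for car in df['Car']])
vals = df['distance'].to_numpy(dtype=float)

# pair[i, j] = my_func(vals[i], vals[j]) for every ordered pair (i, j)
pair = my_func(vals[:, None], vals[None, :])

# Keep each combination once (i < j) and only pairs of different brands
keep = np.triu(np.ones(pair.shape, dtype=bool), k=1) & (brand[:, None] != brand[None, :])
masked = np.where(keep, pair, np.nan)

# Maximum per car (row), then maximum per brand; all-NaN rows stay NaN
result = pd.DataFrame(masked, index=brand).max(axis=1).groupby(level=0).max()
print(result)
```

Under these assumptions the result agrees with the output above: 197.0 for BMW, 221.0 for WW and NaN for Fiat, since every Fiat pairing with another brand is already counted on the other car's row.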
OLD answer:
IIUC, you can do something like the following:
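import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

def my_func(x, y):
    return 2*x + 3*y

# pdist calls my_func once per pair of rows; squareform expands the result into a square matrix
x = pd.DataFrame(
    squareform(pdist(df[['distance']], my_func)),
    columns=df.Car.str.split('_').str[0],  # label columns by brand
    index=df.Car.str.split('_').str[0])    # label rows by brand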
This yields:
In [269]: x
Out[269]:
Car BMW BMW BMW WW WW Fiat Fiat
Car
BMW 0.0 95.0 86.0 92.0 131.0 119.0 167.0
BMW 95.0 0.0 116.0 122.0 161.0 149.0 197.0
BMW 86.0 116.0 0.0 116.0 155.0 143.0 191.0
WW 92.0 122.0 116.0 0.0 159.0 147.0 195.0
WW 131.0 161.0 155.0 159.0 0.0 173.0 221.0
Fiat 119.0 149.0 143.0 147.0 173.0 0.0 213.0
Fiat 167.0 197.0 191.0 195.0 221.0 213.0 0.0
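One detail worth noting about this matrix: `my_func` is not symmetric, yet the matrix is. As documented for SciPy, `pdist` evaluates the metric only for row pairs with i < j, and `squareform` mirrors those values below the diagonal. The pure-Python sketch below reproduces that condensed order and matches the first row of the matrix; the names `distance` and `condensed` are illustrative only.

```python
from itertools import combinations

distance = [10, 25, 22, 24, 37, 33, 49]

def my_func(x, y):
    return 2*x + 3*y

# One value per unordered pair (i, j) with i < j, in the same order pdist uses
condensed = [my_func(distance[i], distance[j])
             for i, j in combinations(range(len(distance)), 2)]

# The first six entries pair BMW_1 with every later car
print(condensed[:6])    # [95, 86, 92, 131, 119, 167]

# The mirrored entry (BMW_2, BMW_1) in the matrix is also 95, not my_func(25, 10)
print(my_func(25, 10))  # 80
```

Entries below the diagonal are therefore copies of the values above it, with the earlier car always passed as `x`, which is worth keeping in mind when reading the row maxima of the later cars.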
Masking out pairs of the same brand:
In [270]: x.apply(lambda col: col.index != col.name)
Out[270]:
Car BMW BMW BMW WW WW Fiat Fiat
Car
BMW False False False True True True True
BMW False False False True True True True
BMW False False False True True True True
WW True True True False False True True
WW True True True False False True True
Fiat True True True True True False False
Fiat True True True True True False False
In [273]: x[x.apply(lambda col: col.index != col.name)]
Out[273]:
Car BMW BMW BMW WW WW Fiat Fiat
Car
BMW NaN NaN NaN 92.0 131.0 119.0 167.0
BMW NaN NaN NaN 122.0 161.0 149.0 197.0
BMW NaN NaN NaN 116.0 155.0 143.0 191.0
WW 92.0 122.0 116.0 NaN NaN 147.0 195.0
WW 131.0 161.0 155.0 NaN NaN 173.0 221.0
Fiat 119.0 149.0 143.0 147.0 173.0 NaN NaN
Fiat 167.0 197.0 191.0 195.0 221.0 NaN NaN
Selecting the maximum per row:
In [271]: x[x.apply(lambda col: col.index != col.name)].max(1)
Out[271]:
Car
BMW 167.0
BMW 197.0
BMW 191.0
WW 195.0
WW 221.0
Fiat 173.0
Fiat 221.0
dtype: float64
Maximum per brand:
In [276]: x[x.apply(lambda col: col.index != col.name)].max(axis=1).groupby(level=0).max()
Out[276]:
Car
BMW 197.0
Fiat 221.0
WW 221.0
dtype: float64
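If the per-car list of combinations from the question is also needed, a standard-library sketch along the following lines may be easier to follow than the matrix approach. It is an alternative shown under stated assumptions, not part of the original answer: `brand_of`, `per_car`, `per_brand` and `brand_max` are hypothetical names, and each pair is evaluated once with the earlier car as `x`.

```python
from itertools import combinations

cars = ['BMW_1', 'BMW_2', 'BMW_3', 'WW_1', 'WW_2', 'Fiat_1', 'Fiat_2']
distance = [10, 25, 22, 24, 37, 33, 49]

def my_func(x, y):
    return 2*x + 3*y

def brand_of(car):
    # 'BMW_1' -> 'BMW'
    return car.split('_')[0]

# For each car, collect ((car, other_car), value) for later cars of another brand
per_car = {car: [] for car in cars}
for (car_a, d_a), (car_b, d_b) in combinations(zip(cars, distance), 2):
    if brand_of(car_a) != brand_of(car_b):
        per_car[car_a].append(((car_a, car_b), my_func(d_a, d_b)))

# Pool the values per brand and take the maximum; None when a brand has no pairs
per_brand = {}
for car, results in per_car.items():
    per_brand.setdefault(brand_of(car), []).extend(value for _, value in results)
brand_max = {brand: max(values, default=None) for brand, values in per_brand.items()}

print(per_car['WW_1'])  # [(('WW_1', 'Fiat_1'), 147), (('WW_1', 'Fiat_2'), 195)]
print(brand_max)        # {'BMW': 197, 'WW': 221, 'Fiat': None}
```

Because each combination is generated only once, Fiat ends up with no pairs of its own, which corresponds to the NaN for Fiat in the UPDATE rather than the 221.0 shown in the old answer.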