Conditional subtraction of rows in a Python pandas DataFrame
Question
I am trying to solve the problem described below. I have a DataFrame that looks like this:
Date Item Type Qty Price
1/1/18 Orange Add 100 25
5/1/18 Orange Add 20 40
8/1/18 Orange Add 40 20
18/1/18 Orange Add 10 35
27/2/18 Orange Sub 100 55
15/4/18 Orange Sub 30 45
I want to get an intermediate DataFrame like this:
Date Item Type Qty Price Diff
1/1/18 Orange Add 0 25 30
5/1/18 Orange Add 0 40 5
8/1/18 Orange Add 30 20 25
18/1/18 Orange Add 10 35
and then the final DataFrame I want looks like this:
Date Item Type Qty Price
8/1/18 Orange Add 30 20
18/1/18 Orange Add 10 35
NOTE: Diff is the difference between the Sub price and the Add price. Qty is also updated: the Sub quantity is subtracted from the Add quantity. In the example, Sub quantities are consumed from the earliest Add rows first.
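To make the note concrete, here is a minimal plain-Python sketch of the matching that the example tables appear to follow. It assumes first-in, first-out consumption (inferred from the tables, not stated explicitly), and the helper name `fifo_match` is hypothetical.

```python
def fifo_match(adds, subs):
    """Consume Sub quantities from Add lots in order (assumed FIFO).

    adds, subs: lists of (qty, price) tuples, already in date order.
    Returns (remaining, diffs): the quantity left in each Add lot and the
    Sub price minus the Add price for each lot that a Sub touched.
    If several Subs hit the same lot, only the last Diff is kept.
    """
    remaining = [qty for qty, _ in adds]
    diffs = [None] * len(adds)
    i = 0  # index of the first Add lot that still has stock
    for sub_qty, sub_price in subs:
        while sub_qty > 0 and i < len(adds):
            take = min(sub_qty, remaining[i])
            if take > 0:
                remaining[i] -= take
                sub_qty -= take
                diffs[i] = sub_price - adds[i][1]
            if remaining[i] == 0:
                i += 1  # lot exhausted, move to the next one
    return remaining, diffs


# Orange rows from the question: (Qty, Price)
orange_adds = [(100, 25), (20, 40), (40, 20), (10, 35)]
orange_subs = [(100, 55), (30, 45)]
remaining, diffs = fifo_match(orange_adds, orange_subs)
print(remaining)  # quantity left per Add row
print(diffs)      # Diff per Add row, None where no Sub reached the row
```

For the Orange rows this reproduces the Qty and Diff columns of the intermediate table, with the untouched last row having no Diff.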
Could anyone suggest how this can be achieved? I have been trying groupby, apply and transform, but have not got it working so far.
Here is my code so far; it is still in development and incomplete:
import pandas as pd


def FruitSummary():
    df = pd.DataFrame([
        ['01/1/18', 'Orange', 'Add', 100, 25],
        ['05/1/18', 'Orange', 'Add', 20, 40],
        ['08/1/18', 'Orange', 'Add', 40, 20],
        ['18/1/18', 'Orange', 'Add', 10, 35],
        ['27/2/18', 'Orange', 'Sub', 100, 55],
        ['15/4/18', 'Orange', 'Sub', 30, 45],
        ['02/1/18', 'Banana', 'Add', 110, 7],
        ['04/1/18', 'Banana', 'Add', 20, 9],
        ['11/1/18', 'Banana', 'Add', 40, 4],
        ['10/2/18', 'Banana', 'Add', 10, 3],
        ['15/3/18', 'Banana', 'Sub', 100, 9],
        ['15/4/18', 'Banana', 'Sub', 50, 8],
        ['10/3/18', 'Kiwi', 'Add', 80, 29],
        ['12/3/18', 'Berry', 'Add', 25, 5],
        ['18/4/18', 'Berry', 'Add', 15, 8]],
        columns=['Date', 'Item', 'Type', 'Qty', 'Price'])
    print(df)

    def fruit_stat(dfIN):
        # Work in progress: only inspects each group and returns nothing yet.
        print(dfIN)
        print((dfIN['Type'] == 'Sub').unique(), (dfIN['Type'] == 'ODD').unique())
        if len(dfIN) > 1 and (True in (dfIN['Type'] == 'Sub').unique()):
            print(dfIN['Item'].iloc[1], "'len > 1'", "'Sub True'")

    dfFS = df.groupby(['Item']).apply(fruit_stat)
    print(dfFS)
Recommended answer
I was able to find a solution, though I am not sure whether it is optimal or whether a better one exists.
import numpy as np
import pandas as pd

df = pd.DataFrame([['01/1/18', 'Orange', 'Add', 100, 25],
                   ['05/1/18', 'Orange', 'Add', 20, 40],
                   ['08/1/18', 'Orange', 'Add', 40, 20],
                   ['18/1/18', 'Orange', 'Add', 10, 35],
                   ['27/2/18', 'Orange', 'Sub', 100, 55],
                   ['15/4/18', 'Orange', 'Sub', 30, 45],
                   ['02/1/18', 'Banana', 'Add', 110, 7],
                   ['04/1/18', 'Banana', 'Add', 20, 9],
                   ['11/1/18', 'Banana', 'Add', 40, 4],
                   ['10/2/18', 'Banana', 'Add', 10, 3],
                   ['15/3/18', 'Banana', 'Sub', 100, 9],
                   ['15/4/18', 'Banana', 'Sub', 50, 8],
                   ['10/3/18', 'Kiwi', 'Add', 80, 29],
                   ['12/3/18', 'Berry', 'Add', 25, 5],
                   ['18/4/18', 'Berry', 'Add', 15, 8],
                   ['16/3/18', 'Cherry', 'Add', 25, 5],
                   ['21/4/18', 'Cherry', 'Sub', 25, 8],
                   ['19/3/18', 'Grapes', 'Add', 25, 5],
                   ['23/4/18', 'Grapes', 'Sub', 15, 8]],
                  columns=['Date', 'Item', 'Type', 'Qty', 'Price'])


def FruitSummary(df):
    # Running total of Qty, computed separately for Add and Sub rows of each item.
    df['CumSum'] = df.groupby(['Item', 'Type'])['Qty'].cumsum()
    print(df)

    def fruit_stat(dfg):
        dfg = dfg.copy()  # work on a copy of the group
        if dfg[dfg['Type'] == 'Sub']['Qty'].count():
            # Total quantity subtracted for this item.
            subT = dfg[dfg['Type'] == 'Sub']['CumSum'].iloc[-1]
            # Zero out every row that the subtraction fully consumes (and the Sub rows).
            dfg['Qty'] = np.where((dfg['CumSum'] - subT) <= 0, 0, dfg['Qty'])
            dfg = dfg[dfg['Qty'] > 0].copy()
            if len(dfg) > 0:
                # The first surviving row is only partly consumed.
                # Assign by position to avoid chained assignment.
                dfg.iloc[0, dfg.columns.get_loc('Qty')] = dfg['CumSum'].iloc[0] - subT
        return dfg

    dfFS = df.groupby(['Item'], as_index=False).apply(fruit_stat).drop(['CumSum'], axis=1).reset_index(drop=True)
    print(dfFS)


FruitSummary(df)  # call added so the snippet runs as a script
The code above produces the following output:
Date Item Type Qty Price
0 11/1/18 Banana Add 20 4
1 10/2/18 Banana Add 10 3
2 12/3/18 Berry Add 25 5
3 18/4/18 Berry Add 15 8
4 19/3/18 Grapes Add 10 5
5 10/3/18 Kiwi Add 80 29
6 08/1/18 Orange Add 30 20
7 18/1/18 Orange Add 10 35
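As a possible alternative to the `groupby(...).apply(...)` version, the same final quantities can be sketched with vectorized operations only. This is a minimal sketch, not a tested replacement: the function name `remaining_stock` is hypothetical, and it assumes rows are already in date order within each item and that the total Sub quantity never exceeds the total Add quantity.

```python
import pandas as pd


def remaining_stock(df):
    """Return the Add rows that still hold stock after all Sub rows are applied.

    Assumptions: rows are in date order within each Item, and the total
    Sub quantity per item does not exceed the total Add quantity.
    """
    adds = df[df["Type"] == "Add"].copy()
    # Total quantity removed per item; items without Sub rows get 0.
    sub_total = df[df["Type"] == "Sub"].groupby("Item")["Qty"].sum()
    removed = adds["Item"].map(sub_total).fillna(0)
    # Running total of added stock per item.
    cum_add = adds.groupby("Item")["Qty"].cumsum()
    # Stock left in each lot, bounded between 0 and the lot's own quantity.
    left = (cum_add - removed).clip(lower=0).clip(upper=adds["Qty"])
    adds["Qty"] = left.astype(int)
    return adds[adds["Qty"] > 0].reset_index(drop=True)


# A subset of the data from the answer.
df = pd.DataFrame(
    [["01/1/18", "Orange", "Add", 100, 25],
     ["05/1/18", "Orange", "Add", 20, 40],
     ["08/1/18", "Orange", "Add", 40, 20],
     ["18/1/18", "Orange", "Add", 10, 35],
     ["27/2/18", "Orange", "Sub", 100, 55],
     ["15/4/18", "Orange", "Sub", 30, 45],
     ["10/3/18", "Kiwi", "Add", 80, 29],
     ["16/3/18", "Cherry", "Add", 25, 5],
     ["21/4/18", "Cherry", "Sub", 25, 8],
     ["19/3/18", "Grapes", "Add", 25, 5],
     ["23/4/18", "Grapes", "Sub", 15, 8]],
    columns=["Date", "Item", "Type", "Qty", "Price"])

result = remaining_stock(df)
print(result)
```

Unlike the output shown for the answer, this sketch keeps the rows in their input order rather than sorting them by Item.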