Conditional Subtraction of Rows in a Python Pandas DataFrame

Problem description

I am trying to solve the following problem. I have a DataFrame like the one below:

Date    Item    Type    Qty Price
1/1/18  Orange  Add     100 25
5/1/18  Orange  Add     20  40
8/1/18  Orange  Add     40  20
18/1/18 Orange  Add     10  35
27/2/18 Orange  Sub     100 55
15/4/18 Orange  Sub     30  45

and I want to get an intermediate DataFrame like this:

Date    Item    Type    Qty Price   Diff
1/1/18  Orange  Add     0   25      30
5/1/18  Orange  Add     0   40      5
8/1/18  Orange  Add     30  20      25
18/1/18 Orange  Add     10  35

and then the final DataFrame I want looks like this:

Date    Item    Type    Qty Price
8/1/18  Orange  Add     30  20
18/1/18 Orange  Add     10  35

NOTE: Diff is the Sub price minus the Add price. Qty is updated by subtracting the Sub quantity from the Add quantity; as the tables above show, the oldest Add rows are consumed first.
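The rule in the note can be sketched without pandas as a first-in, first-out match of Sub rows against Add rows. This is only an illustration under assumptions: the helper name `fifo_match` and the shape of its return value are hypothetical, the rows are assumed to be in chronological order already, and a single item is processed at a time. With a DataFrame, the rows could be obtained per item via `to_dict('records')`.

```python
from collections import deque


def fifo_match(rows):
    """Consume 'Add' rows oldest-first with each 'Sub' row (hypothetical helper).

    Returns (remaining, matches): the Add rows that still hold quantity,
    and one record per consumed chunk with its price difference.
    """
    lots = deque()   # open Add rows, oldest on the left
    matches = []
    for row in rows:
        if row["Type"] == "Add":
            lots.append(dict(row))  # copy so the input is not mutated
            continue
        need = row["Qty"]
        while need > 0 and lots:
            lot = lots[0]
            take = min(need, lot["Qty"])
            matches.append({
                "Date": lot["Date"],
                "Qty": take,
                "Diff": row["Price"] - lot["Price"],  # Sub price minus Add price
            })
            lot["Qty"] -= take
            need -= take
            if lot["Qty"] == 0:
                lots.popleft()  # this Add row is fully consumed
    return list(lots), matches


orange = [
    {"Date": "1/1/18",  "Type": "Add", "Qty": 100, "Price": 25},
    {"Date": "5/1/18",  "Type": "Add", "Qty": 20,  "Price": 40},
    {"Date": "8/1/18",  "Type": "Add", "Qty": 40,  "Price": 20},
    {"Date": "18/1/18", "Type": "Add", "Qty": 10,  "Price": 35},
    {"Date": "27/2/18", "Type": "Sub", "Qty": 100, "Price": 55},
    {"Date": "15/4/18", "Type": "Sub", "Qty": 30,  "Price": 45},
]

remaining, matches = fifo_match(orange)
print([(r["Date"], r["Qty"]) for r in remaining])
print([(m["Date"], m["Qty"], m["Diff"]) for m in matches])
```

For the Orange rows this leaves 30 on 8/1/18 and 10 on 18/1/18, with differences of 30, 5 and 25 for the consumed chunks, which agrees with the intermediate table. Recording one match per consumed chunk, rather than one Diff per Add row, is a design choice of this sketch: an Add row could in general be consumed by several Sub rows at different prices.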

Could anyone help with how this can be achieved? I have been trying groupby, apply and transform, but have not gotten it to work so far.

I have the code below; it is still in development and not complete:

import pandas as pd


def FruitSummary():
    df = pd.DataFrame([
               ['01/1/18',   'Orange',   'Add',  100,    25],
               ['05/1/18',   'Orange',   'Add',   20,    40],
               ['08/1/18',   'Orange',   'Add',   40,    20],
               ['18/1/18',   'Orange',   'Add',   10,    35],
               ['27/2/18',   'Orange',   'Sub',  100,    55],
               ['15/4/18',   'Orange',   'Sub',   30,    45],
               ['02/1/18',   'Banana',   'Add',  110,     7],
               ['04/1/18',   'Banana',   'Add',   20,     9],
               ['11/1/18',   'Banana',   'Add',   40,     4],
               ['10/2/18',   'Banana',   'Add',   10,     3],
               ['15/3/18',   'Banana',   'Sub',  100,     9],
               ['15/4/18',   'Banana',   'Sub',   50,     8],
               ['10/3/18',   'Kiwi',     'Add',   80,    29],
               ['12/3/18',   'Berry',    'Add',   25,     5],
               ['18/4/18',   'Berry',    'Add',   15,     8]],
       columns=['Date',      'Item',     'Type', 'Qty',  'Price'])

    print(df)

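    def fruit_stat(dfIN):
        # Debug output: show each per-item group and which Type values it contains
        print(dfIN)
        print((dfIN['Type'] == 'Sub').unique(), (dfIN['Type'] == 'ODD').unique())

        # Only groups with more than one row and at least one 'Sub' row need processing
        if len(dfIN) > 1 and (dfIN['Type'] == 'Sub').any():
            print(dfIN['Item'].iloc[1], "'len > 1'", "'Sub True'")

    # These lines belong inside FruitSummary, where df is defined
    dfFS = df.groupby(['Item']).apply(fruit_stat)
    print(dfFS)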


Recommended answer

I was able to find a solution, though I am not sure it is optimal; there may be a better one.

import numpy as np
import pandas as pd

df = pd.DataFrame([['01/1/18',   'Orange',   'Add',  100,    25],
                   ['05/1/18',   'Orange',   'Add',   20,    40],
                   ['08/1/18',   'Orange',   'Add',   40,    20],
                   ['18/1/18',   'Orange',   'Add',   10,    35],
                   ['27/2/18',   'Orange',   'Sub',  100,    55],
                   ['15/4/18',   'Orange',   'Sub',   30,    45],
                   ['02/1/18',   'Banana',   'Add',  110,     7],
                   ['04/1/18',   'Banana',   'Add',   20,     9],
                   ['11/1/18',   'Banana',   'Add',   40,     4],
                   ['10/2/18',   'Banana',   'Add',   10,     3],
                   ['15/3/18',   'Banana',   'Sub',  100,     9],
                   ['15/4/18',   'Banana',   'Sub',   50,     8],
                   ['10/3/18',   'Kiwi',     'Add',   80,    29],
                   ['12/3/18',   'Berry',    'Add',   25,     5],
                   ['18/4/18',   'Berry',    'Add',   15,     8],
                   ['16/3/18',   'Cherry',   'Add',   25,     5],
                   ['21/4/18',   'Cherry',   'Sub',   25,     8],
                   ['19/3/18',   'Grapes',   'Add',   25,     5],
                   ['23/4/18',   'Grapes',   'Sub',   15,     8]],
          columns=['Date',      'Item',     'Type', 'Qty',  'Price'])


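def FruitSummary(df):
    # Work on a copy so the caller's frame does not gain a CumSum column
    df = df.copy()
    # Running quantity per item, computed separately for 'Add' and 'Sub' rows
    df['CumSum'] = df.groupby(['Item', 'Type'])['Qty'].cumsum()
    print(df)

    def fruit_stat(dfg):
        dfg = dfg.copy()
        is_sub = dfg['Type'] == 'Sub'
        if is_sub.any():
            # Total quantity subtracted for this item
            subT = dfg.loc[is_sub, 'CumSum'].iloc[-1]
            # Zero out rows fully consumed by the subtractions (this also zeroes the 'Sub' rows)
            dfg['Qty'] = np.where((dfg['CumSum'] - subT) <= 0, 0, dfg['Qty'])
            dfg = dfg[dfg['Qty'] > 0].copy()
            if len(dfg) > 0:
                # The first surviving row keeps only what is left after the subtraction;
                # positional assignment avoids chained indexing on a slice
                dfg.iloc[0, dfg.columns.get_loc('Qty')] = dfg['CumSum'].iloc[0] - subT

        return dfg

    dfFS = df.groupby(['Item'], as_index=False).apply(fruit_stat).drop(['CumSum'], axis=1).reset_index(drop=True)
    print(dfFS)
    return dfFS


dfFS = FruitSummary(df)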

The code above produces the following output:

      Date    Item Type  Qty  Price
0  11/1/18  Banana  Add   20      4
1  10/2/18  Banana  Add   10      3
2  12/3/18   Berry  Add   25      5
3  18/4/18   Berry  Add   15      8
4  19/3/18  Grapes  Add   10      5
5  10/3/18    Kiwi  Add   80     29
6  08/1/18  Orange  Add   30     20
7  18/1/18  Orange  Add   10     35
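For comparison, the same final table can be sketched without `groupby(...).apply`, using only column-wise operations. This is an alternative under assumptions, not the answer's method: the function name `remaining_stock` and the helper column `SubTotal` are hypothetical, rows are assumed to be in chronological order within each item, and only a subset of the sample data is used to keep the block short. The idea is that each Add row keeps `min(Qty, max(CumSum - SubTotal, 0))`.

```python
import pandas as pd


def remaining_stock(df):
    """Return the Add rows that still hold quantity after all Sub rows (sketch)."""
    adds = df[df["Type"] == "Add"].copy()
    # Total subtracted quantity per item; items without Sub rows get 0
    sub_total = df[df["Type"] == "Sub"].groupby("Item")["Qty"].sum()
    adds["CumSum"] = adds.groupby("Item")["Qty"].cumsum()
    adds["SubTotal"] = adds["Item"].map(sub_total).fillna(0)
    # Quantity left in each row: nothing if fully consumed, at most the row's own Qty
    left = (adds["CumSum"] - adds["SubTotal"]).clip(lower=0)
    adds["Qty"] = left.clip(upper=adds["Qty"]).astype(int)
    out = adds[adds["Qty"] > 0].drop(columns=["CumSum", "SubTotal"])
    # Stable sort keeps the original row order within each item
    return out.sort_values("Item", kind="stable").reset_index(drop=True)


sample = pd.DataFrame(
    [["01/1/18", "Orange", "Add", 100, 25],
     ["05/1/18", "Orange", "Add", 20, 40],
     ["08/1/18", "Orange", "Add", 40, 20],
     ["18/1/18", "Orange", "Add", 10, 35],
     ["27/2/18", "Orange", "Sub", 100, 55],
     ["15/4/18", "Orange", "Sub", 30, 45],
     ["10/3/18", "Kiwi", "Add", 80, 29],
     ["16/3/18", "Cherry", "Add", 25, 5],
     ["21/4/18", "Cherry", "Sub", 25, 8],
     ["19/3/18", "Grapes", "Add", 25, 5],
     ["23/4/18", "Grapes", "Sub", 15, 8]],
    columns=["Date", "Item", "Type", "Qty", "Price"],
)

result = remaining_stock(sample)
print(result)
```

On this subset the result keeps Grapes with 10, Kiwi with 80, and Orange with 30 and 10, matching the corresponding rows of the output above, while Cherry disappears because its single Add row is fully consumed.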
