Remove cancelling rows from Pandas Dataframe


Question

I have a list of invoices sent out to customers. However, sometimes a bad invoice is sent, which is later cancelled. My Pandas Dataframe looks something like this, except much larger (~3 million rows)

index | customer | invoice_nr | amount | date
---------------------------------------------------
0     | 1        | 1          | 10     | 01-01-2016
1     | 1        | 1          | -10    | 01-01-2016
2     | 1        | 1          | 11     | 01-01-2016
3     | 1        | 2          | 10     | 02-01-2016
4     | 2        | 3          | 7      | 01-01-2016
5     | 2        | 4          | 12     | 02-01-2016
6     | 2        | 4          | 8      | 02-01-2016
7     | 2        | 4          | -12    | 02-01-2016
8     | 2        | 4          | 4      | 02-01-2016
...   | ...      | ...        | ...    | ...
...   | ...      | ...        | ...    | ...
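
To make the example reproducible, the rows shown above could be built as follows. This is only a sketch: the dates are kept as plain strings because the question does not say whether they are day-month-year or month-day-year, and the default integer index stands in for the `index` column.

```python
import pandas as pd

# Sample rows copied from the table above; the real frame has ~3 million rows.
# Assumption: dates stay as strings, since the question does not state the format.
df = pd.DataFrame(
    {
        "customer":   [1, 1, 1, 1, 2, 2, 2, 2, 2],
        "invoice_nr": [1, 1, 1, 2, 3, 4, 4, 4, 4],
        "amount":     [10, -10, 11, 10, 7, 12, 8, -12, 4],
        "date": [
            "01-01-2016", "01-01-2016", "01-01-2016", "02-01-2016",
            "01-01-2016", "02-01-2016", "02-01-2016", "02-01-2016",
            "02-01-2016",
        ],
    }
)
print(df)
```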

Now, I want to drop all rows for which the customer, invoice_nr and date are identical, but the amount has opposite values.
Corrections of invoices always take place on the same day and use the identical invoice number. The invoice number is uniquely bound to the customer and always corresponds to one transaction (which can consist of multiple components, for example for customer = 2, invoice_nr = 4). Corrections only occur either to change the amount charged or to split the amount into smaller components. Hence, the cancelled value is not repeated on the same invoice_nr.

Any help on how to program this would be much appreciated.

Recommended answer

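def is_not_cancelled(amount):
    # All amounts in a group share the same absolute value.
    trans_neg = amount < 0
    # Flag every negative (cancelling) row and the row directly before it.
    # fill_value=False keeps the shifted mask boolean (no NaN in the last slot).
    return ~(trans_neg | trans_neg.shift(-1, fill_value=False))

# Rows can only cancel within the same customer, invoice, date and |amount|.
groups = [df.customer, df.invoice_nr, df.date, df.amount.abs()]
keep = df.groupby(groups, group_keys=False)["amount"] \
         .apply(is_not_cancelled)
# Realign the mask with the original row order before filtering.
df = df[keep.reindex(df.index)]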
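
As an alternative to the answer's approach, the sketch below avoids `apply` by pairing rows with `cumcount`. It is not the answer's method: the helper name `drop_cancelling_pairs` and the intermediate column names are hypothetical. It assumes a cancelling row matches its target on customer, invoice_nr, date and absolute amount, and it does not require the two rows to be adjacent. It has not been benchmarked on the ~3 million rows mentioned in the question.

```python
import numpy as np
import pandas as pd

def drop_cancelling_pairs(df):
    """Hypothetical helper: drop rows that cancel each other out."""
    keys = ["customer", "invoice_nr", "date"]
    work = df[keys].copy()
    work["abs_amount"] = df["amount"].abs()
    work["sign"] = np.sign(df["amount"])
    # Number the rows inside each (keys, |amount|, sign) bucket: 0, 1, 2, ...
    work["rank"] = work.groupby(keys + ["abs_amount", "sign"]).cumcount()
    # The n-th positive row pairs with the n-th negative row of the same size;
    # a paired slot therefore contains two distinct signs.
    n_signs = work.groupby(keys + ["abs_amount", "rank"])["sign"].transform("nunique")
    return df[(n_signs < 2).to_numpy()]

# Sample rows from the question (dates kept as strings; format is not stated).
df = pd.DataFrame(
    {
        "customer":   [1, 1, 1, 1, 2, 2, 2, 2, 2],
        "invoice_nr": [1, 1, 1, 2, 3, 4, 4, 4, 4],
        "amount":     [10, -10, 11, 10, 7, 12, 8, -12, 4],
        "date": ["01-01-2016"] * 3 + ["02-01-2016", "01-01-2016"] + ["02-01-2016"] * 4,
    }
)

result = drop_cancelling_pairs(df)
print(result)  # rows 2, 3, 4, 6 and 8 remain
```

Because pairs are matched by rank, an unmatched extra row (for example a second positive 10 with only one -10) is kept rather than dropped.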
