Deleting rows based on values in other rows


Question

I am looking for a way to drop rows from my dataframe based on conditions checked against values in other rows.

This is my dataframe:

product product_id  account_status
prod-A  100         active
prod-A  100         cancelled
prod-A  300         active
prod-A  400         cancelled

If a row with account_status='active' exists for a product and product_id combination, then retain that row and delete the other rows for that combination.

The desired output is:

product product_id  account_status
prod-A  100         active
prod-A  300         active
prod-A  400         cancelled

I saw the solution mentioned here, but couldn't replicate it for strings.

Any suggestions would be appreciated.

Recommended answer

A more general solution removes only the non-active account_status rows from each group, and only when that group contains at least one active row:

print (df)
  product  product_id account_status
0  prod-A         100         active
1  prod-A         100      cancelled <- to be removed
2  prod-A         300         active
3  prod-A         400      cancelled
4  prod-A         500         active
5  prod-A         500         active
6  prod-A         600      cancelled
7  prod-A         600      cancelled

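# True for rows whose status is 'active'
s = df['account_status'].eq('active')
# Group the boolean flags by product and product_id
g = df.assign(A=s).groupby(['product','product_id'])['A']
# Keep a row if its group has no active row, if every row in its group is active, or if the row itself is active
mask = ~g.transform('any') | g.transform('all') | s
df = df[mask]
print (df)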
  product  product_id account_status
0  prod-A         100         active
2  prod-A         300         active
3  prod-A         400      cancelled
4  prod-A         500         active
5  prod-A         500         active
6  prod-A         600      cancelled
7  prod-A         600      cancelled
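
For anyone who wants to run the example end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the printed sample above. The intermediate names (is_active, has_active, all_active, result) are introduced only for readability and do not appear in the original answer.

```python
import pandas as pd

# Sample data rebuilt from the printed frame above
df = pd.DataFrame({
    "product": ["prod-A"] * 8,
    "product_id": [100, 100, 300, 400, 500, 500, 600, 600],
    "account_status": ["active", "cancelled", "active", "cancelled",
                       "active", "active", "cancelled", "cancelled"],
})

# Row-level flag: is this row active?
is_active = df["account_status"].eq("active")

# Group the flags by (product, product_id)
grouped = df.assign(A=is_active).groupby(["product", "product_id"])["A"]

# Group-level flags broadcast back to every row of the group
has_active = grouped.transform("any")   # group contains at least one active row
all_active = grouped.transform("all")   # every row in the group is active

# Keep rows of groups without any active row, rows of all-active groups,
# and the active rows themselves
mask = ~has_active | all_active | is_active
result = df[mask]
print(result)
```

Using transform rather than an aggregation keeps the group-level flags aligned with the original index, which is why they can be combined directly with the row-level flag.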

This also works well with more than two status categories:

print (df)
  product  product_id account_status
0  prod-A         100         active
1  prod-A         100      cancelled <- to be removed
2  prod-A         100        pending <- to be removed
3  prod-A         300         active
4  prod-A         300        pending <- to be removed
5  prod-A         400      cancelled
6  prod-A         500         active
7  prod-A         500         active
8  prod-A         600        pending
9  prod-A         600      cancelled

s = df['account_status'].eq('active')
g = df.assign(A=s).groupby(['product','product_id'])['A']
mask = ~g.transform('any') | g.transform('all') | s
df = df[mask]
print (df)
  product  product_id account_status
0  prod-A         100         active
3  prod-A         300         active
5  prod-A         400      cancelled
6  prod-A         500         active
7  prod-A         500         active
8  prod-A         600        pending
9  prod-A         600      cancelled
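
The same logic can be wrapped in a reusable function. This is only a sketch: the function name drop_others_where_active_exists and its parameters are hypothetical, not part of the original answer. It also uses a shorter mask. When every row in a group is active, each of those rows is already selected by the row-level flag, so the transform('all') term can be left out without changing the result.

```python
import pandas as pd

def drop_others_where_active_exists(df, keys, status_col="account_status", keep="active"):
    """Hypothetical helper: within each group defined by `keys`, keep only the
    `keep` rows if at least one exists; otherwise keep the whole group."""
    is_keep = df[status_col].eq(keep)
    # Broadcast "group has at least one `keep` row" back to every row
    group_has_keep = is_keep.groupby([df[k] for k in keys]).transform("any")
    return df[is_keep | ~group_has_keep]

# Sample data rebuilt from the multi-category frame above
df = pd.DataFrame({
    "product": ["prod-A"] * 10,
    "product_id": [100, 100, 100, 300, 300, 400, 500, 500, 600, 600],
    "account_status": ["active", "cancelled", "pending", "active", "pending",
                       "cancelled", "active", "active", "pending", "cancelled"],
})

result = drop_others_where_active_exists(df, ["product", "product_id"])
print(result)

# Cross-check against the three-term mask from the answer
s = df["account_status"].eq("active")
g = df.assign(A=s).groupby(["product", "product_id"])["A"]
original = df[~g.transform("any") | g.transform("all") | s]
```

On this sample, result and original select the same rows.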
