Deleting rows based on values in other rows

Views: 62  Published: 2020/10/16 22:34:27  Tags: python python-3.x pandas dataframe pandas-groupby


Problem description

I am looking for a way to drop rows from my dataframe based on conditions checked against values in other rows.

Here is my dataframe:

product product_id  account_status
prod-A  100         active
prod-A  100         cancelled
prod-A  300         active
prod-A  400         cancelled

If a row with account_status = 'active' exists for a given product and product_id combination, keep that row and delete the other rows for that combination.

The desired output is:

product product_id  account_status
prod-A  100         active
prod-A  300         active
prod-A  400         cancelled

I saw the solution mentioned here, but couldn't adapt it to string values.

Please suggest an approach.

Recommended answer

For a more general solution that removes the non-active account_status rows only from groups that contain at least one active value:

print (df)
  product  product_id account_status
0  prod-A         100         active
1  prod-A         100      cancelled <- should be removed
2  prod-A         300         active
3  prod-A         400      cancelled
4  prod-A         500         active
5  prod-A         500         active
6  prod-A         600      cancelled
7  prod-A         600      cancelled

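# True for rows whose account_status is 'active'
s = df['account_status'].eq('active')
# Group that flag by product and product_id
g = df.assign(A=s).groupby(['product','product_id'])['A']
# Keep rows in groups with no active row, in all-active groups, or active themselves
mask = ~g.transform('any') | g.transform('all') | s
df = df[mask]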
print (df)
  product  product_id account_status
0  prod-A         100         active
2  prod-A         300         active
3  prod-A         400      cancelled
4  prod-A         500         active
5  prod-A         500         active
6  prod-A         600      cancelled
7  prod-A         600      cancelled
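
To see why each row is kept or dropped, the three terms of the mask can be laid out next to the data. The following is a minimal, self-contained sketch that rebuilds the sample frame above; the column names is_active, group_any, group_all and keep are illustrative assumptions added here and are not part of the original answer.

```python
import pandas as pd

# Rebuild the sample frame from the example above.
df = pd.DataFrame({
    "product": ["prod-A"] * 8,
    "product_id": [100, 100, 300, 400, 500, 500, 600, 600],
    "account_status": ["active", "cancelled", "active", "cancelled",
                       "active", "active", "cancelled", "cancelled"],
})

s = df["account_status"].eq("active")
g = df.assign(A=s).groupby(["product", "product_id"])["A"]

# Illustrative helper columns (names are assumptions, not from the answer).
steps = df.assign(
    is_active=s,                     # this row is active
    group_any=g.transform("any"),    # its group has at least one active row
    group_all=g.transform("all"),    # every row in its group is active
)
steps["keep"] = ~steps["group_any"] | steps["group_all"] | steps["is_active"]

result = df[steps["keep"]]
print(steps)
print(result)
```

Only row 1 ends up with keep set to False: it is a cancelled row in a group (product_id 100) that also contains an active row.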

This also works well with multiple categories:

print (df)
  product  product_id account_status
0  prod-A         100         active
1  prod-A         100      cancelled <- should be removed
2  prod-A         100        pending <- should be removed
3  prod-A         300         active
4  prod-A         300        pending <- should be removed
5  prod-A         400      cancelled
6  prod-A         500         active
7  prod-A         500         active
8  prod-A         600        pending
9  prod-A         600      cancelled

s = df['account_status'].eq('active')
g = df.assign(A=s).groupby(['product','product_id'])['A']
mask = ~g.transform('any') | g.transform('all') | s
df = df[mask]
print (df)
  product  product_id account_status
0  prod-A         100         active
3  prod-A         300         active
5  prod-A         400      cancelled
6  prod-A         500         active
7  prod-A         500         active
8  prod-A         600        pending
9  prod-A         600      cancelled
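
If the same filter is needed in several places, it could be wrapped in a small function. The sketch below is one possible way to do that, not the answer's own code: the function name keep_preferred_rows and its parameters are hypothetical. It also omits the g.transform('all') term, on the reasoning that rows in an all-active group are already selected by the per-row active flag.

```python
import pandas as pd

def keep_preferred_rows(df, keys, status_col, preferred):
    """Hypothetical helper: within each group defined by `keys`, keep only
    the rows whose `status_col` equals `preferred` when at least one such
    row exists; groups without a preferred row are kept unchanged."""
    is_preferred = df[status_col].eq(preferred)
    # Broadcast "this group has a preferred row" back to every row.
    group_has_preferred = (
        is_preferred.groupby([df[k] for k in keys]).transform("any")
    )
    return df[~group_has_preferred | is_preferred]

# Sample frame from the multi-category example above.
df = pd.DataFrame({
    "product": ["prod-A"] * 10,
    "product_id": [100, 100, 100, 300, 300, 400, 500, 500, 600, 600],
    "account_status": ["active", "cancelled", "pending", "active", "pending",
                       "cancelled", "active", "active", "pending", "cancelled"],
})

result = keep_preferred_rows(df, ["product", "product_id"],
                             "account_status", "active")
print(result)
```

The original index is preserved, so the dropped rows can be recovered afterwards with df.index.difference(result.index).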
