Eliminating Duplicates Based on Conditions in a Data Frame

Problem description

This is my data frame:

Fruits         Person        Eat

Banana         Peter         Yes 
Banana         Ashley        Yes
Strawberry     Peter         No
Strawberry     Ashley        Yes 
Cherry         Peter         Yes
Orange         Peter         No
Orange         Ashley        No
Grape          Ashley        Yes
Pear           Ashley        Yes
Pear           Peter         Yes

There are duplicate fruits in my data frame. I need to delete the duplicates based on the following logic. If there is a duplicate fruit and Peter and Ashley both eat it, then Peter's row is kept and Ashley's row is deleted. If there is a duplicate fruit and Peter doesn't eat it and Ashley eats it, then Peter's row is deleted and Ashley's row remains. If there is a duplicate fruit and Peter doesn't eat it and Ashley doesn't eat it, then both rows are deleted.

With this logic the data frame should output like:

Fruits         Person        Eat

Banana         Peter         Yes 
Strawberry     Ashley        Yes 
Cherry         Peter         Yes
Grape          Ashley        Yes
Pear           Peter         Yes

I'm not sure how to iterate through a pandas data frame with these conditions to delete duplicates. Generally, for the first condition I would do something like this:

data = [
    {
        "fruit": "Apple",
        "person": "Ashley",
        "eats": True
    },
    {
        "fruit": "Apple",
        "person": "Peter",
        "eats": True
    }
]
eats = dict()     # person -> {fruit: row number} for fruits that person eats
to_drop = set()   # row numbers to delete once the loop has finished

for i, row in enumerate(data):
    fruit = row["fruit"]
    person = row["person"]
    does_eat = row["eats"]
    # make sure this person has a fruit -> row number mapping
    if person not in eats:
        eats[person] = dict()

    # if person does eat, record row number for later deletion if needed
    if does_eat:
        eats[person][fruit] = i

    # dedup: when both eat the fruit, Ashley's row is the one to delete
    # (membership tests are used because row number 0 is falsy)
    if fruit in eats.get("Peter", {}) and fruit in eats.get("Ashley", {}):
        to_drop.add(eats["Ashley"][fruit])

# rebuild the list rather than popping from it while iterating
data = [row for i, row in enumerate(data) if i not in to_drop]

Any help/tips on how to do this with my data frame would be very appreciated.

Recommended answer

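Try this. It keeps only the rows where Eat is 'Yes', sorts by Person so that Ashley's rows come before Peter's, and then drops duplicate fruits while keeping the last row, which is Peter's whenever both of them eat the fruit: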

df1 = (df[df.Eat.eq('Yes')].sort_values('Person')
                           .drop_duplicates(subset='Fruits', keep='last'))

Out[14]:
       Fruits  Person  Eat
3  Strawberry  Ashley  Yes
7       Grape  Ashley  Yes
0      Banana   Peter  Yes
4      Cherry   Peter  Yes
9        Pear   Peter  Yes
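
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same idea. The sample data is copied from the question. The intermediate names (`eaten`, `deduped`, `result`) and the final `sort_index()` step are my own additions for illustration, not part of the answer above. Two assumptions are worth noting: the sort only gives Peter priority because "Ashley" happens to sort before "Peter" alphabetically, and the filter drops every 'No' row, including rows for fruits that are not duplicated.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Fruits": ["Banana", "Banana", "Strawberry", "Strawberry", "Cherry",
               "Orange", "Orange", "Grape", "Pear", "Pear"],
    "Person": ["Peter", "Ashley", "Peter", "Ashley", "Peter",
               "Peter", "Ashley", "Ashley", "Ashley", "Peter"],
    "Eat": ["Yes", "Yes", "No", "Yes", "Yes",
            "No", "No", "Yes", "Yes", "Yes"],
})

# 1. Keep only the rows where the fruit is actually eaten.
eaten = df[df["Eat"].eq("Yes")]

# 2. Sort so Ashley's rows come before Peter's, then keep the last row per
#    fruit, so Peter's row wins whenever both of them eat the fruit.
deduped = (eaten.sort_values("Person")
                .drop_duplicates(subset="Fruits", keep="last"))

# 3. Optional (not in the answer above): restore the original row order.
result = deduped.sort_index()
print(result)
```

With the optional last step, the rows come back in the same order as the expected output in the question. If the priority should not depend on alphabetical order, one option is to sort on an explicit ranking of the names instead.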
