groupby and remove pair records in pandas

Problem description

I have a dataframe like this:

col1    col2    col3    col4
a1      b1      c1      +
a1      b1      c1      +
a1      b2      c2      +
a1      b2      c2      -
a1      b2      c2      +
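
For anyone who wants to reproduce the problem, the sample frame above can be rebuilt as follows. This is only a convenience sketch; the default integer index (0 to 4) is an assumption, since the question does not show an index.

```python
import pandas as pd

# Rebuild the five sample rows from the question.
# The default RangeIndex (0..4) is assumed; the question shows no index.
df = pd.DataFrame({
    'col1': ['a1', 'a1', 'a1', 'a1', 'a1'],
    'col2': ['b1', 'b1', 'b2', 'b2', 'b2'],
    'col3': ['c1', 'c1', 'c2', 'c2', 'c2'],
    'col4': ['+', '+', '+', '-', '+'],
})
print(df)
```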

If two records have identical values in col1, col2 and col3 and opposite signs in col4, they should be removed from the dataframe.

Output:

col1    col2    col3    col4
a1      b1      c1      +
a1      b1      c1      +
a1      b2      c2      +

So far I have tried pandas duplicated and groupby, but did not succeed in finding the pairs. How can I do this?

Recommended answer

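I think you need cumcount to number the rows within the groups defined by all 4 columns, then groupby again with that helper Series, so that the n-th + and the n-th - sharing the same col1, col2 and col3 land in one group, and compare each group's values with the set of both signs: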

s = df.groupby(['col1','col2','col3', 'col4']).cumcount()
df = df[~df.groupby(['col1','col2','col3', s])['col4']
           .transform(lambda x: set(x) == set(['+','-']))]
print (df)
  col1 col2 col3 col4
0   a1   b1   c1    +
1   a1   b1   c1    +
4   a1   b2   c2    +

For better understanding, create a new column:

df['help'] = df.groupby(['col1','col2','col3', 'col4']).cumcount()
print (df)
  col1 col2 col3 col4  help
0   a1   b1   c1    +     0
1   a1   b1   c1    +     1
2   a1   b2   c2    +     0
3   a1   b2   c2    -     0
4   a1   b2   c2    +     1
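
To see which groups the helper counter produces, one option is to join the signs found in each (col1, col2, col3, help) group. This is a small inspection sketch on the sample data, not part of the original answer; the variable names counter and signs are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a1', 'a1', 'a1', 'a1', 'a1'],
    'col2': ['b1', 'b1', 'b2', 'b2', 'b2'],
    'col3': ['c1', 'c1', 'c2', 'c2', 'c2'],
    'col4': ['+', '+', '+', '-', '+'],
})

# Running count of each row within its (col1, col2, col3, col4) group.
counter = df.groupby(['col1', 'col2', 'col3', 'col4']).cumcount()

# Concatenate the signs that share the same keys and the same counter value.
signs = (df.assign(help=counter)
           .groupby(['col1', 'col2', 'col3', 'help'])['col4']
           .agg(''.join))
print(signs)
# Only the (a1, b2, c2, 0) group holds both '+' and '-', so its two rows form a pair.
```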

df = df[~df.groupby(['col1','col2','col3', 'help'])['col4']
           .transform(lambda x: set(x) == set(['+','-']))]
print (df)
  col1 col2 col3 col4  help
0   a1   b1   c1    +     0
1   a1   b1   c1    +     1
4   a1   b2   c2    +     1
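
The steps above can be wrapped in a small helper that leaves the input frame untouched and adds no extra column. This is a minimal sketch under assumptions: the function name drop_opposite_pairs and its parameters keys and sign are hypothetical, and col4 is assumed to contain only '+' and '-'. The astype(bool) call is a defensive addition so that ~ always acts on a boolean mask.

```python
import pandas as pd

def drop_opposite_pairs(df, keys=('col1', 'col2', 'col3'), sign='col4'):
    """Drop rows that pair up with an opposite-sign row sharing the same keys.

    Hypothetical helper; assumes the sign column holds only '+' and '-'.
    """
    keys = list(keys)
    # Number each row within its (keys + sign) group: 0, 1, 2, ...
    counter = df.groupby(keys + [sign]).cumcount()
    # Rows with equal keys and equal counter are candidate pairs;
    # flag every row whose group contains both signs.
    paired = (df.groupby(keys + [counter])[sign]
                .transform(lambda x: set(x) == {'+', '-'})
                .astype(bool))
    return df[~paired]

df = pd.DataFrame({
    'col1': ['a1', 'a1', 'a1', 'a1', 'a1'],
    'col2': ['b1', 'b1', 'b2', 'b2', 'b2'],
    'col3': ['c1', 'c1', 'c2', 'c2', 'c2'],
    'col4': ['+', '+', '+', '-', '+'],
})
result = drop_opposite_pairs(df)
print(result)  # rows with index 0, 1 and 4 remain
```

Because the counter matches the n-th + with the n-th -, a group with two + rows and one - row loses exactly one + and the -, which is the behaviour the question asks for.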
