Group by and remove paired records in pandas
Problem description
I have a dataframe like this:
col1 col2 col3 col4
a1 b1 c1 +
a1 b1 c1 +
a1 b2 c2 +
a1 b2 c2 -
a1 b2 c2 +
If two records have identical values in col1, col2 and col3 and opposite signs in col4, they should be removed from the dataframe.
Expected output:
col1 col2 col3 col4
a1 b1 c1 +
a1 b1 c1 +
a1 b2 c2 +
So far I have tried pandas duplicated and groupby, but I didn't succeed in finding the pairs. How can I do this?
Recommended answer
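I think you need cumcount to number the rows within the groups defined by all 4 columns, and then groupby again with col4 replaced by this helper Series, so that the n-th + and the n-th - of the same col1/col2/col3 combination land in the same group. Groups whose set of signs equals {'+', '-'} are pairs and are filtered out: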
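# s numbers repeated rows within each (col1, col2, col3, col4) group: 0, 1, 2, ...
s = df.groupby(['col1', 'col2', 'col3', 'col4']).cumcount()
# drop every (col1, col2, col3, s) group that contains both signs
df = df[~df.groupby(['col1', 'col2', 'col3', s])['col4']
          .transform(lambda x: set(x) == {'+', '-'})]
print(df)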
col1 col2 col3 col4
0 a1 b1 c1 +
1 a1 b1 c1 +
4 a1 b2 c2 +
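A self-contained sketch of the same approach, rebuilding the sample frame from the question so it can be run as-is. The names `keys` and `is_pair` are illustrative only and are not part of the original answer:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'col1': ['a1'] * 5,
    'col2': ['b1', 'b1', 'b2', 'b2', 'b2'],
    'col3': ['c1', 'c1', 'c2', 'c2', 'c2'],
    'col4': ['+', '+', '+', '-', '+'],
})

keys = ['col1', 'col2', 'col3']

# Occurrence number of each row among rows with the same keys AND the same sign
s = df.groupby(keys + ['col4']).cumcount()

# Rows with equal keys and equal occurrence number are candidate pairs;
# flag the groups that contain both '+' and '-'
is_pair = df.groupby(keys + [s])['col4'].transform(lambda x: set(x) == {'+', '-'})

result = df[~is_pair]
print(result)
```

With this input, the first + of the b2/c2 rows is paired with the - and both are dropped; the remaining + at index 4 has no partner and is kept.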
For better understanding, create a new helper column:
df['help'] = df.groupby(['col1','col2','col3', 'col4']).cumcount()
print (df)
col1 col2 col3 col4 help
0 a1 b1 c1 + 0
1 a1 b1 c1 + 1
2 a1 b2 c2 + 0
3 a1 b2 c2 - 0
4 a1 b2 c2 + 1
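To see which rows the helper column pairs up before filtering, one option is to concatenate the signs found in each (col1, col2, col3, help) group. This inspection step is an illustrative addition, not part of the original answer; `signs` is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a1'] * 5,
    'col2': ['b1', 'b1', 'b2', 'b2', 'b2'],
    'col3': ['c1', 'c1', 'c2', 'c2', 'c2'],
    'col4': ['+', '+', '+', '-', '+'],
})
df['help'] = df.groupby(['col1', 'col2', 'col3', 'col4']).cumcount()

# Join the sorted signs of each group; '+-' marks a group holding
# exactly one '+' and one '-', i.e. a pair that will be removed
signs = df.groupby(['col1', 'col2', 'col3', 'help'])['col4'].agg(lambda x: ''.join(sorted(x)))
print(signs)
```

Only the (a1, b2, c2, 0) group should show '+-'; every other group holds a single +.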
df = df[~df.groupby(['col1','col2','col3', 'help'])['col4']
.transform(lambda x: set(x) == set(['+','-']))]
print (df)
col1 col2 col3 col4 help
0 a1 b1 c1 + 0
1 a1 b1 c1 + 1
4 a1 b2 c2 + 1
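The same idea can be wrapped in a reusable function. This is a sketch under stated assumptions: `drop_opposite_pairs` is a hypothetical name, the sign column is assumed to contain only '+' and '-', and the set comparison is swapped for `transform('nunique')`, which is equivalent here because each (keys, occurrence) group holds at most one + and one -:

```python
import pandas as pd


def drop_opposite_pairs(df: pd.DataFrame, keys: list, sign_col: str) -> pd.DataFrame:
    """Drop rows that cancel out as (+, -) pairs within each key combination.

    Hypothetical helper; assumes sign_col holds only '+' and '-'.
    """
    # k-th occurrence of each sign per key combination: 0, 1, 2, ...
    occ = df.groupby(keys + [sign_col]).cumcount()
    # each (keys, occ) group has at most one '+' and one '-', so two
    # distinct signs means the group is a matched pair
    n_signs = df.groupby(keys + [occ])[sign_col].transform('nunique')
    return df[n_signs.lt(2)]


# Example: key x has three '+' and one '-', key y has two of each
df = pd.DataFrame({
    'k':    ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y'],
    'sign': ['+', '-', '+', '+', '-', '+', '-', '+'],
})
out = drop_opposite_pairs(df, ['k'], 'sign')
print(out)
```

When a key has more + than - rows (x above), only matched pairs are removed, and the surviving + rows are the later occurrences, because cumcount numbers rows in their original order. All rows of y cancel out.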