Set difference for pandas

Question

A simple pandas question:

Is there a drop_duplicates() functionality to drop every row involved in the duplication?

An equivalent question is the following: Does pandas have a set difference for dataframes?

For example:

In [5]: df1 = pd.DataFrame({'col1':[1,2,3], 'col2':[2,3,4]})

In [6]: df2 = pd.DataFrame({'col1':[4,2,5], 'col2':[6,3,5]})

In [7]: df1
Out[7]: 
   col1  col2
0     1     2
1     2     3
2     3     4

In [8]: df2
Out[8]: 
   col1  col2
0     4     6
1     2     3
2     5     5

So maybe something like df2.set_diff(df1) would produce this:

   col1  col2
0     4     6
2     5     5

However, I don't want to rely on indexes because in my case, I have to deal with dataframes that have distinct indexes.

By the way, I initially thought about an extension of the current drop_duplicates() method, but now I realize that the second approach using properties of set theory would be far more useful in general. Both approaches solve my current problem, though.

Thanks!

Recommended answer

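from pandas import DataFrame

df1 = DataFrame({'col1': [1, 2, 3], 'col2': [2, 3, 4]})
df2 = DataFrame({'col1': [4, 2, 5], 'col2': [6, 3, 5]})

# Three ways to keep the rows of df2 that are not matched in df1.
# Note: isin() with a DataFrame argument, != and == all align on index and
# column labels, so each row is compared only with the df1 row that has the
# same index label. The last two also turn the int columns into floats,
# because the mask introduces NaN before dropna() removes those rows.
print(df2[~df2.isin(df1).all(axis=1)])
print(df2[(df2 != df1)].dropna(how='all'))
print(df2[~(df2 == df1)].dropna(how='all'))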
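
Because the question asks for something that does not depend on the index, here is a minimal sketch of a value-based alternative. It is not part of the original answer. The helper name set_diff is hypothetical (pandas has no DataFrame.set_diff method), and the sketch assumes both frames share the same column names and that neither has a column called _merge. It relies on DataFrame.merge with indicator=True, which tags each row as 'left_only', 'right_only' or 'both'.

```python
import pandas as pd


def set_diff(left, right):
    """Return the rows of `left` whose values do not occur as a row in `right`.

    Hypothetical helper for illustration. Rows are matched by value on the
    columns the two frames have in common; index labels play no part.
    """
    # Deduplicate `right` so each left row matches at most once and the
    # merged frame keeps the same length and row order as `left`.
    merged = left.merge(right.drop_duplicates(), how='left', indicator=True)
    keep = (merged['_merge'] == 'left_only').to_numpy()
    # Index `left` itself so its original index and dtypes are preserved.
    return left[keep]


# Same data as in the question, but df1 deliberately has a different index.
df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 3, 4]}, index=[10, 11, 12])
df2 = pd.DataFrame({'col1': [4, 2, 5], 'col2': [6, 3, 5]})

result = set_diff(df2, df1)
print(result)

# The first framing in the question: drop every row involved in a duplication.
# keep=False removes all copies, so on the concatenation this yields the
# symmetric difference of the two frames, not the one-sided difference.
sym_diff = pd.concat([df1, df2]).drop_duplicates(keep=False)
print(sym_diff)
```

With the data above, result contains the rows (4, 6) and (5, 5) of df2 under their original index labels 0 and 2, which is the output the question asks for. The drop_duplicates(keep=False) variant is shorter, but it also returns the rows that appear only in df1.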
