Pandas: Drop all records of duplicate indices


Question

I have a dataset with potentially duplicate records of the identifier appkey. The duplicated records should ideally not exist and therefore I take them to be data collection mistakes. I need to drop all instances of an appkey which occurs more than once.

The drop_duplicates method is not useful in this case (or is it?) as it either selects the first or the last of the duplicates. Is there any obvious idiom to achieve this with pandas?

Recommended answer

As of pandas version 0.12, we have filter for this. It does exactly what @Andy's solution does using transform, but a little more succinctly and somewhat faster.

df.groupby('AppKey').filter(lambda x: len(x) == 1)
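
For comparison, a transform-based version might look like the sketch below. It is only a guess at the general shape of that approach, not a copy of @Andy's answer, and the `records` frame with its values is made up for illustration. The idea is to broadcast each group's size back onto its rows and keep the rows where that size is 1.

```python
import pandas as pd

# Hypothetical sample data: AppKey "a" appears twice, "b" and "c" once each.
records = pd.DataFrame(
    {"AppKey": ["a", "b", "a", "c"], "B": [10, 20, 30, 40]}
)

# Size of each row's AppKey group, aligned with the original row order.
group_size = records.groupby("AppKey")["AppKey"].transform("size")

# Keep only the rows whose AppKey occurs exactly once.
unique_only = records[group_size == 1]
print(unique_only)
```

Both forms keep the original row labels, so the result can still be matched against the source frame.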

To steal @Andy's example,

In [1]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['AppKey', 'B'])

In [2]: df.groupby('AppKey').filter(lambda x: len(x) == 1)
Out[2]: 
   AppKey  B
2       5  6
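
On the "or is it?" in the question: pandas releases later than 0.12 accept `keep=False` in `drop_duplicates` and `duplicated`, which discards every member of a duplicated group rather than sparing the first or last. The sketch below assumes such a version. The second half moves `AppKey` into the index purely to illustrate the case where the identifier is the index itself.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["AppKey", "B"])

# keep=False drops every row whose AppKey is duplicated, not just the repeats.
by_column = df.drop_duplicates(subset="AppKey", keep=False)
print(by_column)

# Same idea when the identifier lives in the index rather than in a column.
indexed = df.set_index("AppKey")
by_index = indexed[~indexed.index.duplicated(keep=False)]
print(by_index)
```

For this sample frame, both results contain only the row with `AppKey` 5, matching the `filter` output above.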
