Remove duplicate method for Python Pandas doesn't work

Problem description

I am trying to remove duplicate rows based on the values in column 'new'. I have tried two methods, but df.shape is the same before and after, which means the duplicates were not removed.

import pandas
import numpy as np

df = pandas.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))

df['new'] = [1, 1, 3, 4, 5, 1, 7, 8, 1, 10]
df['new2'] = [1, 1, 2, 4, 5, 3, 7, 8, 9, 5]

print(df.shape)

# Both calls return new objects; the results are discarded here.
# keep='first' is the current spelling of the older take_last=False.
df.drop_duplicates('new', keep='first')
df.groupby('new').max()

print(df.shape)

# output
(10, 6)
(10, 6)

Recommended answer

You need to assign the result of drop_duplicates. By default inplace=False, so the method returns a new DataFrame with the duplicates removed and leaves the original df unmodified:

In [106]:

df = df.drop_duplicates('new', keep='first')
df.groupby('new').max()
Out[106]:
            A         B         C         D  new2
new                                              
1   -1.698741 -0.550839 -0.073692  0.618410     1
3    0.519596  1.686003  1.395585  1.298783     2
4    1.557550  1.249577  0.214546 -0.077569     4
5   -0.183454 -0.789351 -0.374092 -1.824240     5
7   -1.176468  0.546904  0.666383 -0.315945     7
8   -1.224640 -0.650131 -0.394125  0.765916     8
10  -1.045131  0.726485 -0.194906 -0.558927     5

Alternatively, passing inplace=True modifies df directly:

In [108]:

df.drop_duplicates('new', keep='first', inplace=True)
df.groupby('new').max()
Out[108]:
            A         B         C         D  new2
new                                              
1    0.334352 -0.355528  0.098418 -0.464126     1
3   -0.394350  0.662889 -1.012554 -0.004122     2
4   -0.288626  0.839906  1.335405  0.701339     4
5    0.973462 -0.818985  1.020348 -0.306149     5
7   -0.710495  0.580081  0.251572 -0.855066     7
8   -1.524862 -0.323492 -0.292751  1.395512     8
10  -1.164393  0.455825 -0.483537  1.357744     5
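
To check the difference directly, here is a minimal sketch that compares shapes for the three cases discussed above: calling drop_duplicates without assignment, assigning the result, and using inplace=True. The variable names (deduped, df_inplace, result) and the seeded random generator are illustrative choices, not part of the original question, and the code assumes a reasonably recent pandas where keep='first' is accepted.

```python
import numpy as np
import pandas as pd

# Seeded generator so the frame is reproducible (an illustrative choice).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=list("ABCD"))
df["new"] = [1, 1, 3, 4, 5, 1, 7, 8, 1, 10]
df["new2"] = [1, 1, 2, 4, 5, 3, 7, 8, 9, 5]

# Case 1: result discarded, df is left untouched.
df.drop_duplicates(subset="new", keep="first")
shape_unassigned = df.shape

# Case 2: assign the returned DataFrame.
deduped = df.drop_duplicates(subset="new", keep="first")
shape_assigned = deduped.shape

# Case 3: inplace=True mutates the frame and returns None.
df_inplace = df.copy()
result = df_inplace.drop_duplicates(subset="new", keep="first", inplace=True)
shape_inplace = df_inplace.shape

print(shape_unassigned, shape_assigned, shape_inplace, result)
```

Note that with inplace=True the return value is None, so writing df = df.drop_duplicates(..., inplace=True) would replace df with None. The same reasoning applies to df.groupby('new').max(): it returns a new object and never changes df itself.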
