DataFrame.drop_duplicates and DataFrame.drop not removing rows

Question

I have read a CSV into a pandas DataFrame with five columns. Some rows have duplicate values only in the second column, and I want to remove these rows from the DataFrame, but neither drop nor drop_duplicates is working.

Here is my implementation:

import pandas as pd
from pandas import Series

# Read CSV
df = pd.read_csv(data_path, header=0, names=['a', 'b', 'c', 'd', 'e'])

print(Series(df.b))

dropRows = []
# Sanitize the data to get rid of duplicates
for indx, val in enumerate(df.b):  # for all the values
    if indx == 0:  # skip the first index
        continue

    if val == df.b[indx - 1]:  # this is a duplicate rtc value
        dropRows.append(indx)

print(dropRows)

df.drop(dropRows)  # this doesn't work
df.drop_duplicates('b')  # this doesn't work either

print(Series(df.b))

When I print the series df.b before and after, both have the same length and I can still clearly see the duplicates. Is there something wrong in my implementation?
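A side note on the loop above: it flags a row only when its b value equals the b value of the row directly before it, so it catches consecutive repeats only, whereas drop_duplicates('b') keeps the first occurrence of each b value no matter where the repeats fall. The sketch below contrasts a vectorized equivalent of the loop (comparing the column with its own shift()) against drop_duplicates; the DataFrame is hypothetical, invented for illustration, and both results are assigned to new names so nothing depends on in-place behaviour.

```python
import pandas as pd

# Hypothetical data: 20 repeats in adjacent rows, 10 reappears later
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 20, 30, 10]})

# Vectorized form of the question's loop: keep a row unless its "b"
# equals the "b" of the row immediately before it (the first row
# compares against NaN, so it is always kept)
consecutive = df[df["b"].ne(df["b"].shift())]

# drop_duplicates("b") keeps only the first occurrence of each "b" value,
# wherever the repeats appear
unique_b = df.drop_duplicates("b")

print(consecutive["b"].tolist())  # [10, 20, 30, 10]
print(unique_b["b"].tolist())     # [10, 20, 30]
```

If only adjacent repeats should go, the shift() comparison matches what the loop was trying to do; if every repeated b value should go, drop_duplicates is the one to use.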

Recommended answer

As mentioned in the comments, drop and drop_duplicates return a new DataFrame and leave the original unchanged unless they are called with inplace=True. Any of these options will work:

# Reassign the returned DataFrame...
df = df.drop(dropRows)
df = df.drop_duplicates('b')

# ...or modify df in place (these calls return None)
df.drop(dropRows, inplace=True)
df.drop_duplicates('b', inplace=True)
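To see why the calls in the question appeared to do nothing, here is a minimal reproduction on a small hypothetical DataFrame (the values are invented for illustration): calling drop_duplicates without keeping its return value leaves df unchanged, while inplace=True mutates df itself and returns None.

```python
import pandas as pd

# Hypothetical sample data: rows 1 and 2 share the same value in column "b"
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 20, 30]})

# drop_duplicates returns a new DataFrame; df itself is untouched
result = df.drop_duplicates("b")
print(len(df), len(result))  # 4 3

# With inplace=True, df is modified directly and the call returns None
ret = df.drop_duplicates("b", inplace=True)
print(ret, len(df))  # None 3
```

Because the in-place form returns None, writing df = df.drop_duplicates('b', inplace=True) would replace df with None; use one style or the other, not both.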