DataFrame.drop_duplicates and DataFrame.drop not removing rows
Question
I have read a CSV into a pandas DataFrame with five columns. Certain rows have duplicate values only in the second column, and I want to remove those rows from the DataFrame, but neither drop nor drop_duplicates is working.
Here is my implementation:
import pandas as pd

# Read CSV (data_path is the path to the input file)
df = pd.read_csv(data_path, header=0, names=['a', 'b', 'c', 'd', 'e'])
print(df.b)

dropRows = []
# Sanitize the data to get rid of duplicates
for indx, val in enumerate(df.b):  # for all the values
    if indx == 0:  # skip the first index
        continue
    if val == df.b[indx - 1]:  # same as the previous row: a duplicate rtc value
        dropRows.append(indx)

print(dropRows)
df.drop(dropRows)  # this doesn't work
df.drop_duplicates('b')  # this doesn't work either
print(df.b)
When I print out the series df.b before and after, it is the same length and I can still clearly see the duplicates. Is there something wrong in my implementation?
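A minimal reproduction of the symptom, assuming a small hypothetical in-memory DataFrame in place of the CSV (the column values are made up for illustration). It shows that both calls hand back a new, shorter DataFrame while df itself keeps all of its rows.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: column 'b' contains repeated values.
df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [10, 10, 20, 20],
    'c': [0, 0, 0, 0],
    'd': [0, 0, 0, 0],
    'e': [0, 0, 0, 0],
})

dropped = df.drop([1, 3])         # returns a NEW DataFrame without rows 1 and 3
deduped = df.drop_duplicates('b') # returns a NEW DataFrame, first of each 'b' kept

print(len(df))       # 4: the original is untouched
print(len(dropped))  # 2
print(len(deduped))  # 2
```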
Answer
As mentioned in the comments, drop and drop_duplicates return a new DataFrame and leave the original unchanged, unless they are called with inplace=True. Either of these approaches works:
# Option 1: assign the returned DataFrame back to df
df = df.drop(dropRows)
df = df.drop_duplicates('b')

# Option 2 (alternative to option 1): modify df in place
df.drop(dropRows, inplace=True)
df.drop_duplicates('b', inplace=True)
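One detail the answer does not mention: the loop in the question only flags a row when its b value equals the value in the row directly above it, whereas drop_duplicates('b') keeps only the first occurrence of each value anywhere in the column. The sketch below compares the two on hypothetical data, using a vectorized df['b'] != df['b'].shift() comparison as a stand-in for the loop (an illustrative substitute, not the asker's code). It also shows that with inplace=True the method returns None, so its result should not be assigned back to df.

```python
import pandas as pd

# Hypothetical data: 10 repeats both consecutively and later in the column.
df = pd.DataFrame({'b': [10, 10, 20, 10, 30, 30]})

# Equivalent of the question's loop: drop a row only when 'b' equals the
# value in the row directly above it (consecutive duplicates).
consecutive = df[df['b'] != df['b'].shift()]

# drop_duplicates keeps only the first occurrence of each value,
# wherever the repeats occur.
global_unique = df.drop_duplicates('b')

print(consecutive['b'].tolist())    # [10, 20, 10, 30]
print(global_unique['b'].tolist())  # [10, 20, 30]

# With inplace=True, df is modified and the method returns None,
# so writing df = df.drop_duplicates('b', inplace=True) would lose df.
ret = df.drop_duplicates('b', inplace=True)
print(ret)               # None
print(df['b'].tolist())  # [10, 20, 30]
```

Use the shift() comparison only if dropping consecutive repeats is really the intent; for removing every repeated b value, drop_duplicates is the simpler choice.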