How to drop_duplicates
Question
I have raw data like the example below. At instant t1 a variable has the value x1; it should be recorded again at instant t2 if and only if its value is not equal to x1. Is there a way to compare each value in a DataFrame in Python with the previous value and delete the row if they are the same? I tried the following code, but it doesn't work. Please help.
df
time Variable Value
2014-07-11 19:50:20 Var1 10
2014-07-11 19:50:30 Var1 20
2014-07-11 19:50:40 Var1 20
2014-07-11 19:50:50 Var1 30
2014-07-11 19:50:60 Var1 20
2014-07-11 19:50:70 Var2 50
2014-07-11 19:50:80 Var2 60
2014-07-11 19:50:90 Var2 70
Code:
for y in df.time:
    for x in df.Value:
        if y == y:
            if x == x:
                df1 = df.drop_duplicates(subset=['time', 'Variable', 'Value'], keep=False)
            else:
                df1 = df.drop_duplicates(['time', 'Variable', 'Value'])
Expected output:
df
time Variable Value
2014-07-11 19:50:20 Var1 10
2014-07-11 19:50:30 Var1 20
2014-07-11 19:50:50 Var1 30
2014-07-11 19:50:60 Var1 20
2014-07-11 19:50:70 Var2 50
2014-07-11 19:50:80 Var2 60
2014-07-11 19:50:90 Var2 70
Recommended answer
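df.drop_duplicates(subset=['Variable', 'Value'], keep='first')
# Rows kept when this is applied to the df shown in the question:
# time Variable Value
#2014-07-11 19:50:20 Var1 10
#2014-07-11 19:50:30 Var1 20
#2014-07-11 19:50:50 Var1 30
#2014-07-11 19:50:70 Var2 50
#2014-07-11 19:50:80 Var2 60
#2014-07-11 19:50:90 Var2 70
# Note: this keeps only the first occurrence of each (Variable, Value) pair,
# so the 19:50:60 row (Var1, 20) is dropped even though it follows a 30.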
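The question asks for a row to be dropped only when its value equals the immediately preceding value of the same variable, which is a different condition from being a duplicate anywhere in the frame. A minimal sketch of that comparison follows, assuming the column names from the question and that rows are already in time order. The timestamps are kept as plain strings because entries such as 19:50:60 are not valid clock times, and the names prev and df1 are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; "time" is left as text on purpose.
df = pd.DataFrame({
    "time": [
        "2014-07-11 19:50:20", "2014-07-11 19:50:30", "2014-07-11 19:50:40",
        "2014-07-11 19:50:50", "2014-07-11 19:50:60", "2014-07-11 19:50:70",
        "2014-07-11 19:50:80", "2014-07-11 19:50:90",
    ],
    "Variable": ["Var1"] * 5 + ["Var2"] * 3,
    "Value": [10, 20, 20, 30, 20, 50, 60, 70],
})

# Previous Value within the same Variable; the first row of each
# group has no predecessor, so it gets NaN.
prev = df.groupby("Variable")["Value"].shift()

# Keep a row when its Value differs from the previous one.
# A comparison against NaN is never equal, so first rows are kept.
df1 = df[df["Value"] != prev]

print(df1)
```

With the sample data, only the 19:50:40 row is removed, and the later Var1 value of 20 at 19:50:60 is retained because it follows a 30. If the real data is not sorted by time, sorting it first (for example with sort_values) would be needed before the shift.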