How to drop_duplicates


Problem description

I have raw data like the example below. At instant t1 a variable has the value x1; it should be recorded again at instant t2 if and only if its value is no longer equal to x1. Is there a way to compare each value in a pandas DataFrame with the previous value and delete the row if they are the same? I tried the following code, but it doesn't work. Please help.

df
time                 Variable   Value
2014-07-11 19:50:20  Var1       10
2014-07-11 19:50:30  Var1       20
2014-07-11 19:50:40  Var1       20
2014-07-11 19:50:50  Var1       30
2014-07-11 19:50:60  Var1       20 
2014-07-11 19:50:70  Var2       50
2014-07-11 19:50:80  Var2       60
2014-07-11 19:50:90  Var2       70

Code:

for y in df.time:
    for x in df.Value:
        if y == y:
            if x == x:
                df1 = df.drop_duplicates(subset = ['time', 'Variable', 'Value'], keep=False) 
            else:
                df1 = df.drop_duplicates(['time', 'Variable', 'Value'])

Expected output:

df
time                 Variable   Value
2014-07-11 19:50:20  Var1       10
2014-07-11 19:50:30  Var1       20
2014-07-11 19:50:50  Var1       30
2014-07-11 19:50:60  Var1       20 
2014-07-11 19:50:70  Var2       50
2014-07-11 19:50:80  Var2       60
2014-07-11 19:50:90  Var2       70

Recommended answer

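df.drop_duplicates(subset=['Variable', 'Value'], keep='first')
# Result on the data from the question:
#                time Variable  Value
# 2014-07-11 19:50:20     Var1     10
# 2014-07-11 19:50:30     Var1     20
# 2014-07-11 19:50:50     Var1     30
# 2014-07-11 19:50:70     Var2     50
# 2014-07-11 19:50:80     Var2     60
# 2014-07-11 19:50:90     Var2     70
# Note: keep='first' keeps only the first occurrence of each (Variable, Value)
# pair, so the 19:50:60 row (Var1, 20) is dropped as well.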
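
The question asks to drop a row only when its value repeats the immediately preceding value of the same variable, which drop_duplicates does not do on its own. One possible approach, given here as a minimal sketch rather than the answerer's method, is to compare each Value with the previous one inside each Variable group using shift. The DataFrame below is rebuilt from the question's sample; keeping the time column as plain strings is an assumption made because values such as 19:50:60 are not valid timestamps.

```python
import pandas as pd

# Sample data from the question. The time column is kept as strings
# (an assumption), since 19:50:60 and later are not valid timestamps.
df = pd.DataFrame({
    "time": [
        "2014-07-11 19:50:20", "2014-07-11 19:50:30",
        "2014-07-11 19:50:40", "2014-07-11 19:50:50",
        "2014-07-11 19:50:60", "2014-07-11 19:50:70",
        "2014-07-11 19:50:80", "2014-07-11 19:50:90",
    ],
    "Variable": ["Var1"] * 5 + ["Var2"] * 3,
    "Value": [10, 20, 20, 30, 20, 50, 60, 70],
})

# Previous Value within each Variable; the first row of each group gets NaN.
previous = df.groupby("Variable")["Value"].shift()

# Keep a row only when its Value differs from the previous one.
# A comparison with NaN is always "not equal", so first rows are kept.
result = df[df["Value"] != previous]
print(result)
```

This assumes the rows are already sorted by time within each variable; if they are not, sorting with sort_values first would be needed. On the sample data only the 19:50:40 row is removed, which matches the expected output in the question.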
