Find missing minute data in time series data using pandas

Problem Description

I have time series data with one reading per minute, but because of problems with the sensor, the data sometimes isn't fetched and that minute is not recorded. I want to find on which day and at which hour this happened. The data is in a pandas DataFrame. This is the code snippet I use to store it in the DataFrame; I would like to display the minutes for which no data was received.

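import numpy as np
import pandas as pd
from pymongo import MongoClient

l = ['Year', 'Month', 'Day', 'Hour', 'Minute']
df = pd.DataFrame(columns=l)
k = 0

if __name__ == '__main__':
    client = MongoClient("localhost", 27017, maxPoolSize=50)
    db = client.test
    collection = db['data']
    # 1000 most recent readings of device 3, projecting only the timestamp and cd fields
    cursor = collection.find({"deviceId": 3}, {"timestamp": 1, "cd": 1}).sort("timestamp", -1).limit(1000)
    for document in cursor:
        # Read the field by name: document.items()[1:-1] only works on Python 2
        # and depends on the order of the keys in the returned document.
        value = document["timestamp"]  # a datetime.datetime
        df.loc[k, 'Year'] = value.year  # was hard-coded to 2017
        df.loc[k, 'Month'] = value.month
        df.loc[k, 'Day'] = value.day
        df.loc[k, 'Hour'] = value.hour
        df.loc[k, 'Minute'] = value.minute
        k = k + 1
    # every minute an hour should contain
    minute_a = pd.Series(np.arange(0, 60))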

I was thinking of using the isin function with minute_a, but since I have to check every hour, I can't figure out how to do it.
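The per-hour isin check described above can be sketched as follows. This is a minimal sketch on hypothetical sample data in the question's Year/Month/Day/Hour/Minute layout: it groups the rows by hour and keeps the values of minute_a that never occur in that group. Hours with no readings at all produce no group, so they are not reported, and the first and last hours of a query window will usually show spurious gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the question's layout: one row per received minute.
# Minutes 3 and 4 of 10:00 on 2017-05-01 are missing.
received = [m for m in range(60) if m not in (3, 4)]
df = pd.DataFrame({
    "Year": 2017, "Month": 5, "Day": 1, "Hour": 10, "Minute": received,
})

minute_a = pd.Series(np.arange(0, 60))  # every minute an hour should contain

missing = []
# Check each (day, hour) separately: which of 0..59 never appeared?
for (year, month, day, hour), group in df.groupby(["Year", "Month", "Day", "Hour"]):
    absent = minute_a[~minute_a.isin(group["Minute"])]
    for minute in absent:
        missing.append((int(year), int(month), int(day), int(hour), int(minute)))

print(missing)  # → [(2017, 5, 1, 10, 3), (2017, 5, 1, 10, 4)]
```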

Recommended Answer

isin would be a good way of doing this. However, the easiest approach is to flatten your sensor time data into a single DatetimeIndex so that you can compare it with a reference DatetimeIndex that contains every minute of the period you care about.
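A minimal sketch of the flattening step, assuming the Year/Month/Day/Hour/Minute columns from the question (the three sample rows are hypothetical). pd.to_datetime can assemble timestamps from a DataFrame whose columns are named year, month, day, hour and minute, so the names are lower-cased and the values cast to int first, since a frame filled cell by cell with df.loc typically ends up with object dtype.

```python
import pandas as pd

# Hypothetical rows in the question's layout; dtype=object mimics a frame
# that was filled cell by cell with df.loc.
df = pd.DataFrame(
    [[2017, 5, 1, 10, 0], [2017, 5, 1, 10, 1], [2017, 5, 1, 10, 3]],
    columns=["Year", "Month", "Day", "Hour", "Minute"],
    dtype=object,
)

# pd.to_datetime assembles timestamps from columns named year, month, day,
# hour and minute; lower-case the names and cast the values to int first.
idx_dat = pd.DatetimeIndex(pd.to_datetime(df.astype(int).rename(columns=str.lower)))
print(idx_dat)
```

If the raw datetime objects are still at hand, for example the timestamp values read from the MongoDB cursor, passing a list of them straight to pd.DatetimeIndex skips the split into separate columns altogether.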

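from datetime import datetime, timedelta
import pandas as pd

# reference DatetimeIndex idx_ref with a minute frequency, covering the last 24 hours
# (pd.date_range replaces the pd.DatetimeIndex(start=..., end=...) form removed from pandas)
end = datetime.now().replace(second=0, microsecond=0)
dt = end - timedelta(days=1)
idx_ref = pd.date_range(start=dt, end=end, freq='min')

# idx_dat represents your DatetimeIndex from the sensor, e.g. assembled from the df above
idx_dat = pd.DatetimeIndex(pd.to_datetime(df[['Year', 'Month', 'Day', 'Hour', 'Minute']].astype(int).rename(columns=str.lower)))
# reference minutes that never appear in the sensor data
gaps = idx_ref[~idx_ref.isin(idx_dat)]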
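To answer the original "which day and which hour" question, the gaps can then be grouped by date and hour. The sketch below runs on hypothetical data with two simulated outages, and builds idx_ref from the first to the last sensor reading instead of the last 24 hours; that window is an assumption made to keep the example reproducible.

```python
import pandas as pd

# Hypothetical sensor data: one reading per minute from 10:00 to 11:59,
# with 10:05-10:07 and 11:30 removed to simulate sensor outages.
full = pd.date_range("2017-05-01 10:00", "2017-05-01 11:59", freq="min")
outage = pd.DatetimeIndex(["2017-05-01 10:05", "2017-05-01 10:06",
                           "2017-05-01 10:07", "2017-05-01 11:30"])
idx_dat = full.difference(outage)

# Reference index spanning the sensor data (assumption: first to last reading).
idx_ref = pd.date_range(idx_dat.min(), idx_dat.max(), freq="min")
gaps = idx_ref[~idx_ref.isin(idx_dat)]

# Missing minutes, listed per day and hour.
report = (
    pd.DataFrame({"date": gaps.date, "hour": gaps.hour, "minute": gaps.minute})
    .groupby(["date", "hour"])["minute"]
    .apply(list)
)
print(report)
```

For this data the report should list minutes 5, 6 and 7 under hour 10 and minute 30 under hour 11.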

This assumes, of course, that you are only interested in the time gaps.
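If whole outages are more useful than individual missing minutes, consecutive gaps can be collapsed into runs. A minimal sketch with hypothetical gap values: a new run starts wherever the step from the previous missing minute is not exactly one minute, and cumsum turns those starts into run labels.

```python
import pandas as pd

# Hypothetical missing minutes, e.g. the result of idx_ref[~idx_ref.isin(idx_dat)].
gaps = pd.DatetimeIndex(["2017-05-01 10:05", "2017-05-01 10:06",
                         "2017-05-01 10:07", "2017-05-01 11:30"])

s = gaps.to_series()
# True where a new outage starts: the step from the previous missing minute
# is not exactly one minute (the first diff is NaT, which also counts as a start).
run_id = (s.diff() != pd.Timedelta(minutes=1)).cumsum()

outages = s.groupby(run_id).agg(start="min", end="max", missing_minutes="count")
print(outages)
```

Here the result should contain two outages: 10:05 to 10:07 (3 minutes) and 11:30 (1 minute).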
