Find missing minute data in time series data using pandas
Question
I have time series data with one record per minute, but because of sensor problems some minutes are not fetched and are never recorded. I want to find on which day and in which hour this happened. The data is in a pandas DataFrame. Below is the code I use to fill the DataFrame; I would like to display which minutes were not received.
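import numpy as np
import pandas as pd
from pymongo import MongoClient

columns = ['Year', 'Month', 'Day', 'Hour', 'Minute']
df = pd.DataFrame(columns=columns)
k = 0
if __name__ == '__main__':
    client = MongoClient("localhost", 27017, maxPoolSize=50)
    db = client.test
    collection = db['data']
    # latest 1000 documents for device 3, newest first
    cursor = collection.find({"deviceId": 3}, {"timestamp": 1, "cd": 1}).sort("timestamp", -1).limit(1000)
    for document in cursor:
        # list() is needed on Python 3, where dict.items() returns a view that cannot be sliced
        for key, value in list(document.items())[1:-1]:
            df.loc[k, 'Year'] = 2017
            df.loc[k, 'Month'] = value.month
            df.loc[k, 'Day'] = value.day
            df.loc[k, 'Hour'] = value.hour
            df.loc[k, 'Minute'] = value.minute
            k = k + 1
    # the 60 minutes expected in every hour
    minute_a = pd.Series(np.arange(0, 60))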
I was thinking of using the isin function with minute_a, but since I have to check every hour separately, I can't figure out how to do it.
Recommended answer
isin would be a good way of doing this. However, the easiest approach is to flatten your sensor time data into a single DatetimeIndex so that you can compare it with a reference DatetimeIndex.
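from datetime import datetime, timedelta
import pandas as pd

# create the reference DatetimeIndex idx_ref with a minute frequency
end = datetime.now().replace(second=0, microsecond=0)
dt = end - timedelta(days=1)
# pd.date_range is used because recent pandas versions no longer accept DatetimeIndex(start=..., end=..., freq=...)
idx_ref = pd.date_range(start=dt, end=end, freq='min')

# idx_dat represents your DatetimeIndex from the sensor (not defined here)
gaps = idx_ref[~idx_ref.isin(idx_dat)]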
This assumes you are only interested in the time gaps, of course.
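The answer leaves idx_dat undefined, so here is a minimal sketch of one way to build it from the Year/Month/Day/Hour/Minute columns used in the question and then summarise the gaps per day and hour. The sample rows are invented for illustration, and the reference range here runs from the first to the last recorded timestamp rather than over the last 24 hours. That choice is an assumption, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample standing in for the sensor DataFrame from the question:
# readings at 10:57, 10:59, 11:00 and 11:02 on 2017-03-01.
df = pd.DataFrame({
    'Year': [2017, 2017, 2017, 2017],
    'Month': [3, 3, 3, 3],
    'Day': [1, 1, 1, 1],
    'Hour': [10, 10, 11, 11],
    'Minute': [57, 59, 0, 2],
})

# pd.to_datetime can assemble timestamps from year/month/day/hour/minute columns;
# the column names are lowercased first to match the names it documents.
idx_dat = pd.DatetimeIndex(pd.to_datetime(df.rename(columns=str.lower)))

# Reference index: every minute between the first and last recorded reading.
idx_ref = pd.date_range(start=idx_dat.min(), end=idx_dat.max(), freq='min')

# Minutes present in the reference but absent from the sensor data.
gaps = idx_ref[~idx_ref.isin(idx_dat)]

# Group the missing minutes by calendar day and hour.
summary = pd.DataFrame({'day': gaps.date, 'hour': gaps.hour, 'minute': gaps.minute})
missing_by_hour = summary.groupby(['day', 'hour'])['minute'].apply(list)

print(missing_by_hour)
```

With this sample, gaps holds 10:58 and 11:01, and missing_by_hour lists the missing minutes for each (day, hour) pair. Hours with no missing minutes do not appear in the result.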