PANDAS - Loop over two datetime indexes with different sizes to compare days and values
Problem description
Looking for a more efficient way to loop over and compare datetimeindex values in two Series objects with different frequencies.
Imagine two Pandas series, each with a datetime index covering the same year span yet with different frequencies for each index. One has a frequency of days, the other a frequency of hours.
range1 = pd.date_range('2016-01-01','2016-12-31', freq='D')
range2 = pd.date_range('2016-01-01','2016-12-31', freq='H')
I'm trying to loop over these series using their indexes as a lookup to match days so I can compare data for each day.
Right now I'm using multi-level for loops and if statements (see below); the time to complete these loops seems excessive (5.45 s per loop) compared with what I'm used to in Pandas operations.
for date, val in zip(frame1.index, frame1['data']):  # freq = 'D'
    for date2, val2 in zip(frame2.index, frame2['data']):  # freq = 'H'
        if date.day == date2.day:  # check to see if dates are a match
            if val2 > val:  # compare the values
                pass  # append values, etc
Question
Is there a more efficient way of using the index in frame1 to loop over the index in frame2 and compare the values in each frame for a given day? Ultimately I want to create a series of values wherever frame2 vals are greater than frame1 vals.
Create two separate series with random data and assign each a datetime index.
import pandas as pd
import numpy as np
range1 = pd.date_range('2016-01-01','2016-12-31', freq='D')
range2 = pd.date_range('2016-01-01','2016-12-31', freq='H')
frame1 = pd.Series(np.random.rand(366), index=range1)
frame2 = pd.Series(np.random.rand(8761), index=range2)
Recommended answer
Yes, use resample, asfreq and pd.concat.
Use resample to get the proper frequency out of your series.
asfreq (which sounds sort of dirty) is used to convert back to a series with the frequency defined in resample.
Concatenate with frame1 to get values side-by-side.
df = pd.concat([frame1,frame2.resample('1D').asfreq()],axis=1)
df.head()
Output:
                   0         1
2016-01-01  0.147067  0.235858
2016-01-02  0.820398  0.353275
2016-01-03  0.840499  0.186273
2016-01-04  0.505740  0.340201
2016-01-05  0.547840  0.695041
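It may help to see what resample followed by asfreq actually selects from the hourly data. The sketch below uses hypothetical toy data (the names hourly, at_midnight and daily_max are not from the original post) and passes the hourly frequency as a pd.Timedelta rather than a string alias. It suggests that asfreq keeps only the value stamped exactly at midnight of each day, whereas an aggregation such as max summarises all 24 hourly values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two days of hourly values 0, 1, 2, ..., 47.
hourly_index = pd.date_range("2016-01-01", periods=48, freq=pd.Timedelta(hours=1))
hourly = pd.Series(np.arange(48, dtype=float), index=hourly_index)

# asfreq() keeps only the value sitting exactly on each daily timestamp (midnight).
at_midnight = hourly.resample("1D").asfreq()

# An aggregation such as max() summarises all 24 hourly values of the day instead.
daily_max = hourly.resample("1D").max()

print(at_midnight.tolist())  # [0.0, 24.0]
print(daily_max.tolist())    # [23.0, 47.0]
```

If the goal is to compare a daily summary of frame2 against frame1, swapping asfreq for an aggregation like max or mean is one option to consider.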
Then you can use the following to get the series of frame2 values that exceed frame1.
df.columns = ['frame1','frame2']
df.query('frame1 < frame2')['frame2']
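If every hourly value should be compared against the daily value of the same calendar day, as the nested loops in the question appear to intend, one possible alternative is to broadcast the daily series onto the hourly index. This is a sketch on hypothetical toy data, not the answer's method; the names daily, hourly, daily_broadcast and exceeding are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one value per day, plus hourly values for the same two days.
daily = pd.Series([10.0, 30.0], index=pd.date_range("2016-01-01", periods=2, freq="D"))
hourly_index = pd.date_range("2016-01-01", periods=48, freq=pd.Timedelta(hours=1))
hourly = pd.Series(np.arange(48, dtype=float), index=hourly_index)

# normalize() floors each hourly timestamp to midnight, i.e. to its calendar day.
day_of_each_hour = hourly.index.normalize()

# Look up the daily value for every hourly row; to_numpy() drops the daily index
# so the comparison below is positional rather than label-aligned.
daily_broadcast = daily.reindex(day_of_each_hour).to_numpy()

# Keep the hourly values that exceed the daily value of the same day.
exceeding = hourly[hourly.to_numpy() > daily_broadcast]
print(len(exceeding))
```

Because the lookup matches on the full date rather than on the day of the month, a timestamp on 1 February is never compared against the value for 1 January.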