Python pandas resample added dates not present in the original data
Question
I am using pandas to convert intraday data, stored in data_m, to daily data. For some reason resample added rows for days that were not present in the intraday data. For example, 1/8/2000 is not in the intraday data, yet the daily data contains a row for that date with NaN as the value. The DatetimeIndex has more entries than the actual data. Am I doing anything wrong?
data_m.resample('D', how = mean).head()
Out[13]:
x
2000-01-04 8803.879581
2000-01-05 8765.036649
2000-01-06 8893.156250
2000-01-07 8780.037433
2000-01-08 NaN
data_m.resample('D', how = mean)
Out[14]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4729 entries, 2000-01-04 00:00:00 to 2012-12-14 00:00:00
Freq: D
Data columns:
x 3241 non-null values
dtypes: float64(1)
Answer
What you are doing looks correct; it's just that pandas gives NaN for the mean of an empty array.
In [1]: Series().mean()
Out[1]: nan
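To see that the NaN comes specifically from taking the mean of nothing, here is a minimal sketch (assuming a recent pandas; the dtype=float argument is added only to avoid the default-dtype warning some versions emit for an empty Series). count and sum of an empty Series are well defined, while mean is not:

```python
import math

import pandas as pd

# An empty float Series stands in for a day that has no intraday samples.
empty = pd.Series(dtype=float)

print(empty.mean())   # nan: the mean of zero values is undefined
print(empty.count())  # 0: no non-null values
print(empty.sum())    # 0.0: by default the sum of nothing is 0
```

This is why a day with no samples shows up as NaN after a mean aggregation rather than disappearing.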
resample converts to a regular time interval, so if there are no samples on a given day you get NaN for that day.
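A small sketch of that behaviour, using hypothetical intraday data (not the asker's) with a gap on 2000-01-08 and the modern resample(...).mean() spelling. The daily index is regular, so the empty day still gets a row; counting samples per bin shows it is empty:

```python
import pandas as pd

# Hypothetical intraday samples: two on 2000-01-07, none on 2000-01-08,
# one on 2000-01-09.
idx = pd.to_datetime(["2000-01-07 09:30", "2000-01-07 15:30", "2000-01-09 10:00"])
data_m = pd.DataFrame({"x": [10.0, 20.0, 40.0]}, index=idx)

# Daily bins cover every calendar day between the first and last sample.
daily_mean = data_m.resample("D").mean()
daily_count = data_m.resample("D").count()

print(daily_mean)   # 2000-01-08 is present, with x = NaN
print(daily_count)  # 2000-01-08 has 0 samples
```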
Most of the time having NaN isn't a problem. If it is, we can either use fill_method (for example 'ffill') or, if you really want to remove them, use dropna (not recommended):
data_m.resample('D', how = mean, fill_method='ffill')
data_m.resample('D', how = mean).dropna()
Update: The modern equivalent seems to be:
In [21]: s.resample("D").mean().ffill()
Out[21]:
x
2000-01-04 8803.879581
2000-01-05 8765.036649
2000-01-06 8893.156250
2000-01-07 8780.037433
2000-01-08 8780.037433
In [22]: s.resample("D").mean().dropna()
Out[22]:
x
2000-01-04 8803.879581
2000-01-05 8765.036649
2000-01-06 8893.156250
2000-01-07 8780.037433
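If the goal is simply to never create rows for days without data, a different technique (not part of the original answer) is to group by calendar date instead of resampling to a fixed frequency. A minimal sketch, again on hypothetical data, using DatetimeIndex.normalize() to strip the time of day:

```python
import pandas as pd

# Hypothetical intraday samples with no data on 2000-01-08.
idx = pd.to_datetime(["2000-01-07 09:30", "2000-01-07 15:30", "2000-01-09 10:00"])
data_m = pd.DataFrame({"x": [10.0, 20.0, 40.0]}, index=idx)

# Group by the date part of each timestamp: only dates that actually
# contain samples become groups, so no NaN rows are produced.
daily = data_m.groupby(data_m.index.normalize()).mean()
print(daily)
```

When the original data contains no NaN values, this yields the same rows as resample("D").mean().dropna(); the difference is that the empty days are never created in the first place, and the resulting index has no regular frequency.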
See the resampling documentation.