Filling continuous pandas dataframe from sparse dataframe


Question

I have a dictionary named date_dict, keyed by datetime dates, whose values are integer counts of observations. I convert this to a sparse series/dataframe with censored observations, which I would like to join or convert to a series/dataframe with continuous dates. The nasty list comprehension is my hack to get around the fact that pandas apparently won't automatically convert datetime date objects to an appropriate DateTime index.

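import datetime
import pandas as pd

df1 = pd.DataFrame(data=list(date_dict.values()),
                   index=[datetime.datetime.combine(i, datetime.time())
                          for i in date_dict.keys()],
                   columns=['Name'])
# DataFrame.sort() was removed in later pandas versions; sort_index() sorts by date
df1 = df1.sort_index()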

This example has 1258 observations, and the DateTime index runs from 2003-06-24 to 2012-11-07.

df1.head()
             Name
Date
2003-06-24   2
2003-08-13   1
2003-08-19   2
2003-08-22   1
2003-08-24   5

I can create an empty dataframe with a continuous DateTime index, but this introduces an unneeded column and seems clunky. I feel as though I'm missing a more elegant solution involving a join.

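# pd.DateRange no longer exists; pd.date_range with freq='B' (business days)
# matches the weekday-only index in the output below
df2 = pd.DataFrame(data=None, columns=['Empty'],
                   index=pd.date_range(min(date_dict.keys()),
                                       max(date_dict.keys()), freq='B'))
df3 = df1.join(df2, how='right')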
df3.head()
            Name    Empty
2003-06-24   2   NaN
2003-06-25  NaN  NaN
2003-06-26  NaN  NaN
2003-06-27  NaN  NaN
2003-06-30  NaN  NaN

Is there a simpler or more elegant way to fill a continuous dataframe from a sparse dataframe so that (1) the index is continuous, (2) the NaNs are 0s, and (3) there is no leftover empty column in the dataframe?

            Name
2003-06-24   2
2003-06-25   0
2003-06-26   0
2003-06-27   0
2003-06-30   0

Recommended answer

You can just use reindex on a time series with your date range. It also looks like you would be better off using a TimeSeries (in current pandas, a Series with a DatetimeIndex) instead of a DataFrame (see documentation), although reindexing is the correct method for adding missing index values to DataFrames as well.
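For the DataFrame case, a minimal sketch might look like the following. The date_dict contents are invented for illustration, pd.to_datetime is used here as an assumed substitute for the list comprehension in the question, and fill_value=0 is an optional reindex argument that fills the newly added rows.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for the asker's date_dict (values are invented).
date_dict = {
    datetime.date(2003, 6, 24): 2,
    datetime.date(2003, 6, 27): 1,
    datetime.date(2003, 6, 30): 5,
}

# pd.to_datetime turns the datetime.date keys into a DatetimeIndex.
df1 = pd.DataFrame({'Name': list(date_dict.values())},
                   index=pd.to_datetime(list(date_dict.keys())))
df1 = df1.sort_index()

# Calendar-day range spanning the observed dates.
full_index = pd.date_range(df1.index.min(), df1.index.max(), freq='D')

# Rows for the missing dates are added and filled with 0; no extra column appears.
df_full = df1.reindex(full_index, fill_value=0)
print(df_full)
```

Because the fill happens during the reindex, no helper dataframe or join is needed, and the Name column keeps its integer dtype.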

For example, starting with a short time series:

import pandas as pd

# pd.datetime was removed in later pandas versions; date strings are parsed directly
date_index = pd.DatetimeIndex(['2003-06-24', '2003-08-13', '2003-08-19',
                               '2003-08-22', '2003-08-24'])

ts = pd.Series([2, 1, 2, 1, 5], index=date_index)

This gives you a time series like the head of your example dataframe:

2003-06-24    2
2003-08-13    1
2003-08-19    2
2003-08-22    1
2003-08-24    5

Simply doing

ts.reindex(pd.date_range(min(date_index), max(date_index)))

then gives you a complete index, with NaNs for your missing values (you can use fillna if you want to fill the missing values with some other value; see the pandas documentation for fillna):

2003-06-24     2
2003-06-25   NaN
2003-06-26   NaN
2003-06-27   NaN
2003-06-28   NaN
2003-06-29   NaN
2003-06-30   NaN
2003-07-01   NaN
2003-07-02   NaN
2003-07-03   NaN
2003-07-04   NaN
2003-07-05   NaN
2003-07-06   NaN
2003-07-07   NaN
2003-07-08   NaN
2003-07-09   NaN
2003-07-10   NaN
2003-07-11   NaN
2003-07-12   NaN
2003-07-13   NaN
2003-07-14   NaN
2003-07-15   NaN
2003-07-16   NaN
2003-07-17   NaN
2003-07-18   NaN
2003-07-19   NaN
2003-07-20   NaN
2003-07-21   NaN
2003-07-22   NaN
2003-07-23   NaN
2003-07-24   NaN
2003-07-25   NaN
2003-07-26   NaN
2003-07-27   NaN
2003-07-28   NaN
2003-07-29   NaN
2003-07-30   NaN
2003-07-31   NaN
2003-08-01   NaN
2003-08-02   NaN
2003-08-03   NaN
2003-08-04   NaN
2003-08-05   NaN
2003-08-06   NaN
2003-08-07   NaN
2003-08-08   NaN
2003-08-09   NaN
2003-08-10   NaN
2003-08-11   NaN
2003-08-12   NaN
2003-08-13     1
2003-08-14   NaN
2003-08-15   NaN
2003-08-16   NaN
2003-08-17   NaN
2003-08-18   NaN
2003-08-19     2
2003-08-20   NaN
2003-08-21   NaN
2003-08-22     1
2003-08-23   NaN
2003-08-24     5
Freq: D, Length: 62
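
To get zeros instead of NaN, as the question asks, one option is to chain fillna(0) after the reindex, and another is to pass fill_value=0 to reindex itself. The sketch below reuses the five sample values from this answer; the names full_range, filled_a, and filled_b are only illustrative.

```python
import pandas as pd

date_index = pd.DatetimeIndex(['2003-06-24', '2003-08-13', '2003-08-19',
                               '2003-08-22', '2003-08-24'])
ts = pd.Series([2, 1, 2, 1, 5], index=date_index)
full_range = pd.date_range(date_index.min(), date_index.max())

# Option 1: reindex, then replace the NaNs. The intermediate NaNs force
# a float dtype, so the zeros come out as 0.0.
filled_a = ts.reindex(full_range).fillna(0)

# Option 2: fill while reindexing, which keeps the integer dtype.
filled_b = ts.reindex(full_range, fill_value=0)

print(filled_b.head())
```

Both options produce the same values. If integer counts are wanted after option 1, .astype(int) converts them back.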
