Pandas resample by groups with duplicate datetimes

Question

Lots of similar questions on here, but I couldn't find any that actually had observations with the same datetime. A minimal non-working example would be:

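import numpy as np
import pandas as pd

# Two groups, each observed on the same two dates (hence the duplicate dates)
df = pd.DataFrame(
    {"Date": np.tile(["2016-01", "2016-03"], 2),
     "Group": [1, 1, 2, 2],
     "Obs": [1, 2, 5, 6]})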

Now I'd like to linearly interpolate the value for February 2016 by group, so the required output is

    Date     Group  Obs
    2016-01  1      1
    2016-02  1      1.5
    2016-03  1      2
    2016-01  2      5
    2016-02  2      5.5
    2016-03  2      6

My understanding is that resample should be able to do this (in my actual application I'm trying to move from quarterly to monthly, so have observations in Jan and Apr), but that requires some sort of time index, which I can't do as there are duplicates in the Date column.

I'm assuming some sort of groupby magic could help, but can't figure it out!

Recommended answer

Edit: replaced resample with reindex for a 2x speed improvement.
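
For comparison, a resample-based version could look roughly like the sketch below. It is not the code this answer originally used (that version isn't shown here). It assumes the Date strings are first parsed with pd.to_datetime, and the name df_ts is made up for the example. Duplicate dates across groups are not a problem, because resample only ever sees one group at a time.

```python
import pandas as pd

# Same sample data as in the question; df_ts is a hypothetical name.
df_ts = pd.DataFrame({
    "Date": ["2016-01", "2016-03", "2016-01", "2016-03"],
    "Group": [1, 1, 2, 2],
    "Obs": [1, 2, 5, 6],
})

# resample needs a datetime-like index, so parse the "YYYY-MM" strings first.
df_ts["Date"] = pd.to_datetime(df_ts["Date"], format="%Y-%m")

# Upsample each group to month-start frequency and interpolate linearly.
monthly = (
    df_ts.set_index("Date")
         .groupby("Group")["Obs"]
         .apply(lambda s: s.resample("MS").interpolate())
)

# Turn the (Group, Date) index levels back into columns.
result = monthly.reset_index()
print(result)
```

On the sample data this fills February 2016 with 1.5 for group 1 and 5.5 for group 2.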

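# Index by Date so each group can be reindexed onto the full list of months
df = df.set_index('Date')
index = ['2016-01', '2016-02', '2016-03']

# Per group: add the missing month as NaN, then fill it by linear interpolation
df.groupby('Group')[['Obs']].apply(lambda df1: df1.reindex(index).interpolate())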
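
To get from the grouped result back to the flat table asked for in the question, something like the following should work. Treat it as a sketch: months is a hypothetical name, and building it with pd.period_range is just one way to avoid typing the months out by hand (handy for a quarterly-to-monthly case).

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2016-01", "2016-03", "2016-01", "2016-03"],
    "Group": [1, 1, 2, 2],
    "Obs": [1, 2, 5, 6],
})

# Every month in the range, as "YYYY-MM" strings matching the Date column.
months = pd.period_range("2016-01", "2016-03", freq="M").strftime("%Y-%m")
months = pd.Index(months, name="Date")

# Reindex and interpolate within each group, as in the answer.
filled = (
    df.set_index("Date")
      .groupby("Group")[["Obs"]]
      .apply(lambda df1: df1.reindex(months).interpolate())
)

# Move the (Group, Date) index levels back into ordinary columns.
result = filled.reset_index()[["Date", "Group", "Obs"]]
print(result)
```

Selecting [["Obs"]] before apply keeps the grouping column out of the interpolation, so Group stays an integer key instead of being reindexed and interpolated along with the data.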

Using groupby is easy once you understand that apply just calls your function with one DataFrame (here df1) per distinct value in the grouping column, then stitches the results back together.
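
If that still feels like magic, iterating over the GroupBy object shows the pieces directly. The snippet below is only an illustration on the question's sample data; pieces is a made-up name.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2016-01", "2016-03", "2016-01", "2016-03"],
    "Group": [1, 1, 2, 2],
    "Obs": [1, 2, 5, 6],
}).set_index("Date")

# Iterating over a GroupBy yields (key, sub-DataFrame) pairs; apply() passes
# the same kind of sub-DataFrame to your function, one group at a time.
pieces = {}
for key, df1 in df.groupby("Group"):
    print(f"Group {key}:")
    print(df1)
    pieces[key] = df1

# The full index has duplicate dates, but each piece's index is unique,
# which is why reindex works inside the groups and not on the whole frame.
print(df.index.is_unique, pieces[1].index.is_unique)
```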
