Pandas Date MultiIndex with missing dates - Rolling sum


Question

I have a pandas Series that looks like this:

Attribute      DateEvent     Value
Type A         2015-04-01    4
               2015-04-02    5
               2015-04-05    3
Type B         2015-04-01    1
               2015-04-03    4
               2015-04-05    1

How do I convert the values to a rolling sum (say, over the past two days) while accounting for missing dates in my DateEvent index, assuming its start and end dates define the full range? For example, 2015-04-03 and 2015-04-04 are missing for Type A, and 2015-04-02 and 2015-04-04 are missing for Type B.

Recommended answer

I've made a few assumptions about what you want; please clarify if any are wrong:

  1. You want rows with missing dates to be treated as having Value = NaN.
  2. As a result, the past-2-day rolling sum should return NaN whenever there is a missing date in the rolling window.
  3. You want to compute the rolling sum within each group, Type A and Type B.
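Assumption 2 follows from how a fixed-size rolling window behaves: with an integer window, `min_periods` defaults to the window size, so any NaN inside the window makes the result NaN. A minimal sketch on made-up values (the names `values`, `strict`, and `lenient` are illustrative only):

```python
import numpy as np
import pandas as pd

# Toy data: the second entry stands in for a missing date
values = pd.Series([1.0, np.nan, 3.0, 4.0])

# Default: min_periods equals the window size (2), so a NaN in the window gives NaN
strict = values.rolling(2).sum()

# Alternative: min_periods=1 sums whatever observations are present
lenient = values.rolling(2, min_periods=1).sum()

print(strict.tolist())   # [nan, nan, nan, 7.0]
print(lenient.tolist())  # [1.0, 1.0, 3.0, 7.0]
```

If missing dates should instead count as zero, passing `min_periods=1` (or filling NaN with 0 before rolling) would be the change to make.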

If I've assumed correctly, here is a sample dataset to work with:

import pandas as pd
import numpy as np
import io

datastring = io.StringIO(
"""
Attribute,DateEvent,Value
Type A,2017-04-02,1
Type A,2017-04-03,2
Type A,2017-04-04,3
Type A,2017-04-05,4
Type B,2017-04-02,1
Type B,2017-04-03,2
Type B,2017-04-04,3
Type B,2017-04-05,4
""")

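# With two index columns, read_csv returns a DataFrame with a MultiIndex
s = pd.read_csv(
            datastring,
            index_col=['Attribute', 'DateEvent'],
            # parse only the date column; 'Attribute' holds plain strings
            parse_dates=['DateEvent'])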
print(s)

Here's what it looks like. Both Type A and Type B are missing 2017-04-01.

                      Value
Attribute DateEvent        
Type A    2017-04-02      1
          2017-04-03      2
          2017-04-04      3
          2017-04-05      4
Type B    2017-04-02      1
          2017-04-03      2
          2017-04-04      3
          2017-04-05      4

Solution

According to this answer, you have to reconstruct the index, then reindex your Series to get one that includes all the dates.

# reconstruct index with all the dates
dates = pd.date_range("2017-04-01","2017-04-05", freq="1D")
attributes = ["Type A", "Type B"]
# create a new MultiIndex
index = pd.MultiIndex.from_product([attributes,dates], 
        names=["Attribute","DateEvent"])
# reindex the series
sNew = s.reindex(index)

The missing dates have been added, with Value = NaN.

                      Value
Attribute DateEvent        
Type A    2017-04-01    NaN
          2017-04-02    1.0
          2017-04-03    2.0
          2017-04-04    3.0
          2017-04-05    4.0
Type B    2017-04-01    NaN
          2017-04-02    1.0
          2017-04-03    2.0
          2017-04-04    3.0
          2017-04-05    4.0
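The dates and attribute labels above are hard-coded. As a sketch, the same full index could instead be derived from the data itself, shown here on the sample data from the question. The names `df`, `full_dates`, `full_index`, and `filled` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data from the question, with gaps in the dates
df = pd.DataFrame(
    {
        "Attribute": ["Type A"] * 3 + ["Type B"] * 3,
        "DateEvent": pd.to_datetime(
            ["2015-04-01", "2015-04-02", "2015-04-05",
             "2015-04-01", "2015-04-03", "2015-04-05"]
        ),
        "Value": [4, 5, 3, 1, 4, 1],
    }
).set_index(["Attribute", "DateEvent"])

# Full daily range between the earliest and latest date seen in any group
dates = df.index.get_level_values("DateEvent")
full_dates = pd.date_range(dates.min(), dates.max(), freq="D")

# Every attribute paired with every date
attributes = df.index.get_level_values("Attribute").unique()
full_index = pd.MultiIndex.from_product(
    [attributes, full_dates], names=["Attribute", "DateEvent"]
)

# Dates absent from the original data become NaN rows
filled = df.reindex(full_index)
print(filled)
```

This relies on the question's assumption that the overall start and end dates define the full range for every group.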

Now group the Series by the Attribute index level and apply a rolling window of size 2 with sum():

# group the series by the `Attribute` index level
grouped = sNew.groupby(level="Attribute")
# Apply a rolling window of 2 rows (2 days, since every date is now present)
summed = grouped.rolling(2).sum()

Final output

                                Value
Attribute Attribute DateEvent        
Type A    Type A    2017-04-01    NaN
                    2017-04-02    NaN
                    2017-04-03    3.0
                    2017-04-04    5.0
                    2017-04-05    7.0
Type B    Type B    2017-04-01    NaN
                    2017-04-02    NaN
                    2017-04-03    3.0
                    2017-04-04    5.0
                    2017-04-05    7.0

Final note: I'm not sure why there are now two Attribute index levels; let me know if anyone figures that out.
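A likely explanation for the duplicated level: as far as I can tell, `groupby(...).rolling(...)` prepends the group keys as a new outermost index level in front of the original index, and since Attribute was already a level of that index, it shows up twice. A minimal sketch of dropping the extra level, assuming that behavior (the names `s_new` and `deduped` are illustrative; the data mirrors the reindexed frame above):

```python
import numpy as np
import pandas as pd

# Rebuild the reindexed frame: one NaN row per group for the missing date
index = pd.MultiIndex.from_product(
    [["Type A", "Type B"], pd.date_range("2017-04-01", "2017-04-05", freq="D")],
    names=["Attribute", "DateEvent"],
)
s_new = pd.DataFrame({"Value": [np.nan, 1, 2, 3, 4] * 2}, index=index)

summed = s_new.groupby(level="Attribute").rolling(2).sum()
print(summed.index.names)

# Drop the outermost level only if the group key was actually prepended
if summed.index.nlevels > s_new.index.nlevels:
    deduped = summed.droplevel(0)
else:
    deduped = summed
print(deduped)
```

The `nlevels` check is a guard in case a given pandas version does not add the extra level.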

It turns out a similar question was asked here. Check it out.
