Upsampling Dataframe in Pandas with Index + Column

Question

Given a DataFrame indexed by month, I'd like to reindex it by day (upsample). Each value that was previously indexed by month should now be divided by the number of days in that month. In addition to the index, a column should be used in the grouping. This is similar to upsampling on the index alone, except that a column also takes part in the grouping.

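import pandas as pd
import numpy as np

np.random.seed(1234)
# Five month-end timestamps ('M' is the month-end alias; newer pandas versions spell it 'ME')
tidx_m = pd.date_range('2011-01-31', periods=5, freq='M')
df = pd.DataFrame(np.random.randint(0, 2, (5, 2)), columns=['class', 'val'])
df.index = tidx_m
# Stack two copies, then label the first five rows class 0 and the rest class 1
df = pd.concat([df, df])
# .ix has been removed from pandas; positional .iloc does the same job here
df.iloc[:5, df.columns.get_loc('class')] = 0
df.iloc[5:, df.columns.get_loc('class')] = 1
print(df)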

            class  val
2011-01-31      0    1
2011-02-28      0    1
2011-03-31      0    0
2011-04-30      0    1
2011-05-31      0    1
2011-01-31      1    1
2011-02-28      1    1
2011-03-31      1    0
2011-04-30      1    1
2011-05-31      1    1

After upsampling the index to days instead of months, I'd like to group by the Datetime index and class. Values in "val" should be redistributed throughout all days in the month (e.g. 1 becomes 1 / 31 for each day in January).
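The redistribution rule can be checked by hand on a few months. The snippet below is only a small sketch under assumptions: the three-row series `monthly` is made up for illustration and is not part of the original question.

```python
import pandas as pd

# Hypothetical monthly values on month-end timestamps (illustration only).
tidx = pd.to_datetime(['2011-01-31', '2011-02-28', '2011-03-31'])
monthly = pd.Series([1, 1, 0], index=tidx)

# days_in_month gives the length of each timestamp's month: 31, 28, 31.
days = tidx.days_in_month.to_numpy()

# Per-day share of each monthly value, e.g. 1 / 31 for every day in January.
per_day = monthly / days
print(per_day)
```

Each daily row of a given month would then carry that month's `per_day` value, so the daily values of a month add back up to the original monthly figure.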

Recommended answer

First, add a new row to the DataFrame that copies the values of the first row but is indexed by the first day of the first month.

Then divide column val by DatetimeIndex.day (for month-end dates this equals the number of days in the month), and finally use groupby with resample and ffill to fill in the new daily rows.

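# The index holds month-end dates, so index.day is the number of days in each month
df['val'] = df['val'].div(df.index.day)

# First day of the first month
first_idx = df.index[0] - pd.offsets.MonthBegin(1)
print(first_idx)
2011-01-01 00:00:00

# Values of the first row, reused for the new first-day row
first_class_val = df.iloc[0]
print(first_class_val)
class    0.000000
val      0.032258
Name: 2011-01-31 00:00:00, dtype: float64

# Append the first-day row (note: only for class 0, the class of the first row)
df.loc[first_idx] = first_class_val
print(df)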
            class       val
2011-01-31    0.0  0.032258
2011-02-28    0.0  0.035714
2011-03-31    0.0  0.000000
2011-04-30    0.0  0.033333
2011-05-31    0.0  0.032258
2011-01-31    1.0  0.032258
2011-02-28    1.0  0.035714
2011-03-31    1.0  0.000000
2011-04-30    1.0  0.033333
2011-05-31    1.0  0.032258
2011-01-01    0.0  0.032258

# Within each class, resample to daily frequency and forward-fill the values
df1 = df.groupby('class').resample('D').ffill().reset_index(level=0, drop=True)

print(df1)
            class       val
2011-01-01    0.0  0.032258
2011-01-02    0.0  0.032258
2011-01-03    0.0  0.032258
2011-01-04    0.0  0.032258
2011-01-05    0.0  0.032258
2011-01-06    0.0  0.032258
2011-01-07    0.0  0.032258
2011-01-08    0.0  0.032258
2011-01-09    0.0  0.032258
2011-01-10    0.0  0.032258
2011-01-11    0.0  0.032258
2011-01-12    0.0  0.032258
2011-01-13    0.0  0.032258
2011-01-14    0.0  0.032258
2011-01-15    0.0  0.032258
...
...
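One caveat with the forward fill above: because the values sit on month-end dates, `ffill` carries a month's value into the following month until the next month-end is reached (February 1 to 27 would receive January's 1 / 31), and only class 0 gets a first-of-month row. The following is a hedged alternative sketch, not the answer's method: it back-fills within each class so that every day takes the value of its own month. The helper name `spread_over_month` and the hard-coded frame are assumptions made for illustration.

```python
import pandas as pd

# Rebuild the question's frame by hand: five month-end rows for each of two classes.
tidx_m = pd.to_datetime(['2011-01-31', '2011-02-28', '2011-03-31',
                         '2011-04-30', '2011-05-31'])
df = pd.DataFrame({'class': [0] * 5 + [1] * 5, 'val': [1, 1, 0, 1, 1] * 2},
                  index=tidx_m.append(tidx_m))


def spread_over_month(group):
    """Hypothetical helper: daily series where each day holds month value / days."""
    # days_in_month is the month length for any date, not only month-ends.
    per_day = group['val'] / group.index.days_in_month.to_numpy()
    # Daily range from the first day of the first month to the last month-end.
    start = group.index.min().replace(day=1)
    days = pd.date_range(start, group.index.max(), freq='D')
    # Back-fill: each day takes the value stored on the month-end that follows it.
    return per_day.reindex(days).bfill()


pieces = []
for cls, group in df.groupby('class'):
    daily = spread_over_month(group).to_frame()  # single column named 'val'
    daily.insert(0, 'class', cls)                # restore the class label
    pieces.append(daily)

daily_df = pd.concat(pieces)
print(daily_df.head())
```

The explicit loop over groups is a design choice to keep the class label handling obvious; the daily values of each class sum back to the monthly totals, which gives a quick sanity check on the result.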
