Upsampling Dataframe in Pandas with Index + Column
Problem description
Given a dataframe indexed by month, I'd like to reindex by day (upsample). Values that were previously indexed by month should now be divided by the number of days in the month. This is similar to a plain upsample on the index, except that a column should also be used in the grouping.
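import pandas as pd
import numpy as np

np.random.seed(1234)
# Five month-end dates; an explicit offset object works across pandas versions
tidx_m = pd.date_range('2011-01-31', periods=5, freq=pd.offsets.MonthEnd())
df = pd.DataFrame(np.random.randint(0, 2, (5, 2)), columns=['class', 'val'])
df.index = tidx_m
# Stack two copies so every month appears once per class
df = pd.concat([df, df])
# Positional assignment: the first five rows are class 0, the last five class 1
df.iloc[:5, df.columns.get_loc('class')] = 0
df.iloc[5:, df.columns.get_loc('class')] = 1
print(df)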
            class  val
2011-01-31      0    1
2011-02-28      0    1
2011-03-31      0    0
2011-04-30      0    1
2011-05-31      0    1
2011-01-31      1    1
2011-02-28      1    1
2011-03-31      1    0
2011-04-30      1    1
2011-05-31      1    1
After upsampling the index from months to days, I'd like to group by the datetime index and the "class" column. Values in "val" should be redistributed across all days in the month (e.g. 1 becomes 1 / 31 for each day in January).
Recommended answer
First, add a new row to the DataFrame that copies the values of the first row but uses the first day of that row's month as its index value.
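The first-of-month index value for that extra row can be derived from a month-end timestamp in more than one way. The sketch below compares subtracting pd.offsets.MonthBegin(1) with converting to a monthly period; the variable names are illustrative and do not come from the answer.

```python
import pandas as pd

month_end = pd.Timestamp("2011-01-31")

# Subtracting one MonthBegin offset rolls back to the first day of the month
via_offset = month_end - pd.offsets.MonthBegin(1)

# Converting to a monthly period and taking its start gives the same date
via_period = month_end.to_period("M").start_time

print(via_offset, via_period)

# Caveat: a timestamp already on the 1st moves back a whole month with the
# offset, while the period-based version stays in the same month
already_first = pd.Timestamp("2011-01-01")
print(already_first - pd.offsets.MonthBegin(1))
print(already_first.to_period("M").start_time)
```

For month-end data like the question's, both approaches agree; the period-based form may be the safer choice if the index could contain dates that already fall on the 1st.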
Then divide the val column by DatetimeIndex.day (every index value is a month end, so its day equals the number of days in that month), and finally use groupby with resample and ffill to fill in the new daily values.
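Dividing by DatetimeIndex.day only gives the month length when every timestamp is a month end. As a small illustration (not part of the original answer), DatetimeIndex.days_in_month returns the month length regardless of which day the timestamp falls on:

```python
import pandas as pd

# Month-end timestamps like the ones in the question
idx = pd.date_range("2011-01-31", periods=5, freq=pd.offsets.MonthEnd())

# For month-end dates, the day of the month equals the month length
print(list(idx.day))
print(list(idx.days_in_month))

# A mid-month timestamp shows where the two attributes differ
mid = pd.DatetimeIndex(["2011-02-15"])
print(mid.day[0], mid.days_in_month[0])
```

If the monthly index is not guaranteed to sit on month ends, days_in_month is likely the more robust divisor.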
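# The index holds month ends, so index.day is the number of days in each month
df.val = df.val.div(df.index.day)
# First day of the month of the first row
first_idx = df.index[0] - pd.offsets.MonthBegin(1)
print(first_idx)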
2011-01-01 00:00:00
# Values of the first row (class 0, January)
first_class_val = df.iloc[0]
print(first_class_val)
class 0.000000
val 0.032258
Name: 2011-01-31 00:00:00, dtype: float64
# Append a copy of the first row at the first day of its month
df.loc[first_idx] = first_class_val
print(df)
            class       val
2011-01-31    0.0  0.032258
2011-02-28    0.0  0.035714
2011-03-31    0.0  0.000000
2011-04-30    0.0  0.033333
2011-05-31    0.0  0.032258
2011-01-31    1.0  0.032258
2011-02-28    1.0  0.035714
2011-03-31    1.0  0.000000
2011-04-30    1.0  0.033333
2011-05-31    1.0  0.032258
2011-01-01    0.0  0.032258
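# Upsample each class to daily frequency, forward-fill the gaps,
# then drop the extra 'class' index level added by groupby
df1 = df.groupby('class').resample('D').ffill().reset_index(level=0, drop=True)
print(df1)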
            class       val
2011-01-01    0.0  0.032258
2011-01-02    0.0  0.032258
2011-01-03    0.0  0.032258
2011-01-04    0.0  0.032258
2011-01-05    0.0  0.032258
2011-01-06    0.0  0.032258
2011-01-07    0.0  0.032258
2011-01-08    0.0  0.032258
2011-01-09    0.0  0.032258
2011-01-10    0.0  0.032258
2011-01-11    0.0  0.032258
2011-01-12    0.0  0.032258
2011-01-13    0.0  0.032258
2011-01-14    0.0  0.032258
2011-01-15    0.0  0.032258
...
...
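One point worth checking in the approach above: the extra first-of-month row is added only for class 0, and forward-filling from month-end rows carries each month's value into the days of the following month. The sketch below is an alternative under those observations, not the answer's method. It rebuilds the question's data with hard-coded values, spreads each class separately, and back-fills so that every day takes the value stored at its own month end. The helper name spread_over_days and the variables monthly and daily are hypothetical.

```python
import pandas as pd

# Same shape as the question's data: month-end index, two classes
idx = pd.date_range("2011-01-31", periods=5, freq=pd.offsets.MonthEnd())
monthly = pd.concat([
    pd.DataFrame({"class": 0, "val": [1, 1, 0, 1, 1]}, index=idx),
    pd.DataFrame({"class": 1, "val": [1, 1, 0, 1, 1]}, index=idx),
])

def spread_over_days(group):
    """Spread each monthly value evenly over the days of its month."""
    per_day = group["val"] / group.index.days_in_month
    # Daily range from the first day of the first month to the last month end
    start = group.index.min().to_period("M").start_time
    days = pd.date_range(start, group.index.max(), freq="D")
    # Back-fill so each day takes the value stored at its own month end
    return per_day.reindex(days).bfill()

# Handle each class separately, then stack the daily frames
frames = []
for cls, group in monthly.groupby("class"):
    frames.append(pd.DataFrame({"class": cls, "val": spread_over_days(group)}))
daily = pd.concat(frames)

# Sanity check: daily values should sum back to the original monthly values
monthly_check = daily.groupby(["class", daily.index.to_period("M")])["val"].sum()
print(daily.head())
print(monthly_check)
```

Summing the daily values per class and month is a quick way to confirm that the redistribution preserved the original totals.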