How can I count a resampled multi-indexed dataframe in pandas
Question
I found this description of how to resample a multi-index:
However, as soon as I use count instead of sum, the solution no longer works.
This might be related to: Resampling with 'how=count' causing problems
Non-working version with count and strings:
import datetime
import pandas as pd

values_a = [1]*16
states = ['Georgia']*8 + ['Alabama']*8
#cities = ['Atlanta']*4 + ['Savanna']*4 + ['Mobile']*4 + ['Montgomery']*4
dates = pd.DatetimeIndex([datetime.datetime(2012,1,1)+datetime.timedelta(days = i) for i in range(4)]*4)
df2 = pd.DataFrame(
    {'value_a': values_a},
    index = [states, dates])
df2.index.names = ['State', 'Date']
# Move 'State' out of the index so that only 'Date' remains as the index
df2.reset_index(level=[0], inplace=True)
print(df2.groupby(['State']).resample('W',how='count'))
Yields:
         2012-01-01           2012-01-08
              State  value_a       State  value_a
State
Alabama           2        2           6        6
Georgia           2        2           6        6
Working version with sum and numbers as values:
values_a =[1]*16
states = ['Georgia']*8 + ['Alabama']*8
#cities = ['Atlanta']*4 + ['Savanna']*4 + ['Mobile']*4 + ['Montgomery']*4
dates = pd.DatetimeIndex([datetime.datetime(2012,1,1)+datetime.timedelta(days = i) for i in range(4)]*4)
df2 = pd.DataFrame(
{'value_a': values_a},
index = [states, dates])
df2.index.names = ['State', 'Date']
df2.reset_index(level=[0], inplace=True)
print(df2.groupby(['State']).resample('W',how='sum'))
Yields (notice no duplication of 'State'):
                    value_a
State   Date
Alabama 2012-01-01        2
        2012-01-08        6
Georgia 2012-01-01        2
        2012-01-08        6
Recommended answer
When using count, State isn't a nuisance column (count works on strings), so resample applies count to it as well (although the output is not what I would expect). You could do something like this, which tells it to apply count only to value_a:
>>> print(df2.groupby(['State']).resample('W', how={'value_a': 'count'}))
                    value_a
State   Date
Alabama 2012-01-01        2
        2012-01-08        6
Georgia 2012-01-01        2
        2012-01-08        6
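The how= keyword used above belongs to older pandas releases; it was deprecated in 0.18 and later removed. As a rough sketch of the same count on a newer version, one option (an assumption on my part, not part of the original answer) is to group by State together with a weekly pd.Grouper on the Date column and count only value_a. The names flat and counts are just illustrative.

```python
import datetime

import pandas as pd

# Rebuild the frame from the question with a (State, Date) MultiIndex.
values_a = [1] * 16
states = ['Georgia'] * 8 + ['Alabama'] * 8
dates = [datetime.datetime(2012, 1, 1) + datetime.timedelta(days=i)
         for i in range(4)] * 4
index = pd.MultiIndex.from_arrays([states, pd.DatetimeIndex(dates)],
                                  names=['State', 'Date'])
df2 = pd.DataFrame({'value_a': values_a}, index=index)

# Turn both index levels into ordinary columns (illustrative name: flat).
flat = df2.reset_index()

# Group by state and by week (weeks end on Sunday for 'W'),
# then count only the value_a column.
counts = flat.groupby(['State', pd.Grouper(key='Date', freq='W')])['value_a'].count()
print(counts)
```

Selecting value_a before calling count() plays the same role as how={'value_a': 'count'}: State is only used as a grouping key and is never counted itself.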
Or more generally, you can apply different kinds of how to different columns:
>>> print(df2.groupby(['State']).resample('W', how={'value_a': 'count', 'State': 'last'}))
                      State  value_a
State   Date
Alabama 2012-01-01  Alabama        2
        2012-01-08  Alabama        6
Georgia 2012-01-01  Georgia        2
        2012-01-08  Georgia        6
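On newer pandas versions, where how= is no longer available, applying several aggregations at once can be sketched with named aggregation in agg(). This is an assumption about how one might port the idea, not the original answer's code; the output column names n_rows and total are made up for illustration, and both aggregations read value_a because it is the only data column in the example.

```python
import datetime

import pandas as pd

# Same data as in the question, built directly as flat columns.
dates = [datetime.datetime(2012, 1, 1) + datetime.timedelta(days=i)
         for i in range(4)] * 4
flat = pd.DataFrame({
    'State': ['Georgia'] * 8 + ['Alabama'] * 8,
    'Date': pd.DatetimeIndex(dates),
    'value_a': [1] * 16,
})

# One output column per (source column, aggregation) pair.
# n_rows and total are hypothetical names chosen for this sketch.
summary = (
    flat.groupby(['State', pd.Grouper(key='Date', freq='W')])
        .agg(n_rows=('value_a', 'count'), total=('value_a', 'sum'))
)
print(summary)
```

Because every value_a is 1 in this data set, count and sum happen to give the same numbers per state and week.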
So while the above allows you to count a resampled multi-index dataframe, it doesn't explain the output produced by how='count'. The following is closer to the way I would expect it to behave:
>>> print(df2.groupby(['State']).resample('W', how={'value_a': 'count', 'State': 'count'}))
                    State  value_a
State   Date
Alabama 2012-01-01      2        2
        2012-01-08      6        6
Georgia 2012-01-01      2        2
        2012-01-08      6        6