Resampling with 'how=count' causing problems
Problem description
I have a simple pandas dataframe that has measurements at various times:
volume
t
2013-10-13 02:45:00 17
2013-10-13 05:40:00 38
2013-10-13 09:30:00 29
2013-10-13 11:40:00 25
2013-10-13 12:50:00 11
2013-10-13 15:00:00 17
2013-10-13 17:10:00 15
2013-10-13 18:20:00 12
2013-10-13 20:30:00 20
2013-10-14 03:45:00 9
2013-10-14 06:40:00 30
2013-10-14 09:40:00 43
2013-10-14 11:05:00 10
I'm doing some basic resampling and plotting, such as the daily total volume, which works fine:
df.resample('D',how='sum').head()
volume
t
2013-10-13 184
2013-10-14 209
2013-10-15 197
2013-10-16 309
2013-10-17 317
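For readers on a current pandas release: the `how=` keyword used above was deprecated in pandas 0.18 and has since been removed, so the daily total is written by chaining the aggregation onto the resampler instead. A minimal sketch, assuming only the sample rows shown above (so the 2013-10-14 total covers just those four rows, not the 209 from the question's full data):

```python
import pandas as pd

# Sample rows from the question (a subset of the full data set).
times = pd.DatetimeIndex([
    "2013-10-13 02:45:00", "2013-10-13 05:40:00", "2013-10-13 09:30:00",
    "2013-10-13 11:40:00", "2013-10-13 12:50:00", "2013-10-13 15:00:00",
    "2013-10-13 17:10:00", "2013-10-13 18:20:00", "2013-10-13 20:30:00",
    "2013-10-14 03:45:00", "2013-10-14 06:40:00", "2013-10-14 09:40:00",
    "2013-10-14 11:05:00",
], name="t")
volumes = [17, 38, 29, 25, 11, 17, 15, 12, 20, 9, 30, 43, 10]
df = pd.DataFrame({"volume": volumes}, index=times)

# Modern spelling of df.resample('D', how='sum'):
# bucket the rows into calendar days, then sum each bucket.
daily_total = df.resample("D").sum()
print(daily_total)
```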
But for some reason, when I try to get the total number of entries per day, it returns a MultiIndex Series instead of a DataFrame:
df.resample('D',how='count').head()
2013-10-13 volume 9
2013-10-14 volume 9
2013-10-15 volume 7
2013-10-16 volume 9
2013-10-17 volume 10
I can fix the data so it's easily plotted with a simple unstack call, i.e. df.resample('D',how='count').unstack(), but why does calling resample with how='count' behave differently than calling it with how='sum'?
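Since `how=` no longer exists in current pandas, the odd output can't be reproduced directly. The sketch below instead builds by hand a Series shaped like the one printed above (a `(date, column)` MultiIndex, using the daily counts of the sample rows, 9 and 4) purely to illustrate what the `unstack()` workaround does: it moves the inner index level into the columns and yields an ordinary DataFrame.

```python
import pandas as pd

# Hand-built stand-in for the old how='count' output: a Series whose
# index pairs each day with the column name that was counted.
index = pd.MultiIndex.from_tuples([
    (pd.Timestamp("2013-10-13"), "volume"),
    (pd.Timestamp("2013-10-14"), "volume"),
])
counts_series = pd.Series([9, 4], index=index)

# unstack() pivots the innermost level ("volume") into columns,
# giving one row per day and one column per counted field.
counts_frame = counts_series.unstack()
print(counts_frame)
```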
Recommended answer
It does appear that combining resample and count leads to some odd behavior in terms of how the resulting DataFrame is structured (well, at least up to pandas 0.13.1). See here for a slightly different but related context: Count and Resampling with a multi-index
You can use the same strategy here:
>>> df
volume
date
2013-10-13 02:45:00 17
2013-10-13 05:40:00 38
2013-10-13 09:30:00 29
2013-10-13 11:40:00 25
2013-10-13 12:50:00 11
2013-10-13 15:00:00 17
2013-10-13 17:10:00 15
2013-10-13 18:20:00 12
2013-10-13 20:30:00 20
2013-10-14 03:45:00 9
2013-10-14 06:40:00 30
2013-10-14 09:40:00 43
2013-10-14 11:05:00 10
Here is your problem:
>>> df.resample('D',how='count')
2013-10-13 volume 9
2013-10-14 volume 4
You can fix the issue by specifying that count applies to the volume column, using a dict in the resample call:
>>> df.resample('D',how={'volume':'count'})
volume
date
2013-10-13 9
2013-10-14 4
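On newer pandas versions, where `how=` has been removed, the same per-day counts come from chaining `.count()` or `.agg()` onto the resampler; as far as I know both return a DataFrame with a `volume` column, so no dict or `unstack()` workaround is needed. A sketch, assuming only the sample rows shown above:

```python
import pandas as pd

# Sample rows from the question, indexed by timestamp.
times = pd.DatetimeIndex([
    "2013-10-13 02:45:00", "2013-10-13 05:40:00", "2013-10-13 09:30:00",
    "2013-10-13 11:40:00", "2013-10-13 12:50:00", "2013-10-13 15:00:00",
    "2013-10-13 17:10:00", "2013-10-13 18:20:00", "2013-10-13 20:30:00",
    "2013-10-14 03:45:00", "2013-10-14 06:40:00", "2013-10-14 09:40:00",
    "2013-10-14 11:05:00",
], name="date")
volumes = [17, 38, 29, 25, 11, 17, 15, 12, 20, 9, 30, 43, 10]
df = pd.DataFrame({"volume": volumes}, index=times)

# Count of non-null entries per calendar day, for every column.
counts = df.resample("D").count()

# Column-specific form, the modern analogue of how={'volume': 'count'}.
counts_by_dict = df.resample("D").agg({"volume": "count"})

print(counts)
print(counts_by_dict)
```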