Resampling with how='count' causing problems


Question

I have a simple pandas DataFrame with measurements taken at irregular times:

                     volume
t
2013-10-13 02:45:00      17
2013-10-13 05:40:00      38
2013-10-13 09:30:00      29
2013-10-13 11:40:00      25
2013-10-13 12:50:00      11
2013-10-13 15:00:00      17
2013-10-13 17:10:00      15
2013-10-13 18:20:00      12
2013-10-13 20:30:00      20
2013-10-14 03:45:00       9
2013-10-14 06:40:00      30
2013-10-14 09:40:00      43
2013-10-14 11:05:00      10

I'm doing some basic resampling and plotting. Computing the daily total volume, for example, works fine:

df.resample('D',how='sum').head()   

            volume
t
2013-10-13     184
2013-10-14     209
2013-10-15     197
2013-10-16     309
2013-10-17     317

But for some reason, when I try to get the total number of entries per day, it returns a multi-indexed Series instead of a DataFrame:

df.resample('D',how='count').head()

2013-10-13  volume     9
2013-10-14  volume     9
2013-10-15  volume     7
2013-10-16  volume     9
2013-10-17  volume    10

I can reshape the result so it's easy to plot with a simple unstack call, i.e. df.resample('D',how='count').unstack(), but why does calling resample with how='count' behave differently from calling it with how='sum'?

Recommended answer

It does appear that combining resample with count leads to some odd behavior in how the resulting DataFrame is structured (at least up to pandas 0.13.1). For a slightly different but related context, see the question "Count and Resampling with a multi-index".

You can use the same strategy here:

>>> df
                     volume
date                       
2013-10-13 02:45:00      17
2013-10-13 05:40:00      38
2013-10-13 09:30:00      29
2013-10-13 11:40:00      25
2013-10-13 12:50:00      11
2013-10-13 15:00:00      17
2013-10-13 17:10:00      15
2013-10-13 18:20:00      12
2013-10-13 20:30:00      20
2013-10-14 03:45:00       9
2013-10-14 06:40:00      30
2013-10-14 09:40:00      43
2013-10-14 11:05:00      10

Here is your issue:

>>> df.resample('D',how='count')

2013-10-13  volume    9
2013-10-14  volume    4

You can fix the issue by passing a dict to the resample call that specifies that count applies to the volume column:

>>> df.resample('D',how={'volume':'count'})

            volume
date              
2013-10-13       9
2013-10-14       4
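
The calls above use the how= keyword, which later pandas releases deprecated and eventually removed. As a rough sketch only, assuming a newer pandas where resample returns a Resampler object, the same daily count can be written by chaining the aggregation. The variable names below (times, daily_count, daily_count_dict, daily_sum) are made up for illustration, and the frame is rebuilt from the 13 rows shown above:

```python
import pandas as pd

# Rebuild the sample frame from the rows shown above (index named "date" as in the answer).
times = pd.DatetimeIndex(
    [
        "2013-10-13 02:45", "2013-10-13 05:40", "2013-10-13 09:30",
        "2013-10-13 11:40", "2013-10-13 12:50", "2013-10-13 15:00",
        "2013-10-13 17:10", "2013-10-13 18:20", "2013-10-13 20:30",
        "2013-10-14 03:45", "2013-10-14 06:40", "2013-10-14 09:40",
        "2013-10-14 11:05",
    ],
    name="date",
)
df = pd.DataFrame(
    {"volume": [17, 38, 29, 25, 11, 17, 15, 12, 20, 9, 30, 43, 10]},
    index=times,
)

# Method-chaining form: count the non-null entries per calendar day.
daily_count = df.resample("D").count()

# Dict form, mirroring the how={'volume': 'count'} fix from the answer.
daily_count_dict = df.resample("D").agg({"volume": "count"})

# Daily totals, for comparison with the how='sum' call in the question.
daily_sum = df.resample("D").sum()

print(daily_count)
print(daily_sum)
```

With this form both the count and the sum should come back as DataFrames with a single volume column, so no unstack step is needed before plotting.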
