In pandas, group by date from DatetimeIndex


Question

Consider the following synthetic example:

import pandas as pd
import numpy as np
np.random.seed(42)
ix = pd.date_range('2017-01-01', '2017-01-15', freq='1H')
df = pd.DataFrame(
    {
        'val': np.random.random(size=ix.shape[0]),
        'cat': np.random.choice(['foo', 'bar'], size=ix.shape[0])
    },
    index=ix
)

This yields a table of the following form:

                    cat val
2017-01-01 00:00:00 bar 0.374540
2017-01-01 01:00:00 foo 0.950714
2017-01-01 02:00:00 bar 0.731994
2017-01-01 03:00:00 bar 0.598658
2017-01-01 04:00:00 bar 0.156019

Now, I want to count the number of instances and the average value for each category and date.

The following groupby is almost perfect:

df.groupby(['cat',df.index.date]).agg({'val': ['count', 'mean']})

which returns:

                  val
                count      mean
cat
bar 2017-01-01     16  0.437941
    2017-01-02     16  0.456361
    2017-01-03      9  0.514388
...

The problem with this one is that the second level of the index turned into strings and not dates. First question: why is this happening, and how can I avoid it?

Next, I tried a combination of groupby and resample:

df.groupby('cat').resample('1d').agg({'val': 'mean'})

Here, the index is correct, but I fail to run both mean and count aggregations. This is the second question: why does

df.groupby('cat').resample('1d').agg({'val': ['mean', 'count']})

fail?

Last question: what is the clean way to get an aggregated view (using both functions) with a date-typed index?

Answer

For the first question, you need to convert to datetimes with the time component removed, for example:

df1 = df.groupby(['cat',df.index.floor('d')]).agg({'val': ['count', 'mean']})
#df1 = df.groupby(['cat',df.index.normalize()]).agg({'val': ['count', 'mean']})

#df1 = df.groupby(['cat',pd.to_datetime(df.index.date)]).agg({'val': ['count', 'mean']})

print (df1.index.get_level_values(1))


DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
               '2017-01-09', '2017-01-10', '2017-01-11', '2017-01-12',
               '2017-01-13', '2017-01-14', '2017-01-01', '2017-01-02',
               '2017-01-03', '2017-01-04', '2017-01-05', '2017-01-06',
               '2017-01-07', '2017-01-08', '2017-01-09', '2017-01-10',
               '2017-01-11', '2017-01-12', '2017-01-13', '2017-01-14',
               '2017-01-15'],
              dtype='datetime64[ns]', freq=None)

... whereas grouping by df.index.date gives that string-like level, because the dates are plain Python objects:

df1 = df.groupby(['cat',df.index.date]).agg({'val': ['count', 'mean']})
print (type(df1.index.get_level_values(1)[0]))
<class 'datetime.date'>
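
For the same reason, a quick dtype check (just a sketch built on the example above, not part of the original answer) shows the contrast directly: the first three conversions keep a datetime64[ns] DatetimeIndex, while df.index.date is an object array of datetime.date values.

# Sketch: compare the dtypes of the index conversions used above
print (df.index.floor('d').dtype)            # datetime64[ns]
print (df.index.normalize().dtype)           # datetime64[ns]
print (pd.to_datetime(df.index.date).dtype)  # datetime64[ns]
print (df.index.date.dtype)                  # object (plain datetime.date values)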

Second question - in my opinion it is a bug or not implemented yet, because only a single function name works in agg:

df2 = df.groupby('cat').resample('1d')['val'].agg('mean')
#df2 = df.groupby('cat').resample('1d')['val'].mean()
print (df2)
cat            
bar  2017-01-01    0.437941
     2017-01-02    0.456361
     2017-01-03    0.514388
     2017-01-04    0.580295
     2017-01-05    0.426841
     2017-01-06    0.642465
     2017-01-07    0.395970
     2017-01-08    0.359940
...
... 
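
One possible workaround, consistent with the single-function limitation above (a sketch of my own, not from the original answer), is to run each aggregation in a separate pass and concatenate the results:

# Sketch: compute mean and count separately, then join the two Series
mean_part = df.groupby('cat').resample('1d')['val'].mean()
count_part = df.groupby('cat').resample('1d')['val'].count()
df2b = pd.concat([mean_part, count_part], axis=1, keys=['mean', 'count'])
print (df2b.head())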

It does work fine with apply, though:

df2 = df.groupby('cat').apply(lambda x: x.resample('1d')['val'].agg(['mean','count']))
print (df2)
                    mean  count
cat                            
bar 2017-01-01  0.437941     16
    2017-01-02  0.456361     16
    2017-01-03  0.514388      9
    2017-01-04  0.580295     12
    2017-01-05  0.426841     12
    2017-01-06  0.642465      7
    2017-01-07  0.395970     11
    2017-01-08  0.359940      9
    2017-01-09  0.564851     12
    ...
    ...
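
For the last question, one further option (a hedged sketch, not part of the original answer) is to group with pd.Grouper on the DatetimeIndex itself; the second index level then stays datetime64[ns] and both aggregations run in a single agg call:

# Sketch: a daily pd.Grouper keeps the datetime dtype on the second index level
# and supports multiple aggregations at once
df3 = df.groupby(['cat', pd.Grouper(freq='1D')]).agg({'val': ['count', 'mean']})
print (df3.index.get_level_values(1).dtype)   # datetime64[ns]
print (df3.head())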
