How to calculate mean and variance from pandas datetime object?

Problem description

How do I calculate summary statistics (mean and standard deviation) for Python datetime objects in the form YYYY-MM-DD? I want to do this for different groups of datetime objects which have different IDs.

The data looks like this:

import datetime as dt
import pandas as pd

df = pd.DataFrame({
'Date': [dt.date(2017,9,1),dt.date(2017,9,21),dt.date(2017,9,14),
    dt.date(2017,11,7),dt.date(2017,8,1),dt.date(2017,12,21),
    dt.date(2017,12,14),dt.date(2017,10,1),dt.date(2017,10,1)],
'ID': [1,2,3,3,2,1,2,3,2],
})

    Date        ID
    2017-09-01  1
    2017-09-21  2
    2017-09-14  3
    2017-11-07  3
    2017-08-01  2
    2017-12-21  1
    2017-12-14  2
    2017-10-01  3
    2017-10-01  2

And I want a result that looks like:

ID_1_mean  ID_1_sd  ID_2_mean   ID_2_sd ...
YYYY-MM-DD int      YYYY-MM-DD  int ...

where YYYY-MM-DD is the mean of the dates in group 1 and int is the standard deviation of the dates in group 1, repeated for all the groups.

Recommended answer

Here's a somewhat clunky workaround:

  1. Convert datetime.date to pandas.Timestamp with pd.to_datetime()
  2. Convert pandas.Timestamp to integer with .astype(int)
  3. Compute mean and std of these integers
  4. Convert mean to pandas.Timestamp
  5. Convert std to pandas.Timedelta
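The integer round trip in these steps can be seen on just two dates before applying it per group. This is a minimal sketch, not part of the original answer: the variable names are made up for illustration, and the explicit cast to datetime64[ns] is an assumption added so that the integers are always nanoseconds since the Unix epoch.

```python
import datetime as dt
import pandas as pd

# Two example dates (the ID 1 group from the data in the question)
dates = pd.Series(pd.to_datetime([dt.date(2017, 9, 1), dt.date(2017, 12, 21)]))

# Force nanosecond resolution, then view each Timestamp as an integer:
# nanoseconds elapsed since 1970-01-01
as_int = dates.astype('datetime64[ns]').astype('int64')

mean_ns = as_int.mean()  # float: a position on the time axis, in nanoseconds
std_ns = as_int.std()    # float: a spread in nanoseconds (sample std, ddof=1)

# The mean is a point in time, so it becomes a Timestamp again;
# the std is a length of time, so it becomes a Timedelta
mean_date = pd.Timestamp(int(mean_ns))
std_span = pd.Timedelta(int(std_ns))

print(mean_date)  # 2017-10-26 12:00:00
print(std_span)
```

The mean maps back to a Timestamp because it is a location on the time axis, while the standard deviation maps to a Timedelta because it measures a distance along that axis.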

Setup:

import datetime as dt
import pandas as pd

df = pd.DataFrame({
'Date': [dt.date(2017,9,1),dt.date(2017,9,21),dt.date(2017,9,14),
    dt.date(2017,11,7),dt.date(2017,8,1),dt.date(2017,12,21),
    dt.date(2017,12,14),dt.date(2017,10,1),dt.date(2017,10,1)],
'ID': [1,2,3,3,2,1,2,3,2],
})

Solution:

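# datetime.date -> Timestamp -> integer (nanoseconds since the epoch, assuming
# datetime64[ns]); 'int64' is spelled out because plain int can be 32-bit on some setups
df['Date_int'] = pd.to_datetime(df['Date']).astype('int64')
# Aggregate only the integer column: the object-dtype 'Date' column cannot be averaged
res = df.groupby('ID')[['Date_int']].agg(['mean', 'std'])
# Flatten the two-level column index to 'Date_int_mean' and 'Date_int_std'
res.columns = ['_'.join(c) for c in res.columns.values]

# Convert the nanosecond values back: mean -> Timestamp, std -> Timedelta
res['Date_mean'] = pd.to_datetime(res['Date_int_mean'])
res['Date_std'] = pd.to_timedelta(res['Date_int_std'])

res = res[['Date_mean', 'Date_std']]
res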

             Date_mean                Date_std
ID                                            
1  2017-10-26 12:00:00 78 days 11:43:56.874291
2  2017-10-01 18:00:00 55 days 15:53:10.401720
3  2017-10-07 16:00:00 27 days 14:38:57.222514
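The question asked for a single row with columns such as ID_1_mean and ID_1_sd, which the output above does not provide. The following is a hedged sketch of one way to get there, not the answer author's method: it works in day offsets from a reference date instead of raw nanosecond integers, and the names origin, offset_days, stats, result and wide are invented for this example. Reporting the standard deviation as a float number of days is an assumption about what "int" was meant to be.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({
    'Date': [dt.date(2017, 9, 1), dt.date(2017, 9, 21), dt.date(2017, 9, 14),
             dt.date(2017, 11, 7), dt.date(2017, 8, 1), dt.date(2017, 12, 21),
             dt.date(2017, 12, 14), dt.date(2017, 10, 1), dt.date(2017, 10, 1)],
    'ID': [1, 2, 3, 3, 2, 1, 2, 3, 2],
})

dates = pd.to_datetime(df['Date'])
origin = dates.min()  # reference date; any fixed date would give the same result

# Each date as a float number of days after the reference date
offset_days = (dates - origin) / pd.Timedelta(days=1)

# Per-ID mean and sample standard deviation of the offsets
stats = offset_days.groupby(df['ID']).agg(['mean', 'std'])

result = pd.DataFrame({
    'mean': origin + pd.to_timedelta(stats['mean'], unit='D'),  # back to a date
    'sd': stats['std'],                                         # spread in days
})

# Flatten to one row: ID_1_mean, ID_1_sd, ID_2_mean, ID_2_sd, ...
wide = {}
for group_id, row in result.iterrows():
    wide[f'ID_{group_id}_mean'] = row['mean']
    wide[f'ID_{group_id}_sd'] = row['sd']
wide = pd.DataFrame([wide])

print(wide.T)
```

Working in day offsets keeps the numbers small and makes the standard deviation directly readable as days. If whole days are wanted, the means could be rounded with .round('D') and the sd values with round().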
