Grouping by date range with pandas

Problem description

I am looking to group by two columns: user_id and date. However, if the dates are close enough, I want to treat the two entries as part of the same group and group accordingly. Dates are in m-d-y format.

user_id     date       val
1           1-1-17     1
2           1-1-17     1
3           1-1-17     1
1           1-1-17     1
1           1-2-17     1
2           1-2-17     1
2           1-10-17    1
3           2-1-17     1

The grouping would be by user_id and by dates within +/- 3 days of each other, so the result of summing val per group would look like:

user_id     date       sum(val)
1           1-2-17     3
2           1-2-17     2
2           1-10-17    1
3           1-1-17     1
3           2-1-17     1

Can anyone think of a way this could be done (somewhat) easily? I know there are some problematic aspects to this, for example what to do if the dates chain together endlessly, each three days apart. But the exact data I'm using only has 2 values per person.

Thanks!

Solution

I'd convert this to a datetime column and then use pd.TimeGrouper:

import pandas as pd

# Parse the m-d-y strings into datetime64 values.
dates = pd.to_datetime(df.date, format='%m-%d-%y')
print(dates)
0   2017-01-01
1   2017-01-01
2   2017-01-01
3   2017-01-01
4   2017-01-02
5   2017-01-02
6   2017-01-10
7   2017-02-01
Name: date, dtype: datetime64[ns]

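# Index by date, then sum val per user within fixed 3-day bins.
# pd.TimeGrouper only exists in older pandas releases.
df = (df.assign(date=dates).set_index('date')
        .groupby(['user_id', pd.TimeGrouper('3D')])
        .sum()
        .reset_index())
print(df)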
   user_id       date  val
0        1 2017-01-01    3
1        2 2017-01-01    2
2        2 2017-01-10    1
3        3 2017-01-01    1
4        3 2017-01-31    1


Similar solution using pd.Grouper, again starting from the original df rather than the aggregated one:

df = (df.assign(date=dates)
        .groupby(['user_id', pd.Grouper(key='date', freq='3D')])
        .sum()
        .reset_index())
print(df)
   user_id       date  val
0        1 2017-01-01    3
1        2 2017-01-01    2
2        2 2017-01-10    1
3        3 2017-01-01    1
4        3 2017-01-31    1
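
To run the pd.Grouper version end to end, here is a minimal sketch that first rebuilds the sample frame from the question. The names raw and result are placeholders of mine, not part of the original answer, and selecting the val column before summing is my choice to keep the output narrow.

```python
import pandas as pd

# Sample data copied from the question (dates are m-d-y strings).
raw = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 1, 2, 2, 3],
    "date": ["1-1-17", "1-1-17", "1-1-17", "1-1-17",
             "1-2-17", "1-2-17", "1-10-17", "2-1-17"],
    "val": [1, 1, 1, 1, 1, 1, 1, 1],
})

# Parse the strings, then sum val per user within fixed 3-day bins.
result = (raw.assign(date=pd.to_datetime(raw["date"], format="%m-%d-%y"))
             .groupby(["user_id", pd.Grouper(key="date", freq="3D")])["val"]
             .sum()
             .reset_index())
print(result)
```

Note that the bins are fixed 3-day windows counted from the earliest date in the data, so the date column holds the start of each window rather than one of the original dates.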

Update: TimeGrouper has been deprecated and was later removed from pandas, so Grouper is the one to use in this scenario (thanks for the heads up, Vaishali!).
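
One caveat with fixed 3-day bins: two dates only a day apart can still land in different bins if they straddle a bin edge. If the intent is strictly "within 3 days of the previous entry", a different technique (not the one used above) is to start a new cluster per user whenever the gap to the previous row exceeds 3 days. The sketch below assumes that rule; the names events, cluster and clustered are hypothetical, and reporting the latest date of each cluster is my assumption, chosen because it matches the expected output in the question.

```python
import pandas as pd

# Sample data copied from the question (dates are m-d-y strings).
raw = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 1, 2, 2, 3],
    "date": ["1-1-17", "1-1-17", "1-1-17", "1-1-17",
             "1-2-17", "1-2-17", "1-10-17", "2-1-17"],
    "val": [1, 1, 1, 1, 1, 1, 1, 1],
})

events = (raw.assign(date=pd.to_datetime(raw["date"], format="%m-%d-%y"))
             .sort_values(["user_id", "date"]))

# Gap to the same user's previous row (NaT for each user's first row).
gap = events.groupby("user_id")["date"].diff()

# A gap above 3 days opens a new cluster; a running count per user labels it.
new_cluster = (gap > pd.Timedelta(days=3)).astype(int)
events["cluster"] = new_cluster.groupby(events["user_id"]).cumsum()

# Sum val per cluster and report the cluster's latest date (an assumption).
clustered = (events.groupby(["user_id", "cluster"])
                   .agg(date=("date", "max"), val=("val", "sum"))
                   .reset_index())
print(clustered)
```

As the question anticipates, dates that chain together with gaps of 3 days or less all end up in one cluster under this rule, however long the chain gets.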
