Grouping by date range with pandas

Views: 142  Published: 2021/6/13 20:46:03  Tags: python pandas datetime group-by pandas-groupby


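Question

I am looking to group by two columns, user_id and date. However, if two dates are close enough, I want to treat the entries as part of the same group and group accordingly. Dates are in m-d-y format.

user_id     date       val
1           1-1-17     1
2           1-1-17     1
3           1-1-17     1
1           1-1-17     1
1           1-2-17     1
2           1-2-17     1
2           1-10-17    1
3           2-1-17     1

The grouping would be by user_id and by dates within +/- 3 days of each other, so the result of summing val per group would look like:

user_id     date       sum(val)
1           1-2-17     3
2           1-2-17     2
2           1-10-17    1
3           1-1-17     1
3           2-1-17     1

Is there a (somewhat) easy way to do this? I know there are some problematic aspects, for example what to do if the dates chain together endlessly, each three days apart. But the exact data I'm using only has 2 values per person.

Thanks!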

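Solution

I'd convert this to a datetime column and then use pd.TimeGrouper:

import pandas as pd

# Parse the m-d-y strings into datetime64 values.
dates = pd.to_datetime(df.date, format='%m-%d-%y')
print(dates)
0   2017-01-01
1   2017-01-01
2   2017-01-01
3   2017-01-01
4   2017-01-02
5   2017-01-02
6   2017-01-10
7   2017-02-01
Name: date, dtype: datetime64[ns]

# Group by user and by 3-day bins on the datetime index.
# Note: pd.TimeGrouper only exists in older pandas releases.
df = (df.assign(date=dates).set_index('date')
        .groupby(['user_id', pd.TimeGrouper('3D')])
        .sum()
        .reset_index())
print(df)
   user_id       date  val
0        1 2017-01-01    3
1        2 2017-01-01    2
2        2 2017-01-10    1
3        3 2017-01-01    1
4        3 2017-01-31    1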

A similar solution using pd.Grouper:

# Same 3-day binning, keyed on the date column instead of the index.
df = (df.assign(date=dates)
        .groupby(['user_id', pd.Grouper(key='date', freq='3D')])
        .sum()
        .reset_index())
print(df)
   user_id       date  val
0        1 2017-01-01    3
1        2 2017-01-01    2
2        2 2017-01-10    1
3        3 2017-01-01    1
4        3 2017-01-31    1
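The answer never shows how df is built, so here is a self-contained sketch of the pd.Grouper approach that can be run as is. The DataFrame construction is an assumption reconstructed from the sample table in the question; everything else follows the answer's code.

```python
import pandas as pd

# Sample data reconstructed from the question (dates are m-d-y strings).
df = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 1, 2, 2, 3],
    "date": ["1-1-17", "1-1-17", "1-1-17", "1-1-17",
             "1-2-17", "1-2-17", "1-10-17", "2-1-17"],
    "val": [1] * 8,
})

# Parse the strings into real timestamps.
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%y")

# Group by user and by fixed 3-day bins on the date column.
result = (
    df.groupby(["user_id", pd.Grouper(key="date", freq="3D")])["val"]
      .sum()
      .reset_index()
)
print(result)
```

The bins are fixed 3-day windows counted from the earliest date in the data (2017-01-01 here), and each group is labelled with the start of its bin. That is why the 2017-02-01 row is reported under 2017-01-31.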

Update: TimeGrouper will be deprecated in a future version of pandas, so Grouper is the preferred choice in this scenario (thanks for the heads up, Vaishali!).
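Fixed 3-day bins are not quite the same as "dates within 3 days of each other": two dates one day apart can still fall on opposite sides of a bin edge. The following is an alternative sketch, not part of the original answer, that starts a new group whenever the gap to the same user's previous date exceeds 3 days. The column name cluster and the choice to label each group by its latest date are assumptions made for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question (dates are m-d-y strings).
df = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 1, 2, 2, 3],
    "date": ["1-1-17", "1-1-17", "1-1-17", "1-1-17",
             "1-2-17", "1-2-17", "1-10-17", "2-1-17"],
    "val": [1] * 8,
})
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%y")

# Sort so that each user's rows are in date order.
df = df.sort_values(["user_id", "date"]).reset_index(drop=True)

# Gap to the previous row of the same user (NaT for a user's first row).
gap = df.groupby("user_id")["date"].diff()

# A gap above 3 days starts a new cluster; NaT compares as False.
new_cluster = (gap > pd.Timedelta(days=3)).astype(int)

# Running count of cluster starts within each user gives a cluster id.
df["cluster"] = new_cluster.groupby(df["user_id"]).cumsum()

# Sum val per (user, cluster) and label each cluster by its latest date.
result = (
    df.groupby(["user_id", "cluster"])
      .agg(date=("date", "max"), val=("val", "sum"))
      .reset_index()
      .drop(columns="cluster")
)
print(result)
```

With this rule, dates that chain together with gaps of 3 days or less all end up in one group, which is the open-ended case the question mentions. For data with only two values per user, as described there, that cannot occur.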

