Calculating the difference in dates in a Pandas GroupBy object

Problem description

I have a Pandas DataFrame with the following format:

In [0]: df
Out[0]: 
       col1  col2       date
 0     1     1          2015-01-01
 1     1     2          2015-01-09
 2     1     3          2015-01-10
 3     2     1          2015-02-10
 4     2     2          2015-02-10
 5     2     3          2015-02-25

In [1]: df.dtypes
Out[1]:
 col1             int64
 col2             int64
 date    datetime64[ns]
 dtype: object

For each group of col1, we want to find the value of col2 corresponding to the greatest difference in date between consecutive rows, with each group sorted by date. Assume there are no groups of size 1.

Desired output

In [2]: output
Out[2]:
col1   col2
1      1         # This is because the difference between 2015-01-09 and 2015-01-01 is the greatest
2      2         # This is because the difference between 2015-02-25 and 2015-02-10 is the greatest

The real df has many values of col1 that we need to group by before doing the calculation. Is this possible by applying a function to the following? Note that the dates are already in ascending order.

gb = df.groupby('col1')
gb.apply(right_maximum_date_difference)

Recommended answer

Here's something that's almost your dataframe (I avoided copying the dates and used plain integers instead):

import pandas as pd

df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [1, 2, 3, 1, 2, 3],
    'date': [1, 9, 10, 10, 10, 25]  # integers standing in for the dates
})

With this definition:

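def max_diff_date(g):
    g = g.sort_values('date')
    # Gap between each row's date and the next row's date within the group
    gaps = g['date'].shift(-1) - g['date']
    # col2 of the row that precedes the largest gap
    return g.loc[gaps.idxmax(), 'col2']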

You get:

>>> df.groupby(df.col1).apply(max_diff_date)
col1
1    1
2    2
dtype: int64
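
The same result can probably be reached without `apply`. The following is a minimal sketch, not part of the original answer. It assumes the question's frame with real datetime64 dates, a unique index, and no groups of size 1. The names `ordered`, `gaps`, `labels` and `result` are illustrative only. Gaps are converted to seconds so that `idxmax` works on plain floats.

```python
import pandas as pd

# The frame from the question, with real datetime64 dates
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [1, 2, 3, 1, 2, 3],
    'date': pd.to_datetime([
        '2015-01-01', '2015-01-09', '2015-01-10',
        '2015-02-10', '2015-02-10', '2015-02-25',
    ]),
})

# Sort so that consecutive rows within a group are consecutive in time
ordered = df.sort_values(['col1', 'date'])

# Gap from each row to the next row of the same group, in seconds
# (NaN on the last row of each group, which idxmax skips)
next_date = ordered.groupby('col1')['date'].shift(-1)
gaps = (next_date - ordered['date']).dt.total_seconds()

# Index label of the row that precedes the largest gap, one per group
labels = gaps.groupby(ordered['col1']).idxmax()

# Look up col2 for those rows, keyed by col1
result = ordered.loc[labels].set_index('col1')['col2']
print(result)
```

On this data, `result` maps col1 1 to col2 1 and col1 2 to col2 2, which matches the desired output. If two gaps in a group tie for the largest, `idxmax` picks the first one.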
