GroupBy - How to extract seconds from DateTime with diff()


Problem description

I have the following dataframe:

In [372]: df_2
Out[372]: 
        A         ID3            DATETIME
0   B-028  b76cd912ff 2014-10-08 13:43:27
1   B-054  4a57ed0b02 2014-10-08 14:26:19
2   B-076  1a682034f8 2014-10-08 14:29:01
3   B-023  b76cd912ff 2014-10-08 18:39:34
4   B-023  f88g8d7sds 2014-10-08 18:40:18
5   B-033  b76cd912ff 2014-10-08 18:44:30
6   B-032  b76cd912ff 2014-10-08 18:46:00
7   B-037  b76cd912ff 2014-10-08 18:52:15
8   B-046  db959faf02 2014-10-08 18:59:59
9   B-053  b76cd912ff 2014-10-08 19:17:48
10  B-065  b76cd912ff 2014-10-08 19:21:38

I want to find the time difference between consecutive entries, grouped by 'ID3'.

I am trying to use transform() on a GroupBy like this:

In [379]: df_2['diff'] = df_2.sort_values(by='DATETIME').groupby('ID3')['DATETIME'].transform(lambda x: x.diff()); df_2['diff']
Out[379]: 
0                    NaT
1                    NaT
2                    NaT
3    1970-01-01 04:56:07
4                    NaT
5    1970-01-01 00:04:56
6    1970-01-01 00:01:30
7    1970-01-01 00:06:15
8                    NaT
9    1970-01-01 00:25:33
10   1970-01-01 00:03:50
Name: diff, dtype: datetime64[ns]

I have also tried x.diff().astype(int) in the lambda, with exactly the same result.

The dtype of both 'DATETIME' and 'diff' is datetime64[ns].

What I am trying to achieve is to have diff expressed in seconds, rather than as a timestamp relative to the Unix epoch.
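To make the relationship concrete, here is a minimal sketch (my own illustration, not part of the original session) showing how an epoch-relative value such as 1970-01-01 04:56:07 from the output above corresponds to a number of seconds. The variable names are chosen only for this example.

```python
import pandas as pd

# The value displayed at index 3 of the 'diff' column: a duration that
# has been rendered as a timestamp counted from the Unix epoch.
epoch_relative = pd.Timestamp("1970-01-01 04:56:07")

# Subtracting the epoch itself recovers the underlying duration.
elapsed = epoch_relative - pd.Timestamp("1970-01-01")

# 4 h 56 min 7 s expressed as a plain number of seconds.
seconds = elapsed.total_seconds()
print(seconds)
```

This is the same figure that appears for index 3 once the column is converted to seconds.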

I have figured out that I can convert df_2['diff'] to Timedelta afterwards and extract the seconds in one chained call, like this:

In [405]: df_2['diff'] = pd.to_timedelta(df_2['diff']).map(lambda x: x.total_seconds()); df_2['diff']
Out[407]: 
0         NaN
1         NaN
2         NaN
3     17767.0
4         NaN
5       296.0
6        90.0
7       375.0
8         NaN
9      1533.0
10      230.0
Name: diff, dtype: float64

Is there a way to get seconds as the values of df_2['diff'] in one step inside the transform, instead of needing a separate conversion afterwards?

I have also tried converting to Timedelta inside the transform, without success.

Thanks for the help!

Solution

UPDATE: transform() from class NDFrameGroupBy(GroupBy) doesn't seem to do downcasting and works as expected:

In [220]: (df_2[['ID3','DATETIME']]
   .....:      .sort_values(by='DATETIME')
   .....:      .groupby('ID3')
   .....:      .transform(lambda x: x.diff().dt.total_seconds())
   .....: )
Out[220]:
    DATETIME
0        NaN
1        NaN
2        NaN
3    17767.0
4        NaN
5      296.0
6       90.0
7      375.0
8        NaN
9     1533.0
10     230.0

The transform() from class SeriesGroupBy(GroupBy), by contrast, tries to do the following:

result = _possibly_downcast_to_dtype(result, dtype)

which could (I'm not sure) be what causes your problem.

Old answer:

Try this:

In [168]: df_2.sort_values(by='DATETIME').groupby('ID3')['DATETIME'].diff().dt.total_seconds()
Out[168]:
0         NaN
1         NaN
2         NaN
3     17767.0
4         NaN
5       296.0
6        90.0
7       375.0
8         NaN
9      1533.0
10      230.0
dtype: float64
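
For anyone who wants to reproduce this outside an interactive session, the following is a self-contained sketch of the same approach. It rebuilds the example frame from the question (the construction code is my own; only the data and the diff() chain come from the posts above) and writes the result back to df_2['diff'], relying on index alignment so the sort does not disturb the original row order.

```python
import pandas as pd

# Rebuild the frame shown in the question.
df_2 = pd.DataFrame({
    "A": ["B-028", "B-054", "B-076", "B-023", "B-023", "B-033",
          "B-032", "B-037", "B-046", "B-053", "B-065"],
    "ID3": ["b76cd912ff", "4a57ed0b02", "1a682034f8", "b76cd912ff",
            "f88g8d7sds", "b76cd912ff", "b76cd912ff", "b76cd912ff",
            "db959faf02", "b76cd912ff", "b76cd912ff"],
    "DATETIME": pd.to_datetime([
        "2014-10-08 13:43:27", "2014-10-08 14:26:19", "2014-10-08 14:29:01",
        "2014-10-08 18:39:34", "2014-10-08 18:40:18", "2014-10-08 18:44:30",
        "2014-10-08 18:46:00", "2014-10-08 18:52:15", "2014-10-08 18:59:59",
        "2014-10-08 19:17:48", "2014-10-08 19:21:38",
    ]),
})

# Sort by time, take the per-group difference (a timedelta Series),
# then convert each timedelta to seconds in a single vectorised call.
diff_seconds = (
    df_2.sort_values(by="DATETIME")
        .groupby("ID3")["DATETIME"]
        .diff()
        .dt.total_seconds()
)

# Assignment aligns on the index, so rows keep their original positions.
df_2["diff"] = diff_seconds
print(df_2["diff"])
```

The first row of each ID3 group has no predecessor, so it stays NaN; every other row holds the gap to the previous entry of the same group, in seconds.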
