GroupBy - How to extract seconds from DateTime with diff()
Problem description
I have the following dataframe:
In [372]: df_2
Out[372]:
A ID3 DATETIME
0 B-028 b76cd912ff 2014-10-08 13:43:27
1 B-054 4a57ed0b02 2014-10-08 14:26:19
2 B-076 1a682034f8 2014-10-08 14:29:01
3 B-023 b76cd912ff 2014-10-08 18:39:34
4 B-023 f88g8d7sds 2014-10-08 18:40:18
5 B-033 b76cd912ff 2014-10-08 18:44:30
6 B-032 b76cd912ff 2014-10-08 18:46:00
7 B-037 b76cd912ff 2014-10-08 18:52:15
8 B-046 db959faf02 2014-10-08 18:59:59
9 B-053 b76cd912ff 2014-10-08 19:17:48
10 B-065 b76cd912ff 2014-10-08 19:21:38
I want to find the time difference between consecutive entries, grouped by 'ID3'.
I am trying to use transform() on a GroupBy like this:
In [379]: df_2['diff'] = df_2.sort_values(by='DATETIME').groupby('ID3')['DATETIME'].transform(lambda x: x.diff()); df_2['diff']
Out[379]:
0 NaT
1 NaT
2 NaT
3 1970-01-01 04:56:07
4 NaT
5 1970-01-01 00:04:56
6 1970-01-01 00:01:30
7 1970-01-01 00:06:15
8 NaT
9 1970-01-01 00:25:33
10 1970-01-01 00:03:50
Name: diff, dtype: datetime64[ns]
I have also tried x.diff().astype(int) in the lambda, with exactly the same result.
The dtype of both 'DATETIME' and 'diff' is datetime64[ns].
What I am trying to achieve is to have diff expressed in seconds rather than as a timestamp relative to the Epoch.
I have figured out that I can convert df_2['diff'] to Timedelta and then extract the seconds in one chained call, like this:
In [405]: df_2['diff'] = pd.to_timedelta(df_2['diff']).map(lambda x: x.total_seconds()); df_2['diff']
Out[407]:
0 NaN
1 NaN
2 NaN
3 17767.0
4 NaN
5 296.0
6 90.0
7 375.0
8 NaN
9 1533.0
10 230.0
Name: diff, dtype: float64
Is there a way to get seconds as the values of df_2['diff'] in one step inside the transform, instead of taking a couple of steps?
Finally, I have already tried converting to Timedelta inside the transform, without any success.
Thanks for the help!
Solution
UPDATE: transform() from class NDFrameGroupBy(GroupBy) doesn't seem to do downcasting and works as expected:
In [220]: (df_2[['ID3','DATETIME']]
.....: .sort_values(by='DATETIME')
.....: .groupby('ID3')
.....: .transform(lambda x: x.diff().dt.total_seconds())
.....: )
Out[220]:
DATETIME
0 NaN
1 NaN
2 NaN
3 17767.0
4 NaN
5 296.0
6 90.0
7 375.0
8 NaN
9 1533.0
10 230.0
The transform() from class SeriesGroupBy(GroupBy) tries to do the following:
result = _possibly_downcast_to_dtype(result, dtype)
which could (I'm not sure) cause your problem.
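One way to check whether an installed pandas version behaves as described is to compare the dtype returned by SeriesGroupBy.transform() with the dtype returned by calling diff() directly on the group. The sketch below uses a small made-up frame (the name toy and its values are assumptions for illustration, not data from the question). It only prints the dtypes and makes no claim about which pandas versions are affected.

```python
import pandas as pd

# Hypothetical toy data, not taken from the question.
toy = pd.DataFrame({
    "ID3": ["a", "b", "a", "a"],
    "DATETIME": pd.to_datetime([
        "2014-10-08 13:00:00", "2014-10-08 13:05:00",
        "2014-10-08 13:10:00", "2014-10-08 13:12:30",
    ]),
})

# SeriesGroupBy.transform(): the code path discussed above.
via_transform = toy.groupby("ID3")["DATETIME"].transform(lambda x: x.diff())

# SeriesGroupBy.diff(): a direct call that bypasses transform().
via_diff = toy.groupby("ID3")["DATETIME"].diff()

print(via_transform.dtype)
print(via_diff.dtype)
```

If the first dtype prints as datetime64[ns] while the second prints as timedelta64[ns], the result of transform() was cast back to the dtype of the input column, which would match the output shown in the question.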
OLD answer:
Try this:
In [168]: df_2.sort_values(by='DATETIME').groupby('ID3')['DATETIME'].diff().dt.total_seconds()
Out[168]:
0 NaN
1 NaN
2 NaN
3 17767.0
4 NaN
5 296.0
6 90.0
7 375.0
8 NaN
9 1533.0
10 230.0
dtype: float64
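For completeness, here is a self-contained sketch of the same call that rebuilds the sample frame from the question and writes the result back into df_2['diff']. The frame construction is an assumption about how the data was loaded. The assignment relies on pandas aligning on the index, so the values land on the right rows even though the frame was sorted first.

```python
import pandas as pd

# Sample data copied from the question.
df_2 = pd.DataFrame({
    "A": ["B-028", "B-054", "B-076", "B-023", "B-023", "B-033",
          "B-032", "B-037", "B-046", "B-053", "B-065"],
    "ID3": ["b76cd912ff", "4a57ed0b02", "1a682034f8", "b76cd912ff",
            "f88g8d7sds", "b76cd912ff", "b76cd912ff", "b76cd912ff",
            "db959faf02", "b76cd912ff", "b76cd912ff"],
    "DATETIME": pd.to_datetime([
        "2014-10-08 13:43:27", "2014-10-08 14:26:19", "2014-10-08 14:29:01",
        "2014-10-08 18:39:34", "2014-10-08 18:40:18", "2014-10-08 18:44:30",
        "2014-10-08 18:46:00", "2014-10-08 18:52:15", "2014-10-08 18:59:59",
        "2014-10-08 19:17:48", "2014-10-08 19:21:38",
    ]),
})

# Sort by time, take the per-group difference, then convert the
# resulting timedeltas to seconds. The assignment aligns on the index.
df_2["diff"] = (
    df_2.sort_values(by="DATETIME")
        .groupby("ID3")["DATETIME"]
        .diff()
        .dt.total_seconds()
)

print(df_2[["ID3", "DATETIME", "diff"]])
```

The first row of each ID3 group has no predecessor, so its value is NaN.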