Grouping rows with groupby and converting the date & time of grouped rows into start date-time and end date-time columns

Question

I have a dataset looking like this:

Blast Hole	East Coordinate	North Coordinate	Collar	Theoritical Depth	Tag Detector ID	Date and Time	Detection_Location	Detection Date & Time
64	16745.42	107390.32	2634.45	15.95	385656531	23-08-2018 2:39:34 PM	CV23	2018-09-08 14:18:17
61	16773.48	107382.6	2634.68	16.18	385760755	23-08-2018 2:38:32 PM	CV23	2018-09-08 14:24:19
63	16755.07	107387.68	2634.58	16.08	385262370	23-08-2018 2:39:30 PM	CV23	2018-09-08 14:12:42
105	16764.83	107347.67	2634.74	16.24	385742468	23-08-2018 2:41:29 PM	CV22	2018-09-06 20:02:46
100	16752.74	107360.32	2634.33	15.83	385112050	23-08-2018 2:41:08 PM	CV22	2018-09-06 20:15:42
99	16743.1	107362.96	2634.36	15.86	385087366	23-08-2018 2:41:05 PM	CV22	2018-09-06 20:49:21
35	16747.75	107417.68	2635.9	17.4	385453358	23-08-2018 2:36:09 PM	CV22	2018-09-23 05:47:44
5	16757.27	107452.4	2636	17.5	385662254	23-08-2018 2:35:03 PM	CV22	2018-09-23 05:01:12
19	16770.89	107420.83	2634.81	16.31	385826979	23-08-2018 2:35:50 PM	CV22	2018-09-23 05:52:54
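The title also asks about converting the date-time columns. Below is a minimal sketch of loading the sample rows and parsing both timestamp columns, assuming the rows are available as tab-separated text (the `SAMPLE` string and the reduced set of columns are illustrative, not from the question). `Date and Time` is day-first with a 12-hour clock and an AM/PM marker, while `Detection Date & Time` is year-first with a 24-hour clock. Passing `format` makes the day-first order explicit instead of leaving it to pandas' inference.

```python
import io

import pandas as pd

# Illustrative subset of the question's columns, pasted as tab-separated text.
SAMPLE = (
    "Blast Hole\tDate and Time\tDetection_Location\tDetection Date & Time\n"
    "64\t23-08-2018 2:39:34 PM\tCV23\t2018-09-08 14:18:17\n"
    "61\t23-08-2018 2:38:32 PM\tCV23\t2018-09-08 14:24:19\n"
    "63\t23-08-2018 2:39:30 PM\tCV23\t2018-09-08 14:12:42\n"
    "105\t23-08-2018 2:41:29 PM\tCV22\t2018-09-06 20:02:46\n"
    "100\t23-08-2018 2:41:08 PM\tCV22\t2018-09-06 20:15:42\n"
    "99\t23-08-2018 2:41:05 PM\tCV22\t2018-09-06 20:49:21\n"
    "35\t23-08-2018 2:36:09 PM\tCV22\t2018-09-23 05:47:44\n"
    "5\t23-08-2018 2:35:03 PM\tCV22\t2018-09-23 05:01:12\n"
    "19\t23-08-2018 2:35:50 PM\tCV22\t2018-09-23 05:52:54\n"
)

df1 = pd.read_csv(io.StringIO(SAMPLE), sep="\t")

# 'Date and Time' is day-month-year with a 12-hour clock, e.g. "23-08-2018 2:39:34 PM".
df1["Date and Time"] = pd.to_datetime(df1["Date and Time"], format="%d-%m-%Y %I:%M:%S %p")
# 'Detection Date & Time' is year-month-day with a 24-hour clock.
df1["Detection Date & Time"] = pd.to_datetime(df1["Detection Date & Time"], format="%Y-%m-%d %H:%M:%S")

print(df1.dtypes)
```

After this step both columns have a datetime dtype, which `pd.Grouper` and the `.dt` accessor require.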

I want to group together all rows that have 3 detections at the same location (column Detection_Location) within one hour. I used the following code to group the rows that fall within the same hour at each location:

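```python
import pandas as pd
df1['Detection Date & Time'] = pd.to_datetime(df1['Detection Date & Time'])  # Grouper needs datetimes
df2 = (df1.groupby([pd.Grouper(key='Detection Date & Time', freq='h'), 'Detection_Location'])
          .size().reset_index(name='Tags'))
```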

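This code gave me one row per hour and location, with the start of the hour in the Detection Date & Time column and the number of detections in Tags.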
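A self-contained sketch of this counting step on the sample rows, which also keeps only the hour/location pairs with exactly three detections as described above. It floors each detection time to the hour with `Series.dt.floor('h')`, which yields the same hourly labels as `pd.Grouper(freq='h')`; the helper column name `Hour` and the filter on exactly 3 are illustrative assumptions. The lowercase alias `'h'` is used because recent pandas versions deprecate the uppercase `'H'`.

```python
import pandas as pd

# Illustrative subset of the sample data: only the columns the grouping needs.
df1 = pd.DataFrame({
    "Detection_Location": ["CV23", "CV23", "CV23", "CV22", "CV22", "CV22", "CV22", "CV22", "CV22"],
    "Detection Date & Time": pd.to_datetime([
        "2018-09-08 14:18:17", "2018-09-08 14:24:19", "2018-09-08 14:12:42",
        "2018-09-06 20:02:46", "2018-09-06 20:15:42", "2018-09-06 20:49:21",
        "2018-09-23 05:47:44", "2018-09-23 05:01:12", "2018-09-23 05:52:54",
    ]),
})

# Hour bucket for every detection, e.g. 14:18:17 -> 14:00:00.
hour = df1["Detection Date & Time"].dt.floor("h").rename("Hour")

# Number of detections per (hour, location) pair, sorted by hour then location.
counts = df1.groupby([hour, "Detection_Location"]).size().reset_index(name="Tags")

# Keep only the pairs with exactly three detections (threshold taken from the question).
df2 = counts[counts["Tags"] == 3].reset_index(drop=True)
print(df2)
```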

Instead, I would like each row to show a start time (the first detection in that hour) and an end time (the last detection in that hour). This is the required output:

Detection Date & Time - Start 	Detection Date & Time - End	Detection_Location	Tags
2018-09-06 20:02:46	2018-09-06 20:49:21	CV22	3
2018-09-08 14:12:42	2018-09-08 14:24:19	CV23	3
2018-09-23 05:01:12	2018-09-23 05:47:44	CV22	3

Can anyone suggest what I should add to my groupby call to get this result?

Thanks

Solution

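Check whether this works for you. agg accepts a list of aggregations, so the start time, end time and detection count of each hour/location group can be computed in one call. Use min and max rather than first and last: first and last return the values from the first and last rows of each group in their original row order, which are the earliest and latest times only if the rows are already sorted by time. Leaving sort at its default lists the groups in chronological order.

```python
df2 = (df1.groupby([pd.Grouper(key='Detection Date & Time', freq='h'), 'Detection_Location'])
          ['Detection Date & Time'].agg(['min', 'max', 'size'])
          .reset_index()
          .rename(columns={'min': 'Detection Date & Time - Start', 'max': 'Detection Date & Time - End', 'size': 'Tags'}))
```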
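The sketch below contrasts `first`/`last` with `min`/`max` on the sample rows, which are not in chronological order (at CV23, 14:18:17 is listed before 14:12:42). It is a self-contained variant, not the answer's exact code: it swaps `pd.Grouper` for `Series.dt.floor('h')` to build the hour bucket (the helper name `Hour` is illustrative) and keeps only the columns of the required output. With this data the 05:00 group at CV22 ends at 05:52:54, the latest detection at that location in that hour.

```python
import pandas as pd

# Illustrative subset of the sample data, in the question's (unsorted) row order.
df1 = pd.DataFrame({
    "Detection_Location": ["CV23", "CV23", "CV23", "CV22", "CV22", "CV22", "CV22", "CV22", "CV22"],
    "Detection Date & Time": pd.to_datetime([
        "2018-09-08 14:18:17", "2018-09-08 14:24:19", "2018-09-08 14:12:42",
        "2018-09-06 20:02:46", "2018-09-06 20:15:42", "2018-09-06 20:49:21",
        "2018-09-23 05:47:44", "2018-09-23 05:01:12", "2018-09-23 05:52:54",
    ]),
})

hour = df1["Detection Date & Time"].dt.floor("h").rename("Hour")
grouped = df1.groupby([hour, "Detection_Location"])["Detection Date & Time"]

# first/last follow row order, so they are not the start/end times for unsorted input.
by_order = grouped.agg(["first", "last"])

# min/max give the true start and end of each group; size counts the detections.
summary = grouped.agg(["min", "max", "size"]).reset_index()
summary = summary.rename(columns={
    "min": "Detection Date & Time - Start",
    "max": "Detection Date & Time - End",
    "size": "Tags",
})

# Drop the helper Hour column and order the columns like the required output.
df2 = summary[["Detection Date & Time - Start", "Detection Date & Time - End", "Detection_Location", "Tags"]]
print(df2)
print(by_order)
```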
