pandas DataFrame - Getting hours in a Duration
Question
I'm a relative novice to pandas, and I'm not sure how to approach this. I'm analyzing ticket flow through a Help Desk system. Raw data looks like this (with many more columns, and sometimes spanning days):
TicketNo SvcGroup CreatedAt ClosedAt
0 4237941 Unix 2013-07-28 03:55:00 2013-07-28 11:01:37.346438
1 4238041 Windows 2013-07-28 04:59:00 2013-07-28 18:25:02.193182
2 4238051 Windows 2013-07-28 05:09:00 2013-07-28 23:11:12.003673
3 4238291 Windows 2013-07-28 05:10:00 2013-07-28 05:32:51.547251
4 4238321 Unix 2013-07-28 01:15:00 2013-07-28 10:09:20
5 4238331 Unix 2013-07-28 01:53:00 2013-07-28 17:42:56.192088
6 4238561 Windows 2013-07-28 02:03:00 2013-07-28 06:34:09.455042
7 4238691 Windows 2013-07-28 02:03:00 2013-07-28 20:54:47.306731
8 4238811 Windows 2013-07-28 03:23:00 2013-07-28 13:15:20.823505
9 4238851 Windows 2013-07-28 04:16:00 2013-07-28 23:51:55.561463
10 4239011 Unix 2013-07-28 04:26:00 2013-07-28 09:27:06.275342
11 4239041 Windows 2013-07-28 04:38:00 2013-07-28 07:55:34.416621
12 4239131 Unix 2013-07-28 08:15:00 2013-07-28 08:46:42.380739
13 4239141 Windows 2013-07-28 01:08:00 2013-07-28 15:37:12.266341
I want to look at the data by hour, to see how tickets are flowing through the Help Desk by shift - so an intermediate step could be something like this:
Opened Open Closed CarryFwd
TicketNo SvcGroup Hour
4237941 Unix 3 1 1 0 1
4 0 1 0 1
5 0 1 0 1
6 0 1 0 1
7 0 1 0 1
8 0 1 0 1
9 0 1 0 1
10 0 1 0 1
11 0 1 1 0
4239041 Windows 4 1 1 0 1
5 0 1 0 1
6 0 1 0 1
7 0 1 1 0
With a final result like this (from grouping the above):
Opened Closed CarryFwd
SvcGroup Hour
Unix 3 6 7 47
4 7 10 44
5 1 6 39
6 11 2 48
7 7 3 52
8 5 5 52
9 5 11 46
Windows 3 6 7 22
4 3 10 15
5 5 2 18
6 6 2 22
7 11 11 22
8 2 4 20
9 0 2 18
Note: this is broken down by hour, but I might also want to look at it by day, week, etc. Once I get to the above, I can tell whether a Service Group is gaining ground, falling behind, etc.
Any ideas on how to approach this? The part I really can't figure out is how to take the CreatedAt-to-ClosedAt duration and break it down into discrete time intervals (hours, etc.)...
Any guidance is greatly appreciated. Thanks.
Answer
This is only a partial answer.
Read in your data; note that the date and time columns had to be combined into single timestamps (via parse_dates):
In [74]: import numpy as np; from pandas import read_csv, Series; from io import StringIO
In [75]: df = read_csv(StringIO(data), sep=r'\s+', skiprows=1, parse_dates=[[3,4],[5,6]], header=None)
In [76]: df.columns = ['created','closed','idx','num','typ']
In [77]: df
Out[77]:
created closed idx num typ
0 2013-07-28 03:55:00 2013-07-28 11:01:37.346438 0 4237941 Unix
1 2013-07-28 04:59:00 2013-07-28 18:25:02.193182 1 4238041 Windows
2 2013-07-28 05:09:00 2013-07-28 23:11:12.003673 2 4238051 Windows
3 2013-07-28 05:10:00 2013-07-28 05:32:51.547251 3 4238291 Windows
4 2013-07-28 01:15:00 2013-07-28 10:09:20 4 4238321 Unix
5 2013-07-28 01:53:00 2013-07-28 17:42:56.192088 5 4238331 Unix
6 2013-07-28 02:03:00 2013-07-28 06:34:09.455042 6 4238561 Windows
7 2013-07-28 02:03:00 2013-07-28 20:54:47.306731 7 4238691 Windows
8 2013-07-28 03:23:00 2013-07-28 13:15:20.823505 8 4238811 Windows
9 2013-07-28 04:16:00 2013-07-28 23:51:55.561463 9 4238851 Windows
10 2013-07-28 04:26:00 2013-07-28 09:27:06.275342 10 4239011 Unix
11 2013-07-28 04:38:00 2013-07-28 07:55:34.416621 11 4239041 Windows
12 2013-07-28 08:15:00 2013-07-28 08:46:42.380739 12 4239131 Unix
13 2013-07-28 01:08:00 2013-07-28 15:37:12.266341 13 4239141 Windows
In [78]: df.dtypes
Out[78]:
created datetime64[ns]
closed datetime64[ns]
idx int64
num int64
typ object
dtype: object
For each event, put a 1 in every hour of its created-to-closed range. Fill the NaNs with 0s.
In [82]: m = df.apply(lambda x: Series(1,index=np.arange(x['created'].hour,x['closed'].hour+1)),axis=1).fillna(0)
In [81]: m
Out[81]:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
2 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
3 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0
6 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
8 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
9 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
10 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
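One caveat: keying the matrix on hour-of-day only works when a ticket opens and closes on the same day, and the question notes that some tickets span days. A possible generalization (my own sketch, not part of the original answer) is to enumerate actual hourly timestamps instead of bare hour numbers:

```python
import pandas as pd

def open_hours(created, closed):
    # Hourly timestamps (inclusive) during which a ticket was open,
    # valid even when the span crosses midnight.
    return pd.date_range(created.floor('h'), closed.floor('h'), freq='h')

hrs = open_hours(pd.Timestamp('2013-07-28 23:10:00'),
                 pd.Timestamp('2013-07-29 02:05:00'))
# hrs covers 23:00, 00:00, 01:00 and 02:00
```

Building the indicator Series against this DatetimeIndex instead of `np.arange(created.hour, closed.hour + 1)` lets the same apply/fillna pattern work across day boundaries.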
Join it to the original dataset and set the index:
In [83]: y = df[['num','typ']].join(m).set_index(['num','typ'])
In [84]: y
Out[84]:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
num typ
4237941 Unix 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
4238041 Windows 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
4238051 Windows 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
4238291 Windows 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4238321 Unix 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
4238331 Unix 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0
4238561 Windows 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4238691 Windows 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
4238811 Windows 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
4238851 Windows 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
4239011 Unix 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4239041 Windows 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4239131 Unix 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4239141 Windows 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
At this point you can do your calculations.
Opened/Closed are straightforward edge detection; CarryFwd is just m.where(m==1).
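A minimal sketch of that edge detection, using a tiny stand-in for the 0/1 hour-membership frame y built above (the toy data and the shift-based comparisons are my own, not from the original answer):

```python
import pandas as pd

# Toy stand-in for y: rows are tickets (num, typ), columns are hours.
idx = pd.MultiIndex.from_tuples(
    [(4237941, 'Unix'), (4239041, 'Windows')], names=['num', 'typ'])
y = pd.DataFrame([[0, 1, 1, 1, 0],    # open during hours 1-3
                  [0, 0, 1, 1, 1]],   # open during hours 2-4
                 index=idx, columns=[0, 1, 2, 3, 4])

prev = y.shift(1, axis=1, fill_value=0)    # state one hour earlier
nxt = y.shift(-1, axis=1, fill_value=0)    # state one hour later

opened = ((y == 1) & (prev == 0)).astype(int)   # 0 -> 1 rising edge
closed = ((y == 1) & (nxt == 0)).astype(int)    # last hour the ticket is open
carry = ((y == 1) & (nxt == 1)).astype(int)     # still open next hour

# Sum per service group, then stack hours back into the row index
# to get the final SvcGroup-by-Hour summary.
summary = (pd.concat({'Opened': opened, 'Closed': closed, 'CarryFwd': carry},
                     axis=1)
           .groupby(level='typ').sum()
           .stack())
```

The same `groupby(level='typ').sum()` idea extends to day or week buckets: resample or bin the hour columns first, then aggregate.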