pandas DataFrame - Getting hours in a Duration


Problem description

I'm a relative novice to pandas, and I'm not sure how to approach this. I'm analyzing ticket flow through a Help Desk system. Raw data looks like this (with many more columns, and sometimes spanning days):

    TicketNo SvcGroup           CreatedAt                   ClosedAt
0    4237941     Unix 2013-07-28 03:55:00 2013-07-28 11:01:37.346438
1    4238041  Windows 2013-07-28 04:59:00 2013-07-28 18:25:02.193182
2    4238051  Windows 2013-07-28 05:09:00 2013-07-28 23:11:12.003673
3    4238291  Windows 2013-07-28 05:10:00 2013-07-28 05:32:51.547251
4    4238321     Unix 2013-07-28 01:15:00        2013-07-28 10:09:20
5    4238331     Unix 2013-07-28 01:53:00 2013-07-28 17:42:56.192088
6    4238561  Windows 2013-07-28 02:03:00 2013-07-28 06:34:09.455042
7    4238691  Windows 2013-07-28 02:03:00 2013-07-28 20:54:47.306731
8    4238811  Windows 2013-07-28 03:23:00 2013-07-28 13:15:20.823505
9    4238851  Windows 2013-07-28 04:16:00 2013-07-28 23:51:55.561463
10   4239011     Unix 2013-07-28 04:26:00 2013-07-28 09:27:06.275342
11   4239041  Windows 2013-07-28 04:38:00 2013-07-28 07:55:34.416621
12   4239131     Unix 2013-07-28 08:15:00 2013-07-28 08:46:42.380739
13   4239141  Windows 2013-07-28 01:08:00 2013-07-28 15:37:12.266341

I want to look at the data by hour, to see how tickets are flowing through the Help Desk by shift - so an intermediate step could be something like this:

                        Opened  Open  Closed  CarryFwd
TicketNo SvcGroup Hour
4237941  Unix     3          1     1       0         1
                  4          0     1       0         1
                  5          0     1       0         1
                  6          0     1       0         1
                  7          0     1       0         1
                  8          0     1       0         1
                  9          0     1       0         1
                  10         0     1       0         1
                  11         0     1       1         0
4239041  Windows  4          1     1       0         1
                  5          0     1       0         1
                  6          0     1       0         1
                  7          0     1       1         0

With a final result like (from grouping the above):

               Opened  Closed  CarryFwd
SvcGroup Hour
Unix     3          6       7        47
         4          7      10        44
         5          1       6        39
         6         11       2        48
         7          7       3        52
         8          5       5        52
         9          5      11        46
Windows  3          6       7        22
         4          3      10        15
         5          5       2        18
         6          6       2        22
         7         11      11        22
         8          2       4        20
         9          0       2        18   

Note: this is broken down by hour, but I could want to look at it by day, week, etc. Once I get to the above, then I can tell whether a Service Group is gaining ground, falling behind, etc.

Any ideas on how to approach this? The part I really can't figure out is how to take the CreatedAt to ClosedAt duration and break it down by discrete time intervals (hours, etc)...

Any guidance is greatly appreciated. Thanks.

Answer

This is only a partial answer.

Read in your data; note that the 2 date/time column pairs had to be combined.

In [74]: from io import StringIO; from pandas import read_csv, Series; import numpy as np

In [75]: df = read_csv(StringIO(data), sep=r'\s+', skiprows=1, parse_dates=[[3,4],[5,6]], header=None)

In [76]: df.columns = ['created','closed','idx','num','typ']

In [77]: df
Out[77]: 
               created                     closed  idx      num      typ
0  2013-07-28 03:55:00 2013-07-28 11:01:37.346438    0  4237941     Unix
1  2013-07-28 04:59:00 2013-07-28 18:25:02.193182    1  4238041  Windows
2  2013-07-28 05:09:00 2013-07-28 23:11:12.003673    2  4238051  Windows
3  2013-07-28 05:10:00 2013-07-28 05:32:51.547251    3  4238291  Windows
4  2013-07-28 01:15:00        2013-07-28 10:09:20    4  4238321     Unix
5  2013-07-28 01:53:00 2013-07-28 17:42:56.192088    5  4238331     Unix
6  2013-07-28 02:03:00 2013-07-28 06:34:09.455042    6  4238561  Windows
7  2013-07-28 02:03:00 2013-07-28 20:54:47.306731    7  4238691  Windows
8  2013-07-28 03:23:00 2013-07-28 13:15:20.823505    8  4238811  Windows
9  2013-07-28 04:16:00 2013-07-28 23:51:55.561463    9  4238851  Windows
10 2013-07-28 04:26:00 2013-07-28 09:27:06.275342   10  4239011     Unix
11 2013-07-28 04:38:00 2013-07-28 07:55:34.416621   11  4239041  Windows
12 2013-07-28 08:15:00 2013-07-28 08:46:42.380739   12  4239131     Unix
13 2013-07-28 01:08:00 2013-07-28 15:37:12.266341   13  4239141  Windows

In [78]: df.dtypes
Out[78]: 
created    datetime64[ns]
closed     datetime64[ns]
idx                 int64
num                 int64
typ                object
dtype: object
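As an aside (not part of the original answer): once both columns are `datetime64`, the raw duration in hours falls out directly from the timedelta. A minimal sketch using the first row's timestamps:

```python
import pandas as pd

# One created/closed pair from the sample data above
df = pd.DataFrame({
    'created': pd.to_datetime(['2013-07-28 03:55:00']),
    'closed':  pd.to_datetime(['2013-07-28 11:01:37.346438']),
})

# Subtracting datetime64 columns yields a timedelta64 Series;
# total_seconds() converts it to a float number of hours
hours = (df['closed'] - df['created']).dt.total_seconds() / 3600
```

This gives the continuous duration; the rest of the answer is about bucketing that duration into discrete hour slots.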

For each event, put a 1 in each hour of its created→closed range; fill the NaNs with 0s.

In [82]: m = df.apply(lambda x: Series(1,index=np.arange(x['created'].hour,x['closed'].hour+1)),axis=1).fillna(0)

In [81]: m
Out[81]: 
    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23
0    0   0   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0
1    0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0
2    0   0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
3    0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4    1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0
5    1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0
6    0   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
7    0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0
8    0   0   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0
9    0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
10   0   0   0   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
11   0   0   0   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
12   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
13   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0
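On larger frames the row-wise `apply` can be slow; the same indicator matrix can be sketched with NumPy broadcasting instead (under the same assumption as above, that each ticket opens and closes on the same day):

```python
import numpy as np
import pandas as pd

# Two sample tickets (same-day open/close, as in the question's data)
df = pd.DataFrame({
    'created': pd.to_datetime(['2013-07-28 03:55:00', '2013-07-28 04:38:00']),
    'closed':  pd.to_datetime(['2013-07-28 11:01:37', '2013-07-28 07:55:34']),
})

hours = np.arange(24)
# Broadcasting a (n, 1) array of opening hours against the (24,) hour grid:
# cell is 1 where the hour lies in [created.hour, closed.hour]
m = pd.DataFrame(
    ((df['created'].dt.hour.values[:, None] <= hours)
     & (hours <= df['closed'].dt.hour.values[:, None])).astype(int),
    columns=hours,
)
```

The result matches the `apply` version row for row, without a Python-level loop.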

Join it to the original dataset and set the index

In [83]: y = df[['num','typ']].join(m).set_index(['num','typ'])

In [84]: y
Out[84]: 
                 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23
num     typ                                                                                                
4237941 Unix      0   0   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0
4238041 Windows   0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0
4238051 Windows   0   0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
4238291 Windows   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4238321 Unix      1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0
4238331 Unix      1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0
4238561 Windows   0   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4238691 Windows   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0
4238811 Windows   0   0   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0
4238851 Windows   0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
4239011 Unix      0   0   0   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4239041 Windows   0   0   0   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4239131 Unix      0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4239141 Windows   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0

At this point you can do your calculations.

Opened/Closed are straightforward edge detection. Carry Fwd is just m.where(m==1)
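A sketch of that edge detection (my reading of the answer, not code from it): `Opened` marks a 0→1 transition along the hour columns, `Closed` marks the last open hour (a 1 followed by a 0), and `CarryFwd` is an hour that is open but not yet closed. Using a small hand-built frame shaped like `y` above:

```python
import pandas as pd

# Indicator frame like `y`: 1 while a ticket is open in that hour (hours 3-7)
y = pd.DataFrame(
    [[0, 1, 1, 1, 0],
     [1, 1, 0, 0, 0]],
    index=pd.MultiIndex.from_tuples(
        [(4237941, 'Unix'), (4239041, 'Windows')], names=['num', 'typ']),
    columns=[3, 4, 5, 6, 7],
)

d = y.diff(axis=1)                     # +1 at the opening hour, -1 just after closing
opened = d.eq(1).astype(int)
opened.iloc[:, 0] = y.iloc[:, 0]       # first column has no left neighbour
closed = d.eq(-1).astype(int).shift(-1, axis=1).fillna(0).astype(int)
carry = ((y == 1) & (closed == 0)).astype(int)   # open but not closed this hour
```

Grouping these by service group then gives the final per-hour totals, e.g. `opened.groupby(level='typ').sum()`.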

