计算间隔中的寄存器数 [英] Count number of registers in interval

查看:50
本文介绍了计算间隔中的寄存器数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想我最好通过一个例子来解释我想要实现的目标.假设我有这个数据框:

I think I'd better explain what I want to achieve through an example. Supposed I have this dataframe:

     time
0     2013-01-01 12:56:00
1     2013-01-01 12:00:12
2     2013-01-01 10:34:28
3     2013-01-01 09:34:54
4     2013-01-01 08:34:55
5     2013-01-01 16:35:19
6     2013-01-01 16:35:30

我想,给定一个时间间隔 T,计算每一行,在那个时间间隔内打开"了多少个寄存器.例如,考虑到 T = 2 小时,这将是输出:

I would like to, given a interval T, count, for each line, how many registers were "opened" in that interval. For example, this would be the output, considering T = 2hours:

     time                  count
0     2013-01-01 12:56:00  1     # 12:56-2 = 10:56 -> 1 register between [10:56, 12:56)
1     2013-01-01 12:00:12  1 
2     2013-01-01 10:34:28  2     # 10:34:28-2 = 8:34:28 -> 2 registers between [8:34:28, 10:34:28) 
3     2013-01-01 09:34:54  1
4     2013-01-01 08:34:55  0
5     2013-01-01 16:35:19  0
6     2013-01-01 16:35:30  1

我想知道如何使用熊猫获得这个结果.如果我只考虑 dt.hour acessor,例如,对于 T 等于 1,我可以每小时创建一个列数,然后将其移动 1,对 count[i] + count[i 的结果求和-1].但我不知道是否可以将其推广到所需的输出.

I wonder how to obtain this result using pandas. If I was considering just the dt.hour acessor, for T equals 1, for example, I could create a column count per hour, and than shift it by 1, summing the result of count[i] + count[i-1]. But I don't know if can generalize this for the desired ouput.

推荐答案

这里的想法是将所有寄存器打开时间标记为 +1,将所有寄存器关闭时间标记为 -1.然后按时间排序并对 +/- 1 值执行累积求和,以在给定时间打开计数.

The idea here is to mark all register opening times as +1 and all register closing times as -1. Then sort by time and perform a cumulative sum over the +/- 1 values to get the count open at a given time.

# initialize interval start times as 1, end times as -1
start_times= df.assign(time=df['time'] - pd.Timedelta(hours=2), count=1)
all_times = start_times.append(df.assign(count=-1), ignore_index=True)

# sort by time and perform a cumulative sum get the count of overlaps at a given time
# (subtract 1 since you don't want to include the current value in the overlap)
all_times = all_times.sort_values(by='time')
all_times['count'] = all_times['count'].cumsum() - 1

# reassign to the original dataframe, keeping only the original times
df['count'] = all_times['count']

结果输出:

                 time  count
0 2013-01-01 12:56:00      1
1 2013-01-01 12:00:12      1
2 2013-01-01 10:34:28      2
3 2013-01-01 09:34:54      1
4 2013-01-01 08:34:55      0
5 2013-01-01 16:35:19      0
6 2013-01-01 16:35:30      1

这篇关于计算间隔中的寄存器数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆