计算间隔中的寄存器数 [英] Count number of registers in interval
问题描述
我想我最好通过一个例子来解释我想要实现的目标.假设我有这个数据框:
I think I'd better explain what I want to achieve through an example. Supposed I have this dataframe:
time
0 2013-01-01 12:56:00
1 2013-01-01 12:00:12
2 2013-01-01 10:34:28
3 2013-01-01 09:34:54
4 2013-01-01 08:34:55
5 2013-01-01 16:35:19
6 2013-01-01 16:35:30
我想,给定一个时间间隔 T,计算每一行,在那个时间间隔内打开"了多少个寄存器.例如,考虑到 T = 2 小时,这将是输出:
I would like to, given a interval T, count, for each line, how many registers were "opened" in that interval. For example, this would be the output, considering T = 2hours:
time count
0 2013-01-01 12:56:00 1 # 12:56-2 = 10:56 -> 1 register between [10:56, 12:56)
1 2013-01-01 12:00:12 1
2 2013-01-01 10:34:28 2 # 10:34:28-2 = 8:34:28 -> 2 registers between [8:34:28, 10:34:28)
3 2013-01-01 09:34:54 1
4 2013-01-01 08:34:55 0
5 2013-01-01 16:35:19 0
6 2013-01-01 16:35:30 1
我想知道如何使用熊猫获得这个结果.如果我只考虑 dt.hour acessor,例如,对于 T 等于 1,我可以每小时创建一个列数,然后将其移动 1,对 count[i] + count[i 的结果求和-1]
.但我不知道是否可以将其推广到所需的输出.
I wonder how to obtain this result using pandas. If I was considering just the dt.hour acessor, for T equals 1, for example, I could create a column count per hour, and than shift it by 1, summing the result of count[i] + count[i-1]
. But I don't know if can generalize this for the desired ouput.
推荐答案
这里的想法是将所有寄存器打开时间标记为 +1,将所有寄存器关闭时间标记为 -1.然后按时间排序并对 +/- 1 值执行累积求和,以在给定时间打开计数.
The idea here is to mark all register opening times as +1 and all register closing times as -1. Then sort by time and perform a cumulative sum over the +/- 1 values to get the count open at a given time.
# initialize interval start times as 1, end times as -1
start_times= df.assign(time=df['time'] - pd.Timedelta(hours=2), count=1)
all_times = start_times.append(df.assign(count=-1), ignore_index=True)
# sort by time and perform a cumulative sum get the count of overlaps at a given time
# (subtract 1 since you don't want to include the current value in the overlap)
all_times = all_times.sort_values(by='time')
all_times['count'] = all_times['count'].cumsum() - 1
# reassign to the original dataframe, keeping only the original times
df['count'] = all_times['count']
结果输出:
time count
0 2013-01-01 12:56:00 1
1 2013-01-01 12:00:12 1
2 2013-01-01 10:34:28 2
3 2013-01-01 09:34:54 1
4 2013-01-01 08:34:55 0
5 2013-01-01 16:35:19 0
6 2013-01-01 16:35:30 1
这篇关于计算间隔中的寄存器数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!