Python中的时间范围重叠算法 [英] Timeranges overlapping algorithm in Python

查看:175
本文介绍了Python中的时间范围重叠算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个不同ID的列表,开始日期和结束日期,例如

I have a list of different ids, start dates and end dates, let's say :

[
(5, d.datetime(2010, 9, 19, 0, 0, 0),    d.datetime(2010, 9, 19, 0, 5, 10)),
(6, d.datetime(2010, 9, 19, 0, 0, 0),    d.datetime(2010, 9, 19, 12, 59, 59)),
(4, d.datetime(2010, 9, 19, 10, 30, 17), d.datetime(2010, 9, 19, 20, 20, 59)),
(6, d.datetime(2010, 9, 19, 14, 12, 0),  d.datetime(2010, 9, 19, 23, 59, 59)),
(5, d.datetime(2010, 9, 19, 17, 0, 22),  d.datetime(2010, 9, 19, 19, 14, 20))
]

我需要某种方式找到重叠的时间范围,准备在特定时间范围内包含正确ID的新列表,例如,上面的列表结果应为:

I need somehow to find overlapping timerange and prepare new list with properly ids which were under coverage at the specific timerange, for example for list above result should be :

[
('5,6',   d.datetime(2010, 9, 19, 0, 0, 0),    d.datetime(2010, 9, 19, 0, 5, 10),
('6',     d.datetime(2010, 9, 19, 0, 5, 10),   d.datetime(2010, 9, 19, 10, 30, 17),
('4,6',   d.datetime(2010, 9, 19, 10, 30, 17), d.datetime(2010, 9, 19, 12, 59, 59),
('4',     d.datetime(2010, 9, 19, 12, 59, 59), d.datetime(2010, 9, 19, 14, 12, 0),
('4,6',   d.datetime(2010, 9, 19, 14, 12, 0),  d.datetime(2010, 9, 19, 17, 0, 22),
('4,5,6', d.datetime(2010, 9, 19, 17, 0, 22),  d.datetime(2010, 9, 19, 19, 14, 20),
('4,6',   d.datetime(2010, 9, 19, 19, 14, 20), d.datetime(2010, 9, 19, 20, 20, 59),
('6',     d.datetime(2010, 9, 19, 20, 20, 59), d.datetime(2010, 9, 19, 23, 59, 59)
]

视觉概念:

实际上,我现在已经有了这样的解决方案:我正在获取整个范围的最小和最大日期,然后从min_date开始迭代到max_date每1秒,当特别是第二秒我们匹配目标列表中的一些间隔时,我将匹配的ID保存为dictionar y键,并将从迭代器到列表的时间附加为值,然后将其保存到父列表中,然后保存下一个和下一个。最后,我遍历父列表中的所有字典,并获取ID作为键,并将值列表中的第一个,最后一个日期作为我需要查找的范围。
但是,当我计算月份范围时,此解决方案的运行速度非常慢。因为要花1个月的时间才能迭代太多时间。

Actually for now I've solution like this: I'm getting minimal and maximum dates of the whole range, then start iterate from min_date to max_date each 1 second, when in particular second we match some of intervals from target list, I save matched ids as dictionary key and append time from iterator to list as value, then save it to parent list, then next and next. At final I go over all dicts in parent list and get ids as keys and first, last date in value list as range that I need to find. But this solution works very slow when I count ranges in month. Because it's takes too much time iterate 1 month in seconds.

这里是代码:

    def delta(start, end, delta):
        cur = start
        while cur < end:
            yield cur
            cur += delta

    final_ranges = []
    last_result = None
    i = -1
    for checker_date in delta(
            sorted_ranges_by_start[0]['start'],
            sorted_ranges_by_end[-1]['end'],
            relativedelta(seconds=1)):

        aggregator = []
        for rng in ranges:
            if rng['start'] <= checker_date <= rng['end']:
                aggregator.append(str(rng['id']))

        if len(aggregator) > 0:
            ids = ','.join(set(aggregator))
            if last_result != ids:
                final_ranges.append({})
                last_result = ids
                i += 1

            if ids not in final_ranges[i]:
                final_ranges[i][ids] = []

            final_ranges[i][ids].append(checker_date)

但是正如我所说的,它在大范围内的运行非常缓慢。

But as I said it's working very slow in big ranges.

通过这种方式,请帮助我找到可以在不进行迭代的情况下进行迭代的算法,或者建议以任何方式提高迭代速度(不确定,也许尝试在C上编写此部分,然后嵌入到Python中)

In this way please help me find algorithm that can do it without iteration through month or maybe advice any way to improve iteration speed ( not sure, maybe try to write this part on C and then embed to Python )

谢谢。

推荐答案

我已将其与以下代码一起使用。

I've make it work with the code below.

基本解释是首先在提供的时间段之间(即每次时间段结束时)检测切点。其次,仅在切点之间(而不是期间)进行迭代,并检查它们是否与任何切点重叠,以查看它们在这些切点之间是否处于活动状态。累积活动时间。

Basic explanation is to first detect cut points between provided periods, that is, everytime a period starts on ends. Second, iterate between cutpoints only, not periods, and check if they overlap with any to see if they are active between those cut points. Accumulate active periods.

处理时间取决于切点和时间段的数量,而不是经过的时间。

Process time depends on the number of cutpoints and periods, and not on elapsed time.

from datetime import datetime
from sortedcontainers import SortedSet

periods = [
    (5, datetime(2010, 9, 19, 0, 0, 0),    datetime(2010, 9, 19, 0, 5, 10)),
    (6, datetime(2010, 9, 19, 0, 0, 0),    datetime(2010, 9, 19, 12, 59, 59)),
    (4, datetime(2010, 9, 19, 10, 30, 17), datetime(2010, 9, 19, 20, 20, 59)),
    (6, datetime(2010, 9, 19, 14, 12, 0),  datetime(2010, 9, 19, 23, 59, 59)),
    (5, datetime(2010, 9, 19, 17, 0, 22),  datetime(2010, 9, 19, 19, 14, 20))
]

cutpoints = SortedSet()

for period in periods:
    cutpoints.add(period[1])
    cutpoints.add(period[2])

ranges = []

start_cutpoint = None
for end_cutpoint in cutpoints:

    if not start_cutpoint:  # skip first
        start_cutpoint = end_cutpoint
        continue

    cut_point_active_periods = []

    for period in periods:

        # check if period and cutpoint range overlap
        start_overlap = max(start_cutpoint, period[1])
        end_overlap = min(end_cutpoint, period[2])

        if start_overlap < end_overlap:
            cut_point_active_periods.append(period[0])

    ranges.append((cut_point_active_periods, start_cutpoint, end_cutpoint))
    start_cutpoint = end_cutpoint

这篇关于Python中的时间范围重叠算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆