Python - Locating the closest timestamp

Problem description

I have a Python datetime timestamp and a large dict (index) where keys are timestamps and the values are some other information I'm interested in.

I need to find the datetime (the key) in index that is closest to timestamp, as efficiently as possible.

At the moment I'm doing something like:

for timestamp in timestamps:
    closestTimestamp = min(index,key=lambda datetime : abs(timestamp - datetime))

which works, but takes too long - my index dict has millions of values, and I'm doing the search thousands of times. I'm flexible with data structures and so on - the timestamps are roughly sequential, so that I'm iterating from the first to the last timestamps. Likewise the timestamps in the text file that I load into the dict are sequential.

Any ideas for optimisation would be greatly appreciated.

Recommended answer

Dictionaries aren't organized for efficient near miss searches. They are designed for exact matches (using a hash table).

You may be better off maintaining a separate, fast-searchable ordered structure.

A simple way to start off is to use the bisect module, which gives fast O(log n) searches on a sorted list at the cost of slower O(n) insertions:

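from bisect import bisect_left

s = sorted(index)  # presorted list of timestamps, built once from the dict keys

def nearest(ts):
    # Only the neighbours of the insertion point can be closest to ts.
    i = bisect_left(s, ts)
    return min(s[max(0, i-1): i+2], key=lambda t: abs(ts - t))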
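
To show how the sorted list and the original dict fit together, here is a minimal, self-contained sketch. The sample data, the name `sorted_keys`, and the helper `nearest_key` are made up for illustration; only `bisect_left` and `datetime` come from the standard library.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical sample data standing in for the real index dict.
index = {
    datetime(2024, 1, 1, 12, 0, 0): "a",
    datetime(2024, 1, 1, 12, 0, 10): "b",
    datetime(2024, 1, 1, 12, 1, 0): "c",
}

# Sort the keys once; every later lookup is O(log n).
sorted_keys = sorted(index)

def nearest_key(sorted_keys, ts):
    """Return the element of sorted_keys closest to ts (list must be non-empty)."""
    i = bisect_left(sorted_keys, ts)
    # The closest key is either just before or at the insertion point.
    candidates = sorted_keys[max(0, i - 1): i + 1]
    return min(candidates, key=lambda t: abs(ts - t))

query = datetime(2024, 1, 1, 12, 0, 7)
key = nearest_key(sorted_keys, query)
value = index[key]  # the dict still provides the exact-match lookup
```

The dict stays the source of the values; the sorted list is only used to find which key to look up.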

A more sophisticated approach, suitable for non-static, dynamically updated dicts, would be to use blist, which employs a tree structure for fast O(log n) insertions and lookups. You only need this if the dict is going to change over time.
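
If updates are infrequent, the standard library may be enough before reaching for a tree-based container. The sketch below is not blist; it is a stand-in that keeps a plain list sorted with `bisect.insort`, so each insertion still costs O(n). The class name `TimestampIndex` and its methods are hypothetical.

```python
from bisect import bisect_left, insort
from datetime import datetime

class TimestampIndex:
    """Hypothetical wrapper: a dict for values plus a sorted list of its keys."""

    def __init__(self):
        self.values = {}
        self.keys = []

    def add(self, ts, value):
        # Insert the key in sorted position only if it is new.
        # insort finds the position in O(log n) but shifts elements in O(n).
        if ts not in self.values:
            insort(self.keys, ts)
        self.values[ts] = value

    def nearest(self, ts):
        """Return (key, value) for the stored key closest to ts."""
        i = bisect_left(self.keys, ts)
        candidates = self.keys[max(0, i - 1): i + 1]
        key = min(candidates, key=lambda t: abs(ts - t))
        return key, self.values[key]

idx = TimestampIndex()
idx.add(datetime(2024, 1, 1, 12, 5), "late")
idx.add(datetime(2024, 1, 1, 12, 0), "early")
idx.add(datetime(2024, 1, 1, 12, 2), "mid")
result = idx.nearest(datetime(2024, 1, 1, 12, 3))
```

Swapping the list for a tree-backed sorted container would change only the insertion cost; the lookup logic stays the same.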

If you want to stay with a dictionary-based approach, consider a dict-of-lists that clusters entries with nearby timestamps:

def get_closest_stamp(ts):
    """Speed up the timestamp search by looking only at entries in the same hour."""
    # round_to_nearest_hour() and daydict are assumed to be defined elsewhere.
    hour = round_to_nearest_hour(ts)
    cluster = daydict[hour]         # list of timestamps clustered around that hour
    return min(cluster, key=lambda t: abs(ts - t))

Note, for exact results near cluster boundaries, store close-to-the-boundary timestamps in both the primary cluster and the adjacent cluster.
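
One possible way to build such a clustered dict, including the boundary duplication, is sketched below. The helper `round_to_nearest_hour`, the builder `build_daydict`, and the 10-minute `MARGIN` are assumptions made for this example, not part of the original answer; results are exact only when the true nearest timestamp lies within the margin of the cluster boundary or inside the query's own cluster.

```python
from collections import defaultdict
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)
MARGIN = timedelta(minutes=10)  # assumed overlap width near cluster boundaries

def round_to_nearest_hour(ts):
    # Hypothetical helper: round to the hour, with half hours rounding up.
    floored = ts.replace(minute=0, second=0, microsecond=0)
    return floored + HOUR if ts - floored >= HOUR / 2 else floored

def build_daydict(stamps):
    """Group timestamps by nearest hour, duplicating those near a boundary."""
    daydict = defaultdict(list)
    for t in stamps:
        hour = round_to_nearest_hour(t)
        daydict[hour].append(t)
        # The cluster for `hour` covers [hour - 30 min, hour + 30 min).
        if t - (hour - HOUR / 2) < MARGIN:
            daydict[hour - HOUR].append(t)   # close to the lower boundary
        if (hour + HOUR / 2) - t <= MARGIN:
            daydict[hour + HOUR].append(t)   # close to the upper boundary
    return daydict

def get_closest_stamp(daydict, ts):
    # .get avoids creating empty clusters in the defaultdict on a miss.
    cluster = daydict.get(round_to_nearest_hour(ts))
    if not cluster:
        return None
    return min(cluster, key=lambda t: abs(ts - t))

stamps = [
    datetime(2024, 1, 1, 9, 28),
    datetime(2024, 1, 1, 9, 50),
    datetime(2024, 1, 1, 10, 45),
]
daydict = build_daydict(stamps)
closest = get_closest_stamp(daydict, datetime(2024, 1, 1, 9, 35))
```

Here the query at 09:35 falls in the 10:00 cluster, but its true nearest neighbour, 09:28, belongs to the 09:00 cluster. Because 09:28 sits within the margin of the boundary it is stored in both clusters, so the lookup still finds it.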
