使用范围作为字典中的键值,最有效的方法是? [英] Use range as a key value in a dictionary, most efficient way?

查看:81
本文介绍了使用范围作为字典中的键值,最有效的方法是?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直想知道是否存在某种数据结构或巧妙的方法来使用字典(O(1)查找)来返回值,如果给定范围的给定值不重叠。到目前为止,我一直认为如果范围具有一定的常数差(0-2、2-4、4-6等)可以执行此操作,或者可以在O(log(n ))时间。



因此,例如给定词典,

  d = {[[0.0-0.1): a,
[0.1-0.3): b,
[0.3-0.55): c,
[0.55-0.7 ): d,
[0.7-1.0): e}

it应该返回,

  d [0.05] 
>> a
d [0.8]
>> e
d [0.9]
>> e
d [random.random()]#这也应该起作用

无论如何要实现这样的目标?谢谢您对此的任何答复或回答。

解决方案

如果您接受较低的(er



字典可以在O(1)平均时间中进行查找,因为键和之间存在简单的算术关系。固定大小的数据结构中的位置(对于一般情况,为 hash(key)%tablesize )。您的范围实际上是具有浮点边界的可变大小,因此没有固定的表大小可将搜索值映射到该值。



除非,即,限制范围的绝对上下边界,并使范围边界落在固定的步长上。您的示例使用从0.0到1.0的值,并且范围可以量化为0.05级。可以将其变成固定的表:

 从集合中导入数学
导入MutableMapping

#空槽标记
_EMPTY = object()

class RangeMap(MutableMapping):
映射指向值,并在O(1)中查找点的值恒定时间

地图要求
范围具有固定的最小下限和最大上限。范围边界被量化为固定的步长。设置重叠时允许间隙



def __init __(self,map = None,lower = 0.0,upper = 1.0,step = 0.05):
self._mag = 10 ** -round(math.log10(step)-1)#转换为整数
self._lower,self._upper = round(lower * self._mag),round(upper * self._mag)
self._step = round(step * self._mag)
self._steps =(self._upper-self._lower)// self._step
self._table = [_EMPTY] * self ._steps
self._len = 0
如果map不是None:
self.update(map)

def __len __(self):
return self._len

def _map_range(self,r):
低,高= r
start = round(low * self._mag)// self._step
stop = round(high * self._mag)/ / self._step
,如果不是self._lower< =开始< stop< = self._upper:
提高IndexError('超出地图边界的范围')
返回范围(start-self._lower,stop-self._lower)

def __setitem __(self,r,value):
for i in self._map_range(r):
self._len + = int(self._table [i]是_EMPTY)
self。 _table [i] =值

def __delitem __(self,r):
for i in self._map_range(r):
self._len-= int(self._table [i]不是_EMPTY)
self._table [i] = _EMPTY

def _point_to_index(self,point):
point = round(point * self._mag)
如果不是self._lower< =点< = self._upper:
提高IndexError('地图边界之外的点')
return(point-self._lower)// self。 _step

def __getitem __(self,point_or_range):
if isinstance(point_or_range,tuple):
低,高= point_or_range
r = self._map _range(point_or_range)
#范围中的所有点都必须指向相同的值
value = self._table [r [0]]
(如果值是_EMPTY或any(self._table [ i]!= i在r中的值):
提高IndexError('不是单个值的范围')
否则:
value = self._table [self._point_to_index(point_or_range) ]
如果值是_EMPTY:
提高IndexError('点不在地图中')
返回值

def __iter __(self):
低=无
的值= _EMPTY
的in,enumerate(self_table)中的v:
pos =(self._lower +(i * self._step))/ self._mag
如果v为_EMPTY:
如果低不为空无:
收益率(低,pos)
低=无
elif v!=值:
如果低为不无:
收益(低,pos)
l ow = pos
值= v
(如果不为低则无):
收益(低,self._upper / self._mag)

以上实现了完整的映射接口,并接受了点和范围(作为对 [start,stop]进行建模的元组)间隔)在建立索引或测试包含性时(支持范围使重用默认键,值和项字典视图实现更加容易,所有这些都可以从 __ iter __ 实现中使用



Demo:

 >> d = RangeMap({
...(0.0,0.1): a,
...(0.1,0.3): b,
...(0.3,0.55 ): c,
...(0.55,0.7): d,
...(0.7,1.0): e,
...})
>> print(* d.items(),sep ='\n')
((0.0,0.1),'a')
((0.1,0.3),'b')
((0.3,0.55),'c')
((0.55,0.7),'d')
((0.7,1.0),'e')
> > d [0.05]
‘a’
>>> d [0.8]
’e’
>>> d [0.9]
’e’
>>>导入随机
>>> d [random.random()]
’c’
>>> d [random.random()]
'a'

如果您可以如此轻松地限制步长和边界,那么您的下一个最佳选择是使用某种二进制搜索算法;您将范围保持在已排序的顺序,并在数据结构的中间选择一个点;根据您的搜索关键字高于或低于该中点,您将继续在数据结构的一半进行搜索,直到找到匹配项。



如果您的范围覆盖从最低边界到最高边界的完整间隔,则可以使用 bisect 模块;只需将每个范围的上下边界存储在一个列表中,将相应的值存储在另一个列表中,然后使用二分法将第一个列表中的位置映射到第二个列表中的结果。



如果您的范围具有空白,则您需要保留带有其他边界的第三个列表,并首先验证该点是否在范围内,或使用间隔树。对于非重叠范围,可以使用简单的二叉树,但是也有更专业的实现也支持重叠范围。在PyPI上有一个 intervaltree 项目,支持全间隔树操作。



A bisect 基于映射的行为与固定表实现相​​匹配像:

 从bisect导入bisect_left 
从collections.abc导入MutableMapping


类RangeBisection(MutableMapping):
映射范围为值

查找是在O(logN)时间完成的。没有上限或
的限制


def __init __(self,map = None):
self._upper = []
self._lower = []
self._values = []
如果map不是None:
self.update(map)

def __len __(self) :
return len(self._values)

def __getitem __(self,point_or_range):
如果isinstance(point_or_range,tuple):
低,高= point_or_range
i = bisect_left(self._upper,high)
点=低
否则:
point = point_or_range
i = bisect_left(self._upper,point)如果我> = len(self._values)或self._lower [i]>点:$ b​​ $ b提高IndexError(point_or_range)
返回self._values [i]

def __setitem __(self,r,value):
较低,较高= r
i = bisect_left(self._upper,upper)
如果i< len(self._values)和self._lower [i]< upper:
提高IndexError('不允许重叠')
self._upper.insert(i,上层)
self._lower.insert(i,下层)
self._values .insert(i,value)

def __delitem __(self,r):
lower,upper = r
i = bisect_left(self._upper,upper)
if self._upper [i]!=上或self._lower [i]!=下:
提高IndexError('范围不在地图中')
del self._upper [i]
del self._lower [i]
del self._values [i]

def __iter __(self):zip(self._lower,self._upper)
的收益b


I have been wondering if there is some kind of data-structure or clever way to use a dictionary (O(1) lookup) to return a value if there are given values for defined ranges that do not overlap. So far I have been thinking this could be done if the ranges have some constant difference (0-2, 2-4, 4-6, etc.) or a binary-search could be done to do this in O(log(n)) time.

So, for example given a dictionary,

d = {[0.0 - 0.1): "a",
     [0.1 - 0.3): "b",
     [0.3 - 0.55): "c",
     [0.55 - 0.7): "d",
     [0.7 - 1.0): "e"}

it should return,

d[0.05] 
>>> "a"
d[0.8]
>>> "e"
d[0.9]
>>> "e"
d[random.random()] # this should also work

Is there anyway to achieve something like this? Thanks for any responses or answers on this.

解决方案

You can have O(1) lookup time if you accept a low(er) resolution of range boundaries and sacrifice memory for lookup speed.

A dictionary can do a lookup in O(1) average time because there is a simple arithmetic relationship between key and location in a fixed-size data structure (hash(key) % tablesize, for the average case). Your ranges are effectively of a variable size with floating-point boundaries, so there is no fixed tablesize to map a search value to.

Unless, that is, you limit the absolute lower and upper boundaries of the ranges, and let range boundaries fall on a fixed step size. Your example uses values from 0.0 through to 1.0, and the ranges can be quantized to 0.05 steps. That can be turned into a fixed table:

import math
from collections import MutableMapping

# empty slot marker
_EMPTY = object()

class RangeMap(MutableMapping):
    """Map points to values, and find values for points in O(1) constant time

    The map requires a fixed minimum lower and maximum upper bound for
    the ranges. Range boundaries are quantized to a fixed step size. Gaps
    are permitted, when setting overlapping ranges last range set wins.

    """
    def __init__(self, map=None, lower=0.0, upper=1.0, step=0.05):
        self._mag = 10 ** -round(math.log10(step) - 1)  # shift to integers
        self._lower, self._upper = round(lower * self._mag), round(upper * self._mag)
        self._step = round(step * self._mag)
        self._steps = (self._upper - self._lower) // self._step
        self._table = [_EMPTY] * self._steps
        self._len = 0
        if map is not None:
            self.update(map)

    def __len__(self):
        return self._len

    def _map_range(self, r):
        low, high = r
        start = round(low * self._mag) // self._step
        stop = round(high * self._mag) // self._step
        if not self._lower <= start < stop <= self._upper:
            raise IndexError('Range outside of map boundaries')
        return range(start - self._lower, stop - self._lower)

    def __setitem__(self, r, value):
        for i in self._map_range(r):
            self._len += int(self._table[i] is _EMPTY)
            self._table[i] = value

    def __delitem__(self, r):
        for i in self._map_range(r):
            self._len -= int(self._table[i] is not _EMPTY)
            self._table[i] = _EMPTY

    def _point_to_index(self, point):
        point = round(point * self._mag)
        if not self._lower <= point <= self._upper:
            raise IndexError('Point outside of map boundaries')
        return (point - self._lower) // self._step

    def __getitem__(self, point_or_range):
        if isinstance(point_or_range, tuple):
            low, high = point_or_range
            r = self._map_range(point_or_range)
            # all points in the range must point to the same value
            value = self._table[r[0]]
            if value is _EMPTY or any(self._table[i] != value for i in r):
                raise IndexError('Not a range for a single value')
        else:
            value = self._table[self._point_to_index(point_or_range)]
            if value is _EMPTY:
                raise IndexError('Point not in map')
        return value

    def __iter__(self):
        low = None
        value = _EMPTY
        for i, v in enumerate(self._table):
            pos = (self._lower + (i * self._step)) / self._mag
            if v is _EMPTY:
                if low is not None:
                    yield (low, pos)
                    low = None
            elif v != value:
                if low is not None:
                    yield (low, pos)
                low = pos
                value = v
        if low is not None:
            yield (low, self._upper / self._mag)

The above implements the full mapping interface, and accepts both points and ranges (as a tuple modelling a [start, stop) interval) when indexing or testing for containment (supporting ranges made it easier to reuse the default keys, values and items dictionary view implementations, which all work from the __iter__ implementation).

Demo:

>>> d = RangeMap({
...     (0.0, 0.1): "a",
...     (0.1, 0.3): "b",
...     (0.3, 0.55): "c",
...     (0.55, 0.7): "d",
...     (0.7, 1.0): "e",
... })
>>> print(*d.items(), sep='\n')
((0.0, 0.1), 'a')
((0.1, 0.3), 'b')
((0.3, 0.55), 'c')
((0.55, 0.7), 'd')
((0.7, 1.0), 'e')
>>> d[0.05]
'a'
>>> d[0.8]
'e'
>>> d[0.9]
'e'
>>> import random
>>> d[random.random()]
'c'
>>> d[random.random()]
'a'

If you can't limit the step size and boundaries so readily, then your next best option is to use some kind of binary search algorithm; you keep the ranges in sorted order and pick a point in the middle of the data structure; based on your search key being higher or lower than that mid point you continue the search in either half of the data structure until you find a match.

If your ranges cover the full interval from lowest to highest boundary, then you can use the bisect module for this; just store either the lower or upper boundaries of each range in one list, the corresponding values in another, and use bisection to map a position in the first list to a result in the second.

If your ranges have gaps, then you either need to keep a third list with the other boundary and first validate that the point falls in the range, or use an interval tree. For non-overlapping ranges a simple binary tree would do, but there are more specialised implementations that support overlapping ranges too. There is a intervaltree project on PyPI that supports full interval tree operations.

A bisect-based mapping that matches behaviour to the fixed-table implementation would look like:

from bisect import bisect_left
from collections.abc import MutableMapping


class RangeBisection(MutableMapping):
    """Map ranges to values

    Lookups are done in O(logN) time. There are no limits set on the upper or
    lower bounds of the ranges, but ranges must not overlap.

    """
    def __init__(self, map=None):
        self._upper = []
        self._lower = []
        self._values = []
        if map is not None:
            self.update(map)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, point_or_range):
        if isinstance(point_or_range, tuple):
            low, high = point_or_range
            i = bisect_left(self._upper, high)
            point = low
        else:
            point = point_or_range
            i = bisect_left(self._upper, point)
        if i >= len(self._values) or self._lower[i] > point:
            raise IndexError(point_or_range)
        return self._values[i]

    def __setitem__(self, r, value):
        lower, upper = r
        i = bisect_left(self._upper, upper)
        if i < len(self._values) and self._lower[i] < upper:
            raise IndexError('No overlaps permitted')
        self._upper.insert(i, upper)
        self._lower.insert(i, lower)
        self._values.insert(i, value)

    def __delitem__(self, r):
        lower, upper = r
        i = bisect_left(self._upper, upper)
        if self._upper[i] != upper or self._lower[i] != lower:
            raise IndexError('Range not in map')
        del self._upper[i]
        del self._lower[i]
        del self._values[i]

    def __iter__(self):
        yield from zip(self._lower, self._upper)

这篇关于使用范围作为字典中的键值,最有效的方法是?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆