Hash a Range of Values
Problem description
I know that I can hash singular values as keys in a dict. For example, I can hash 5 as one of the keys in a dict.
I am currently facing a problem that requires me to hash a range of values.
Basically, I need a faster way to do this:
if 0 <= x <= 0.1:
    # f(A)
elif 0.1 <= x <= 0.2:
    # f(B)
elif 0.2 <= x <= 0.3:
    # f(C)
elif 0.3 <= x <= 0.4:
    # f(D)
elif 0.4 <= x <= 0.5:
    # f(E)
elif 0.5 <= x <= 0.6:
    # f(F)
where x is some float parameter of arbitrary precision.
The fastest way I can think of is hashing, but here's the problem: I can use (0.1, 0.2) as a key, but that is still going to cost me O(n) runtime and is ultimately no better than the slew of elifs (I would have to iterate over the keys and check whether key[0] <= x <= key[1]).
Is there a way to hash a range of values so that I can check the hash table for 0.15 and still get #execute B?
If such a hashing isn't possible, how else might I be able to improve the runtime of this? I am working with large enough data sets that linear runtime is not fast enough.
EDIT: In response to cheeken's answer, I must note that the intervals cannot be assumed to be regular. As a matter of fact, I can almost guarantee that they are not.
In response to requests in comments, I should mention that I am doing this in an attempt to implement fitness-based selection in a genetic algorithm. The algorithm itself is for homework, but the specific implementation is only to improve the runtime for generating experimental data.
As others have noted, the best algorithm you're going to get for this is something that's O(log N), not O(1), with something along the lines of a bisection search through a sorted list.
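To make the O(log N) idea concrete, here is a minimal hand-written sketch of a bisection search over a sorted list of interval upper bounds. The function name find_interval and the BOUNDS list are made up for illustration; they are not part of the question or of any library.

```python
def find_interval(bounds, x):
    """Return the index of the first entry in sorted `bounds` that is >= x.

    Each loop iteration halves the search window, so the cost is O(log N).
    """
    lo, hi = 0, len(bounds)
    while lo < hi:
        mid = (lo + hi) // 2
        if bounds[mid] < x:
            lo = mid + 1   # x lies strictly above bounds[mid]
        else:
            hi = mid       # bounds[mid] could still be the answer
    return lo


# Hypothetical upper bounds taken from the if/elif chain in the question.
BOUNDS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

print(find_interval(BOUNDS, 0.15))  # 1, i.e. the second interval (B)
```

A result equal to len(bounds) means x is above the last bound, so the caller has to handle that case separately.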
The easiest way to do this in Python is with the bisect standard module, http://docs.python.org/library/bisect.html. Note, in particular, the example in section 8.5.2 there, on doing numeric table lookups; it's exactly what you are doing:
>>> from bisect import bisect
>>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
...     i = bisect(breakpoints, score)
...     return grades[i]
...
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']
Replace the grades string with a list of functions, the breakpoints list with your list of lower thresholds, and there you go.
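As a rough sketch of that substitution, the snippet below dispatches to a list of functions over irregular intervals. The interval bounds, make_handler, HANDLERS and dispatch are all hypothetical names and values chosen for illustration. It also assumes you want a boundary value such as 0.1 to go to the lower interval, as the first matching branch of the if/elif chain would, so it uses bisect_left on upper bounds rather than bisect on lower thresholds.

```python
from bisect import bisect_left

# Upper bound of each interval, sorted ascending. The spacing is
# deliberately irregular; these values are made up for illustration.
UPPER_BOUNDS = [0.1, 0.25, 0.3, 0.55, 0.6]


def make_handler(label):
    """Build a stand-in for f(A), f(B), ... that just returns its label."""
    def handler(x):
        return label
    return handler


# One handler per interval, in the same order as UPPER_BOUNDS.
HANDLERS = [make_handler(label) for label in "ABCDE"]


def dispatch(x):
    """Call the handler whose interval contains x, in O(log N)."""
    if not 0 <= x <= UPPER_BOUNDS[-1]:
        raise ValueError(f"{x} is outside the covered range")
    # bisect_left returns the index of the first bound >= x, so a value
    # exactly on a boundary is routed to the lower interval.
    i = bisect_left(UPPER_BOUNDS, x)
    return HANDLERS[i](x)


print(dispatch(0.15))  # B
```

If boundary values should instead fall into the upper interval, as in the grade example, swap bisect_left for bisect and store lower thresholds, which then needs one more handler than thresholds.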