Hash a Range of Values


Problem Description

I know that I can hash singular values as keys in a dict. For example, I can hash 5 as one of the keys in a dict.

I am currently facing a problem that requires me to hash a range of values.

Basically, I need a faster way to do this:

if 0 <= x <= 0.1:
    # f(A)
elif 0.1 <= x <= 0.2:
    # f(B)
elif 0.2 <= x <= 0.3:
    # f(C)
elif 0.3 <= x <= 0.4:
    # f(D)
elif 0.4 <= x <= 0.5:
    # f(E)
elif 0.5 <= x <= 0.6:
    # f(F)

where x is some float parameter of arbitrary precision.

The fastest way I can think of is hashing, but here's the problem: I can use (0.1, 0.2) as a key, but that still is going to cost me O(n) runtime and is ultimately no better than the slew of elifs (I would have to iterate over the keys and check to see if key[0] <= x <= key[1]).

Is there a way to hash a range of values so that I can check the hash table for 0.15 and still get #execute B?

If such a hashing isn't possible, how else might I be able to improve the runtime of this? I am working with large enough data sets that linear runtime is not fast enough.

EDIT: In response to cheeken's answer, I must note that the intervals cannot be assumed to be regular. As a matter of fact, I can almost guarantee that they are not.

In response to requests in comments, I should mention that I am doing this in an attempt to implement fitness-based selection in a genetic algorithm. The algorithm itself is for homework, but the specific implementation is only to improve the runtime for generating experimental data.

Solution

As others have noted, the best algorithm you're going to get for this is something that's O(log N), not O(1), with something along the lines of a bisection search through a sorted list.
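To make the O(log N) claim concrete, here is a minimal hand-written sketch of a bisection search over sorted interval boundaries. The function name `find_interval` and the `bounds` values are hypothetical, chosen only for illustration; the boundaries are deliberately irregular, as the question requires.

```python
def find_interval(bounds, x):
    """Return the index i such that bounds[i] <= x < bounds[i + 1].

    `bounds` must be sorted in ascending order; the gaps between
    consecutive boundaries do not need to be equal.
    """
    if not bounds[0] <= x < bounds[-1]:
        raise ValueError(f"{x} is outside [{bounds[0]}, {bounds[-1]})")
    lo, hi = 0, len(bounds) - 1
    # Invariant: bounds[lo] <= x < bounds[hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if x < bounds[mid]:
            hi = mid  # x lies in the lower half
        else:
            lo = mid  # x lies in the upper half
    return lo


# Hypothetical, irregular boundaries (not taken from the question).
bounds = [0.0, 0.05, 0.2, 0.25, 0.6, 1.0]
index = find_interval(bounds, 0.15)  # 0.15 falls in [0.05, 0.2), the second interval
```

Each pass through the loop halves the candidate range, so a lookup takes about log2(N) comparisons instead of the N comparisons of the elif chain.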

The easiest way to do this in Python is with the bisect standard module, http://docs.python.org/library/bisect.html. Note, in particular, the example in section 8.5.2 there, on doing numeric table lookups -- it's exactly what you are doing:

>>> from bisect import bisect
>>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
...     i = bisect(breakpoints, score)
...     return grades[i]
...
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']

Replace the grades string with a list of functions, the breakpoints list with your list of lower thresholds, and there you go.
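As a rough sketch of that substitution, assuming x always lies in [0, 1]: the names `make_handler`, `HANDLERS`, `BREAKPOINTS`, and `dispatch` are hypothetical, the handlers are placeholders for the question's f(A), f(B), and so on, and the thresholds are made-up irregular values.

```python
from bisect import bisect


def make_handler(label):
    """Build a placeholder for one of the question's f(A), f(B), ... functions."""
    def handler(x):
        return (label, x)
    return handler


# One handler per interval; there is always one more handler than breakpoints.
HANDLERS = [make_handler(name) for name in "ABCDE"]

# Sorted lower thresholds of every interval except the first (hypothetical values).
BREAKPOINTS = [0.05, 0.2, 0.25, 0.6]


def dispatch(x):
    """Call the handler whose interval contains x, using an O(log N) lookup."""
    if not 0.0 <= x <= 1.0:
        raise ValueError(f"{x} is outside [0, 1]")
    # bisect returns how many breakpoints are <= x, which is the interval index.
    return HANDLERS[bisect(BREAKPOINTS, x)](x)


result = dispatch(0.15)  # 0.15 lies between 0.05 and 0.2, so handler B runs
```

As in the grade example, where a score of 70 maps to 'C', a value that equals a breakpoint goes to the interval above it. If the opposite behaviour is wanted, `bisect_left` from the same module can be used in place of `bisect`.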
