Test if a list contains a number in some range


Question


Let's say I have a list L=[1.1, 1.8, 4.4, 5.2]. For some integer n, I want to know whether L has a value val with n-1 < val < n+1, and if so I want to know the index of val.


The best I can do so far is to define a generator

x = (index for index,val in enumerate(L) if n-1<val<n+1)


and check to see whether it has an appropriate value using try... except. So let's assume I'm looking for the smallest n>=0 for which such a value exists...

L=[1.1, 1.8, 4.4, 5.2]
n=0
while True:
    x = (index for index,val in enumerate(L) if n-1<val<n+1)
    try:
        index=next(x)
        break
    except StopIteration:
        n+=1
print(n, index)

1 0

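
As an aside, the try/except around the generator can be avoided entirely with the two-argument form of next(), which returns a default when the generator is exhausted. A minimal sketch (the helper name find_index_near is illustrative, not from the original post):

```python
def find_index_near(L, n):
    """Return the index of the first value in L with n-1 < val < n+1,
    or None if no such value exists. (Illustrative helper name.)"""
    return next((i for i, val in enumerate(L) if n - 1 < val < n + 1), None)

L = [1.1, 1.8, 4.4, 5.2]
n = 0
while find_index_near(L, n) is None:
    n += 1
print(n, find_index_near(L, n))  # 1 0
```

This keeps the same linear scan as the question's code, but the "not found" case becomes an ordinary None check rather than control flow via StopIteration.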


In reality, I'm doing a more complicated task. I'll want to be able to take an n, find the first index, and if it doesn't exist, I need to do something else.


This doesn't seem like particularly clean code to me. Is there a better way? I feel like numpy probably has the answer, but I don't know it well enough.

Answer


If L is sorted, you could use bisect.bisect_left to find the index i for which every value in L[:i] is < n and every value in L[i:] is >= n.

Then

i = bisect_left(L, n)
if i > 0 and n - L[i-1] < 1.0:
    val = L[i-1]   # closest value below n
elif i < len(L) and L[i] - n < 1.0:
    val = L[i]     # closest value at or above n
else:
    val = None     # no such value found
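
Wrapped into a self-contained function, the idea looks like this (the name nearest_in_range is illustrative, and L is assumed sorted ascending; the bounds checks guard the i == 0 and i == len(L) edge cases):

```python
from bisect import bisect_left

def nearest_in_range(L, n):
    """Return a value of sorted list L satisfying n-1 < val < n+1,
    or None if no such value exists. (Illustrative helper name.)"""
    i = bisect_left(L, n)              # every L[:i] < n <= every L[i:]
    if i > 0 and n - L[i - 1] < 1.0:   # candidate just below n
        return L[i - 1]
    if i < len(L) and L[i] - n < 1.0:  # candidate at or above n
        return L[i]
    return None

L = sorted([1.1, 1.8, 4.4, 5.2])
print(nearest_in_range(L, 5))   # 4.4
print(nearest_in_range(L, 3))   # None
```

Each lookup is O(log len(L)) after the one-time sort, which is where the amortization argument below comes from.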


Edit: Depending on your data, what you want to accomplish, and how much time you want to spend writing a clever algorithm, sorting may or may not be a good solution for you; and before I see too many more O(n)s waved around, I would like to point out that the OP's actual problem seems to involve repeatedly probing for various values of n - which would pretty rapidly amortize the initial sorting overhead - and that the algorithm suggested above is actually O(n**2).


@AntoinePelisse: by all means, let's do some profiling:

from bisect import bisect_left, bisect_right
from functools import partial
import matplotlib.pyplot as plt
from random import randint, uniform
from timeit import timeit

# blues
density_col_lin = [
    (0.000, 0.502, 0.000, 1.000),
    (0.176, 0.176, 0.600, 1.000),
    (0.357, 0.357, 0.698, 1.000),
    (0.537, 0.537, 0.800, 1.000)
]

# greens
density_col_sor = [
    (0.000, 0.502, 0.000, 1.000),
    (0.176, 0.600, 0.176, 1.000),
    (0.357, 0.698, 0.357, 1.000),
    (0.537, 0.800, 0.537, 1.000)
]

def make_data(length, density):
    max_ = length / density
    return [uniform(0.0, max_) for _ in range(length)], max_

def linear_probe(L, max_, probes):
    for p in range(probes):
        n = randint(0, int(max_))
        for index,val in enumerate(L):
            if n - 1.0 < val < n + 1.0:
                # return index
                break

def sorted_probe(L, max_, probes):
    # initial sort
    sL = sorted((val,index) for index,val in enumerate(L))
    for p in range(probes):
        n = randint(0, int(max_))
        left  = bisect_right(sL, (n - 1.0, max_))
        right = bisect_left (sL, (n + 1.0, 0.0 ), left)
        if left < right:
            index = min(sL[left:right], key=lambda s:s[1])[1]
            # return index

def main():
    densities = [0.8, 0.2, 0.08, 0.02]
    probes    = [1, 3, 10, 30, 100]
    lengths   = [[]                   for d in densities]
    lin_pts   = [[[] for p in probes] for d in densities]
    sor_pts   = [[[] for p in probes] for d in densities]

    # time each function at various data lengths, densities, and probe repetitions
    for d,density in enumerate(densities):
        for trial in range(200):
            print("{}-{}".format(density, trial))

            # length in 10 to 5000, with log density
            length = int(10 ** uniform(1.0, 3.699))
            L, max_ = make_data(length, density)
            lengths[d].append(length)

            for p,probe in enumerate(probes):
                lin = timeit(partial(linear_probe, L, max_, probe), number=5) / 5
                sor = timeit(partial(sorted_probe, L, max_, probe), number=5) / 5
                lin_pts[d][p].append(lin / probe)
                sor_pts[d][p].append(sor / probe)

    # plot the results
    plt.figure(figsize=(9.,6.))
    plt.axis([0, 5000, 0, 0.004])

    for d,density in enumerate(densities):
        xs = lengths[d]
        lcol = density_col_lin[d]
        scol = density_col_sor[d]

        for p,probe in enumerate(probes):
            plt.plot(xs, lin_pts[d][p], "o", color=lcol, markersize=4.0)
            plt.plot(xs, sor_pts[d][p], "o", color=scol, markersize=4.0)

    plt.show()

if __name__ == "__main__":
    main()

Results


The x-axis is the number of items in L, the y-axis is the amortized time per probe; green dots are sorted_probe(), blue dots are linear_probe().

Conclusions:

  • runtimes for both functions are remarkably linear with respect to length
  • for a single probe into L, presorting is about 4 times slower than iterating
  • the crossover point seems to be about 5 probes; for fewer than that, linear search is faster, for more, presorting is faster.
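
Since the question also hinted at numpy, here is a hedged sketch of a vectorized version of the same linear check (the helper name first_index_near is illustrative). It is still O(len(L)) per probe, like the linear scan, but with a much smaller constant factor:

```python
import numpy as np

def first_index_near(arr, n):
    """Return the index of the first element of arr with n-1 < val < n+1,
    or None if no such element exists. (Illustrative helper name.)"""
    hits = np.nonzero(np.abs(arr - n) < 1.0)[0]  # strict |val - n| < 1
    return int(hits[0]) if hits.size else None

L = np.array([1.1, 1.8, 4.4, 5.2])
print(first_index_near(L, 1))   # 0
print(first_index_near(L, 3))   # None
```

Note that np.abs(arr - n) < 1.0 is exactly the condition n-1 < val < n+1, and np.nonzero evaluates the whole array, so for very long arrays with an early match the pure-Python generator can still win.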
