测试列表是否包含某个范围内的数字 [英] test if list contains a number in some range
问题描述
假设我有一个列表L=[1.1, 1.8, 4.4, 5.2]
.对于某些整数n
,我想知道L
是否具有val
和n-1<val<n+1
的值,如果是这样,我想知道val
的索引.
Let's say I have a list L=[1.1, 1.8, 4.4, 5.2]
. For some integer, n
, I want to know whether L
has a value val
with n-1<val<n+1
, and if so I want to know the index of val
.
到目前为止,我能做的最好的就是定义一个生成器
The best I can do so far is to define a generator
x = (index for index,val in enumerate(L) if n-1<val<n+1)
,然后使用try... except
检查它是否具有适当的值.因此,假设我正在寻找存在这样一个值的最小n> = 0……
and check to see whether it has an appropriate value using try... except
. So let's assume I'm looking for the smallest n>=0 for which such a value exists...
L=[1.1, 1.8, 4.4, 5.2]
n=0
while True:
x = (index for index,val in enumerate(L) if n-1<val<n+1)
try:
index=next(x)
break
except StopIteration:
n+=1
print n,index
1 0
1 0
实际上,我正在做更复杂的任务.我希望能够取n,找到第一个索引,如果不存在,则需要做其他事情.
In reality, I'm doing a more complicated task. I'll want to be able to take an n, find the first index, and if it doesn't exist, I need to do something else.
对我来说,这似乎不是特别干净的代码.有没有更好的办法?我觉得numpy可能有答案,但是我还不太了解.
This doesn't seem like particularly clean code to me. Is there a better way? I feel like numpy probably has the answer, but I don't know it well enough.
推荐答案
如果对L进行了排序,则可以使用bisect.bisect_left
查找所有L [< i]< n< =全部L [> = i].
If L is sorted, you could use bisect.bisect_left
to find the index i for which all L[< i] < n <= all L[>= i].
然后
if n - L[i-1] < 1.0:
val = L[i-1]
elif L[i] - n < 1.0:
val = L[i]
else:
val = None # no such value found
编辑:根据您的数据,要完成的工作以及要花费多少时间编写一个聪明的算法,对进行排序为您提供良好的解决方案;并且在我看到更多O(n)波动之前,我想指出他的实际问题似乎涉及反复探测n的各种值-这将很快摊销初始排序开销-并且他的建议上面的算法实际上是O(n ** 2).
Depending on your data, what you want to accomplish, and how much time you want to spend writing a clever algorithm, sorting may or may not be a good solution for you; and before I see too many more O(n)s waved around, I would like to point out that his actual problem seems to involve repeatedly probing for various values of n - which would pretty rapidly amortize the initial sorting overhead - and that his suggested algorithm above is actually O(n**2).
@AntoinePelisse:一定要做一些分析:
@AntoinePelisse: by all means, let's do some profiling:
from bisect import bisect_left, bisect_right
from functools import partial
import matplotlib.pyplot as plt
from random import randint, uniform
from timeit import timeit
#blues
density_col_lin = [
(0.000, 0.502, 0.000, 1.000),
(0.176, 0.176, 0.600, 1.000),
(0.357, 0.357, 0.698, 1.000),
(0.537, 0.537, 0.800, 1.000)
]
# greens
density_col_sor = [
(0.000, 0.502, 0.000, 1.000),
(0.176, 0.600, 0.176, 1.000),
(0.357, 0.698, 0.357, 1.000),
(0.537, 0.800, 0.537, 1.000)
]
def make_data(length, density):
max_ = length / density
return [uniform(0.0, max_) for _ in range(length)], max_
def linear_probe(L, max_, probes):
for p in range(probes):
n = randint(0, int(max_))
for index,val in enumerate(L):
if n - 1.0 < val < n + 1.0:
# return index
break
def sorted_probe(L, max_, probes):
# initial sort
sL = sorted((val,index) for index,val in enumerate(L))
for p in range(probes):
n = randint(0, int(max_))
left = bisect_right(sL, (n - 1.0, max_))
right = bisect_left (sL, (n + 1.0, 0.0 ), left)
if left < right:
index = min(sL[left:right], key=lambda s:s[1])[1]
# return index
def main():
densities = [0.8, 0.2, 0.08, 0.02]
probes = [1, 3, 10, 30, 100]
lengths = [[] for d in densities]
lin_pts = [[[] for p in probes] for d in densities]
sor_pts = [[[] for p in probes] for d in densities]
# time each function at various data lengths, densities, and probe repetitions
for d,density in enumerate(densities):
for trial in range(200):
print("{}-{}".format(density, trial))
# length in 10 to 5000, with log density
length = int(10 ** uniform(1.0, 3.699))
L, max_ = make_data(length, density)
lengths[d].append(length)
for p,probe in enumerate(probes):
lin = timeit(partial(linear_probe, L, max_, probe), number=5) / 5
sor = timeit(partial(sorted_probe, L, max_, probe), number=5) / 5
lin_pts[d][p].append(lin / probe)
sor_pts[d][p].append(sor / probe)
# plot the results
plt.figure(figsize=(9.,6.))
plt.axis([0, 5000, 0, 0.004])
for d,density in enumerate(densities):
xs = lengths[d]
lcol = density_col_lin[d]
scol = density_col_sor[d]
for p,probe in enumerate(probes):
plt.plot(xs, lin_pts[d][p], "o", color=lcol, markersize=4.0)
plt.plot(xs, sor_pts[d][p], "o", color=scol, markersize=4.0)
plt.show()
if __name__ == "__main__":
main()
结果
x轴是L中的项目数,y轴是每个探针的摊销时间;绿点是sorted_probe(),蓝点是linear_probe().
x-axis is number of items in L, y-axis is amortized time per probe; green dots are sorted_probe(), blue are linear_probe().
结论:
- 这两个函数的运行时间在长度方面都非常线性
- 对于L的单个探针,预排序比迭代慢约4倍 交叉点似乎大约有5个探针;少于此数,线性搜索更快,更多则预排序更快.
- runtimes for both functions are remarkably linear with respect to length
- for a single probe into L, presorting is about 4 times slower than iterating
- the crossover point seems to be about 5 probes; for fewer than that, linear search is faster, for more, presorting is faster.
这篇关于测试列表是否包含某个范围内的数字的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!