How to count the presence of a set of numbers in a set of intervals efficiently
The input parameters are a list of tuples representing the intervals and a list of integers. The goal is to write a function that counts the number of intervals each integer is present in and returns the result as an associative array. For example:
Input intervals: [(1, 3), (5, 6), (6, 9)]
Input integers: [2, 4, 6, 8]
Output: {2: 1, 4: 0, 6: 2, 8: 1}
Another example:
Input intervals: [(3, 3), (22, 30), (17, 29), (7, 12), (12, 34), (18, 38), (30, 40), (5, 27), (19, 26), (27, 27), (1, 31), (17, 17), (22, 25), (6, 14), (5, 7), (9, 19), (24, 28), (19, 40), (9, 36), (2, 32)]
Input numbers: [16, 18, 39, 40, 27, 28, 4, 23, 15, 24, 2, 6, 32, 17, 21, 29, 31, 7, 20, 10]
Output: {2: 2, 4: 2, 6: 5, 7: 6, 10: 7, 15: 6, 16: 6, 17: 8, 18: 8, 20: 9, 21: 9, 23: 11, 24: 12, 27: 11, 28: 9, 29: 8, 31: 7, 32: 6, 39: 2, 40: 2}
How would I go about writing a function that does this efficiently? I already have an O(nm) implementation, where n is the number of intervals and m is the number of integers, but I'm looking for something more efficient.
What I have at the moment:
def intervals_per_number(numbers, intervals):
    result_map = {i: 0 for i in numbers}
    for i in result_map.keys():
        for k in intervals:
            if k[0] <= i <= k[1]:
                result_map[i] += 1
    return result_map
Hope I explained it well enough. Let me know if anything's still unclear.
Thanks in advance.
Put your integers, start points, and end points in a single list of pairs. Make the first element of each pair the value of the integer, start point, or end point, and the second element of each pair be 0, -1, or 1 depending on whether it's an integer, start point, or end point.
Next, sort the list.
Now, you can go through the list, maintaining a running sum of the second elements of the pairs. When you see a pair with second element 0, record the running sum (negated) for that integer.
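One detail worth spelling out is why start points are tagged -1 rather than 1. Python compares tuples element by element, so when several pairs share the same value, the tag decides the order: a start point sorts before an integer, which sorts before an end point. That ordering is what makes the intervals behave as closed on both ends. A minimal sketch (the variable names here are illustrative, not part of the answer):

```python
# Three events sharing the value 6: an end point, a queried integer, a start point.
events = [(6, 1), (6, 0), (6, -1)]

# Ties on the first element are broken by the tag, so the start (-1) comes
# first, then the integer (0), then the end (1).
ordered = sorted(events)
print(ordered)  # [(6, -1), (6, 0), (6, 1)]
```

With this order, an interval starting at 6 is already counted when the integer 6 is reached, and an interval ending at 6 has not yet been closed.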
This runs in O((N+M)log(N+M)) time in the worst case, where N is the number of intervals and M is the number of integers (and in practice I guess it'll be close to linear if the queries and intervals are mostly sorted, thanks to timsort).
For example:
Input intervals: [(1, 3), (5, 6), (6, 9)]
Input integers: [2, 4, 6, 8]
Unified list (sorted):
[(1,-1), (2,0), (3,1), (4,0), (5,-1), (6, -1), (6,0), (6,1), (8,0), (9,1)]
Running sum:
[-1, -1, 0, 0, -1, -2, -2, -1, -1, 0]
Values for integers:
2: 1, 4: 0, 6: 2, 8: 1
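To check a hand trace like this one, the unified list and the running sum can be generated rather than written out manually. The sketch below is only an illustration; the helper name `trace` is made up for this purpose, and it assumes every interval has start <= end.

```python
def trace(qs, intervals):
    """Return the sorted event list and the running sum after each event."""
    events = (
        [(q, 0) for q in qs]                        # queried integers
        + [(start, -1) for start, _ in intervals]   # interval starts
        + [(end, 1) for _, end in intervals]        # interval ends
    )
    events.sort()
    running, total = [], 0
    for _, tag in events:
        total += tag
        running.append(total)
    return events, running


events, running = trace([2, 4, 6, 8], [(1, 3), (5, 6), (6, 9)])
# The count for an integer is the negated running sum at its own event.
counts = {value: -s for (value, tag), s in zip(events, running) if tag == 0}
print(events)
print(running)
print(counts)  # {2: 1, 4: 0, 6: 2, 8: 1}
```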
Example code:
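def query(qs, intervals):
    # Tag each value: 0 = queried integer, -1 = interval start, 1 = interval end.
    xs = [(q, 0) for q in qs] + [(x, -1) for x, _ in intervals] + [(x, 1) for _, x in intervals]
    S, r = 0, dict()  # S = number of intervals currently open
    for v, s in sorted(xs):
        if s == 0:
            r[v] = S
        S -= s  # a start (-1) opens an interval, an end (1) closes one
    return r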
intervals = [(3, 3), (22, 30), (17, 29), (7, 12), (12, 34), (18, 38), (30, 40), (5, 27), (19, 26), (27, 27), (1, 31), (17, 17), (22, 25), (6, 14), (5, 7), (9, 19), (24, 28), (19, 40), (9, 36), (2, 32)]
queries = [16, 18, 39, 40, 27, 28, 4, 23, 15, 24, 2, 6, 32, 17, 21, 29, 31, 7, 20, 10]
print(query(queries, intervals))
Output:
{2: 2, 4: 2, 6: 5, 7: 6, 10: 7, 15: 6, 16: 6, 17: 8, 18: 8, 20: 9, 21: 9, 23: 11, 24: 12, 27: 11, 28: 9, 29: 8, 31: 7, 32: 6, 39: 2, 40: 2}
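As a possible variation (a different technique from the sweep above, not something the answer proposes), the same counts can be obtained by sorting the start points and end points separately and doing two binary searches per integer. The sketch below uses the standard `bisect` module; the function names `query_bisect` and `brute_force` are made up for illustration, and every interval is assumed to satisfy start <= end.

```python
from bisect import bisect_left, bisect_right


def query_bisect(qs, intervals):
    """Count closed intervals [start, end] containing each q via binary search."""
    starts = sorted(start for start, _ in intervals)
    ends = sorted(end for _, end in intervals)
    result = {}
    for q in qs:
        opened = bisect_right(starts, q)  # intervals with start <= q
        closed = bisect_left(ends, q)     # intervals with end < q
        result[q] = opened - closed
    return result


def brute_force(qs, intervals):
    """O(nm) reference used only to cross-check the faster version."""
    return {q: sum(lo <= q <= hi for lo, hi in intervals) for q in qs}


print(query_bisect([2, 4, 6, 8], [(1, 3), (5, 6), (6, 9)]))  # {2: 1, 4: 0, 6: 2, 8: 1}
```

This version costs O((N+M) log N), since only the 2N end points are sorted and each of the M integers needs two lookups. It may be preferable when the integers arrive one at a time, because the sorted lists can be built once and reused.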