有效搜索所有元素都大于给定元组的元组 [英] Searching for a tuple with all elements greater than a given tuple efficiently

查看:81
本文介绍了有效搜索所有元素都大于给定元组的元组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

请考虑以下元组列表:[(5,4,5),(6,9,6),(3,8,3),(7,9,8)]

我正在尝试设计一种算法,以检查列表中是否存在至少一个元组,其中该元组的所有元素都大于或等于给定的元组(针).

例如,对于给定的元组(6,5,7),算法应返回True,因为给定的元组中的每个元素都小于列表中的最后一个元组,即(7,9,8).但是,对于给定的元组(9,1,9),该算法应返回False,因为列表中没有元组,其中每个元素都大于给定的元组.特别是,这是由于给定元组的第二个元素1小于列表中所有元组的第二个元素.

幼稚的算法将一个遍历列表中的元组,并在内部循环中遍历元组的元素.假设有n个元组,每个元组有m个元素,则复杂度为O(nm).

我在考虑是否有可能使用一种算法来以较低的复杂度生成任务.允许进行预处理或使用任何精美的数据结构来存储数据!

我本来的想法是利用二进制搜索的某种变体,但是一旦我们消除了基于第一个元素的一些元组,我似乎找不到一种数据结构,该数据结构使我们不会退回到朴素的解决方案,这意味着该算法最后也可能为O(nm).

谢谢!

解决方案

考虑此问题的2元组版本.每个元组(x,y)对应于平面上的轴对齐矩形,其右上角为(x,y),右下角为(-oo,+ oo).该集合对应于这些矩形的并集.给定一个查询点(针),我们只需确定它是否在联合中.知道边界就足够了.这是一条与轴对齐的折线,相对于x在y中单调不增加:x方向上的向下楼梯".使用任何合理的数据结构(例如,折线上的x排序的点列表),可以很容易地在n个矩形的O(log n)时间中做出决定.不难看出如何通过一次插入一个矩形来构造O(n log n)时的折线,每个矩形都具有O(log n)的作用.

这里是可视化.这四个点是输入元组.蓝线左下方的区域对应于"True"返回值:

组合A,B,C影响边界.元组D没有.

所以问题是,该2元组版本是否可以很好地泛化为3.因此,半无限轴对齐的矩形的并集变为矩形棱柱的并集.边界折线变为3d曲面.

存在一些表示此类问题的常用方法.一个是作为八叉树.计算八叉树的并集是一种众所周知的标准算法,并且相当有效.查询一个成员资格需要O(log k)时间,其中k是其中包含的最大整数坐标范围.这可能是最简单的选择.但是如果整数域很大,八叉树可能会相对较慢并占用大量空间.

另一个没有这些弱点的候选对象是二进制空间分区,它可以处理任意尺寸.BSP使用维度为n-1的(超)平面来递归拆分n-d空间.一棵树描述了平面的逻辑关系.在此应用程序中,每个元组需要3个平面.由平面引起的真"半空间的交集将是对应于元组的真半无限棱镜.查询一根针正在遍历该树,以确定您是否在任何棱镜内.BSP的平均情况行为非常好,但树的最坏情况大小却很糟糕:在大小为O(2 ^ n)的树上搜索O(n)的时间.在实际的应用程序中,从随机化插入顺序开始,使用技巧在创建时查找大小适中的BSP.

K-d树是可以适用于此问题的另一种基于树的空间分区方案.但是,这将需要一些工作,因为k-d树的大多数表示方式都与搜索点有关,而不是代表区域.它们具有与BSP相同的最坏情况行为.

另一个坏消息是这些算法不适用于大于3的元组.树很快变得太大.搜索高维空间非常困难,并且是积极研究的主题.但是,由于您没有说任何有关元组长度的信息,因此我将在这里停止.

Consider the following list of tuples: [(5,4,5), (6,9,6), (3,8,3), (7,9,8)]

I am trying to devise an algorithm to check whether there exists at least one tuple in the list where all elements of that tuple are greater than or equal to a given tuple (the needle).

For example, for a given tuple (6,5,7), the algorithm should return True as every element in the given tuple is less than the last tuple in the list, i.e. (7,9,8). However, for a given tuple (9,1,9), the algorithm should return False as there is no tuple in the list where each element is greater than the given tuple. In particular, this is due to the second element 1 of the given tuple, which is smaller than the second element of all tuple in the list.

A naive algorithm would loop through the tuple in the list one by one, and loop through the the element of the tuple in the inner loop. Assuming there are n tuples, where each tuple have m elements, this will give a complexity of O(nm).

I am thinking whether it would be possible to have an algorithm to produce the task with a lower complexity. Pre-processing or any fancy data-structure to store the data is allowed!

My original thought was to make use of some variant of binary search, but I can't seem to find a data structure that allow us to not fall back to the naive solution once we have eliminated some tuples based on the first element, which implies that this algorithm could potentially be O(nm) at the end as well.

Thanks!

解决方案

Consider the 2-tuple version of this problem. Each tuple (x,y) corresponds to an axis-aligned rectangle on the plane with upper right corner at (x,y) and lower right at (-oo,+oo). The collection corresponds to the union of these rectangles. Given a query point (needle), we need only determine if it's in the union. Knowing the boundary is sufficient for this. It's an axis-aligned polyline that's monotonically non-increasing in y with respect to x: a "downward staircase" in the x direction. With any reasonable data structure (e.g. an x-sorted list of points on the polyline), it's simple to make the decision in O(log n) time for n rectangles. It's not hard to see how to construct the polyline in O(n log n) time by inserting rectangles one at a time, each with O(log n) work.

Here's a visualization. The four dots are input tuples. The area left and below the blue line corresponds to "True" return values:

Tuples A, B, C affect the boundary. Tuple D doesn't.

So the question is whether this 2-tuple version generalizes nicely to 3. The union of semi-infinite axis-aligned rectangles becomes a union of rectangular prisms instead. The boundary polyline becomes a 3d surface.

There exist a few common ways to represent problems like this. One is as an octree. Computing the union of octrees is a well-known standard algorithm and fairly efficient. Querying one for membership requires O(log k) time where k is the biggest integer coordinate range contained in it. This is likely to be the simplest option. But octrees can be relatively slow and take a lot of space if the integer domain is big.

Another candidate without these weaknesses is a Binary Space Partition, which can handle arbitrary dimensions. BSPs use (hyper)planes of dimension n-1 to recursively split n-d space. A tree describes the logical relationship of the planes. In this application, you'll need 3 planes per tuple. The intersection of the "True" half-spaces induced by by the planes will be the True semi-infinite prism corresponding to the tuple. Querying a needle is traversing the tree to determine if you're inside any of the prisms. Average case behavior of BSPs is very good, but worst case size of the tree is terrible: O(n) search time over a tree of size O(2^n). In real applications, tricks are used to find BSPs of modest size at creation time, starting with randomizing insertion order.

K-d trees are another tree-based space partitioning scheme that could be adapted to this problem. This will take some work, though, because most presentations of k-d trees are concerned with searching for points, not representing regions. They'd have the same worst case behavior as BSPs.

The other bad news is that these algorithms aren't well-suited to tuples much bigger than 3. Trees quickly become too big. Searching high dimensional spaces is hard and a topic of active research. However, since you didn't say anything about tuple length, I'll stop here.

这篇关于有效搜索所有元素都大于给定元组的元组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆