快速算法可以迅速找到号码属于一个范围集合的范围? [英] Fast Algorithm to Quickly Find the Range a Number Belongs to in a Set of Ranges?

查看:126
本文介绍了快速算法可以迅速找到号码属于一个范围集合的范围?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有几个号码范围。那些范围不重叠 - 因为它们不是重叠的,合乎逻辑的结果是,没有数量可以是一个以上的范围,在任何时间的一部分。每个范围连续地(有一个单一的范围内没有孔,所以一个范围为8〜16将真正含有8和16之间的所有数值),但可以有两个范围之间的孔(例如,范围开始于64,然后进到128,接下来的范围开始于256并进入384),所以有些数字可能不属于在所有(编号129至255不属于任何范围在这个例子中的任何范围)。

I have several number ranges. Those ranges are not overlapping - as they are not overlapping, the logical consequence is that no number can be part of more than one range at any time. Each range is continuously (there are no holes within a single range, so a range 8 to 16 will really contain all numbers between 8 and 16), but there can be holes between two ranges (e.g. range starts at 64 and goes to 128, next range starts at 256 and goes to 384), so some numbers may not belong to any range at all (numbers 129 to 255 would not belong to any range in this example).

我收到一个号码,需要知道其范围的号码属于...如果它属于任何范围的。否则,我需要知道,它不属于任何范围。当然,速度是很重要的;我不能简单地检查所有这将是为O(n)的范围内,因为有可能是范围十万。

I'm getting a number and need to know to which range the number belongs to... if it belongs to any range at all. Otherwise I need to know that it does not belong to any range. Of course speed is important; I can not simply check all the ranges which would be O(n), as there might be thousands of ranges.

有一个简单的解决办法是保持所有数字在排序后的数组,并在其上​​运行二进制搜索。这会给我至少O(log n)的。当然,二进制搜索必须有所修改,因为它必须经常检查对最小和最大数的范围。如果该号码是寻找两者之间,我们已经找到了正确的范围,否则,我们必须寻找低于或高于当前的范围。如果仅存在一个范围留在端部和数量不是在该范围内,数没有范围内的所有这样我们可以返回未找到结果。

A simple solution was keeping all numbers in a sorted array and run a binary search on it. That would give me at least O(log n). Of course the binary search must be somewhat modified as it must always check against the smallest and biggest number of a range. If the number to look for is in between, we have found the correct range, otherwise we must search ranges below or above the current one. If there is only one range left in the end and the number is not within that range, the number is within no range at all and we can return a "not found" result.

范围也可以链接在一起某种树结构。这基本上就像一个排序的列表与二进制搜索。的优点是,它会更快修改树比排序后的数组(添加/删除范围内),但不同于我们浪费一些额外的时间上保持树平衡,树可能会非常不平衡随时间,这将导致慢得多搜索比对排序后的数组的二进制搜索。

Ranges could also be chained together in some kind of tree structure. This is basically like a sorted list with binary search. The advantage is that it will be faster to modify a tree than a sorted array (adding/removing range), but unlike we waste some extra time on keeping the tree balanced, the tree might get very unbalanced over the time and that will lead to much slower searches than a binary search on a sorted array.

人们可以认为这是溶液更好或更差的在实践中搜索与修改的操作数将是几乎平衡(会有一个相等数目的搜索和添加/移除每秒执行的操作)。

One can argue which solution is better or worse as in practice the number of searches and modification operations will be almost balanced (there will be an equal number of searches and add/remove operations performed per second).

有可能是更好的数据结构比排序列表或这类问题一棵树?也许有可能在最坏的情况下甚至比O(log n)的在最好的情况下和O(log n)的好?

Is there maybe a better data structure than a sorted list or a tree for this kind of problem? Maybe one that could be even better than O(log n) in best case and O(log n) in worst case?

一些额外的信息,这可能有助于在这里是这样的:所有的范围总是开始并结束在多个二的幂的。他们总是所有就在两个相同功率结束(例如,它们都开始/结束时的4的倍数,或在8的倍数或16的倍数等)。两个电源在运行时期间不能改变。在第一个范围添加,两个电源必须设置和所有的范围不断增加,必须开始/结束,在此值的倍数,直到应用程序终止。我认为这是可以用于优化,因为如果他们都开始在如的倍数8,我可以忽略前3位全部比较操作,仅在其他位会告诉我的范围内(如有)。

Some additional information that might help here is the following: All ranges always start and end at multiple of a power of two. They always all start and end at the same power of two (e.g. they all start/end at a multiple of 4 or at a multiple of 8 or at a multiple of 16 and so on). The power of two cannot change during run time. Before the first range is added, the power of two must be set and all ranges ever added must start/end at a multiple of this value till the application terminates. I think this can be used for optimization, as if they all start at a multiple of e.g. 8, I can ignore the first 3 bits for all comparison operations, the other bits alone will tell me the range if any.

我读到一节和范围的树木。这些是最​​佳的解决问题的办法?有没有可能是更好的解决方案?这个问题听起来类似于一个malloc实现必须做的(如每free'd内存块属于一个范围内可用的内存和malloc实现的,必须找出哪一个),那么,如何那些通常解决这一问题?

I read about section and ranges trees. Are these optimal solutions to the problem? Are there possibly better solutions? The problem sounds similar to what a malloc implementation must do (e.g. every free'd memory block belongs to a range of available memory and the malloc implementation must find out to which one), so how do those commonly solve the issue?

推荐答案

运行各种基准测试之后,我来到了,只有像结构树可以在这里工作的结论。排序列表显示了当然好查找性能 - O(log n)的 - 但它显示了可怕的更新性能(!插入和清除的速度慢一些比较树10倍以上)

After running various benchmarks, I came to the conclusion that only a tree like structure can work here. A sorted list shows of course good lookup performance - O(log n) - but it shows horribly update performance (inserts and removals are slower by more than the factor 10 compared to trees!).

一个平衡二叉树也有O(log n)的查找性能,但它是更快,更新,还围着O(log n)的,而排序列表更像是为O(n),更新(O(日志N),找到插入或删除元素的位置,但随后长达n个元素必须在列表中移动,这是O(n))。

A balanced binary tree also has O(log n) lookup performance, however it is much faster to update, also around O(log n), while a sorted list is more like O(n) for updates (O(log n) to find the position for insert or the element to delete, but then up to n elements must be moved within the list and this is O(n)).

我实现了一个AVL树,红黑树,树堆,一节AA树和B树的各种变体(B指拜耳树在这里,而不是二进制)。结果:拜耳的树木几乎从未取胜。他们查找是很好的,但他们的更新性能较差(如B树,你再有一个排序列表中的每个节点内!)。作为B树必须读/写小得多节点比其他任何 - 其中读/写一个节点是一个非常缓慢的操作(例如,当节点直接读取或写入从/向硬盘)拜耳树仅优越的情况下树,所以在这样的情况下,它会获胜。如果我们有在内存虽然树,它的立场与其他树木没有机会了,对不起,所有的B树的球迷在那里。

I implemented an AVL tree, a red-black tree, a Treap, an AA-Tree and various variations of B-Trees (B means Bayer Tree here, not Binary). Result: Bayer trees almost never win. Their lookup is good, but their update performance is bad (as within each node of a B-Tree you have a sorted list again!). Bayer trees are only superior in cases where reading/writing a node is a very slow operation (e.g. when the nodes are directly read or written from/to hard disk) - as a B-Tree must read/write much less nodes than any other tree, so in such a case it will win. If we are having the tree in memory though, it stands no chance against other trees, sorry for all the B-Tree fans out there.

一个树堆是最容易实现的($ C $的不到一半c您需要其他平衡树的线条,只有两次在code,你需要一个不平衡的树),并显示了查找和更新的良好表现平平...但我们可以做得更好。

A Treap was easiest to implement (less than half the lines of code you need for other balanced trees, only twice the code you need for an unbalanced tree) and shows good average performance for lookups and updates... but we can do better than that.

这是AA-树显示惊人良好的查找性能 - 我不知道为什么。他们有时会击败所有其他的树(不是远,但仍足以不重合)...和去除效果是好的,但是除非我太傻正确实施。其中,插入的性能实在是太差了(它执行更多的树旋转上的每个刀片比其他任何树 - 甚至是B-树有更快的插入性能)

An AA-Tree shows amazing good lookup performance - I have no idea why. They sometimes beat all other trees (not by far, but still enough to not be coincident)... and the removal performance is okay, however unless I'm too stupid to implement them correctly, the insert performance is really bad (it performs much more tree rotations on every insert than any other tree - even B-Trees have faster insert performance).

这给我们留下了二经,AVL和RB树。它们是类似,都pretty的,但标杆小时后,有一点是明确的:AVL树肯定有更好的查询性能比RB-树。所不同的是不是巨大的,但在2/3所有基准,他们将赢得查找测试。不要太奇怪,毕竟AVL树比RB-树更严格的平衡,所以他们更接近最优二叉树在大多数情况下。我们不是在谈论一个巨大的区别就在这里,它始终是一个紧密的赛程。

This leaves us with two classics, AVL and RB-Tree. They are both pretty similar but after hours of benchmarking, one thing is clear: AVL Trees definitely have better lookup performance than RB-Trees. The difference is not gigantic, but in 2/3 out of all benchmarks they will win the lookup test. Not too surprising, after all AVL Trees are more strictly balanced than RB-Trees, so they are closer to the optimal binary tree in most cases. We are not talking about a huge difference here, it is always a close race.

在另一方面RB树击败AVL树的插入在几乎所有的测试运行,这是不是这样紧密的赛程。如前,预期。作为不太严格的平衡RB树的刀片相比,AVL树上的要少得多树转。

On the other hand RB Trees beat AVL Trees for inserts in almost all test runs and that is not such a close race. As before, that is expected. Being less strictly balanced RB Trees perform much less tree rotations on inserts compared to AVL Trees.

如何去除节点?这里似乎上的节点数量取决于很多。对于小的节点号(一切都低于五十万)RB树再次自己的AVL树;所不同的是比刀片更大。而令人意想不到的是,一旦节点数量增长超过一万个节点AVL树似乎赶上和差价RB树收缩,直到它们都或多或少同样快速。这可能是该系统的效果,虽然。它可能有做的过程或CPU缓存或类似的内存使用情况。一些已经在RB树更负面的影响比对AVL树,因此AVL树能赶上。对于查找没有观察到同样的效果(AVL通常更快,不管有多少个节点)和插件(RB通常更快,不管有多少个节点)。

How about removal of nodes? Here it seems to depend a lot on the number of nodes. For small node numbers (everything less than half a million) RB Trees again own AVL Trees; the difference is even bigger than for inserts. Rather unexpected is that once the node number grows beyond a million nodes AVL Trees seems to catch up and the difference to RB Trees shrinks until they are more or less equally fast. This could be an effect of the system, though. It could have to do with memory usage of the process or CPU caching or the like. Something that has a more negative effect on RB Trees than it has on AVL Trees and thus AVL Trees can catch up. The same effect is not observed for lookups (AVL usually faster, regardless how many nodes) and inserts (RB usually faster, regardless how many nodes).

结论:
我想最快我可以使用时RB-树,因为查找的数量只会比插入和缺失和数量稍高无论AVL有多快上查询得到,整体表现将受到他们的糟糕插入/缺失的表现。

Conclusion:
I think the fastest I can get is when using RB-Trees, since the number of lookups will only be somewhat higher than the number of inserts and deletions and no matter how fast AVL is on lookups, the overall performance will suffer from their worse insert/deletion performance.

也就是说,除非有人在这里可能会想出将拥有RB树大的时候一个更好的数据结构; - )

That is, unless anyone here may come up with a much better data structure that will own RB Trees big time ;-)

这篇关于快速算法可以迅速找到号码属于一个范围集合的范围?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆