Finding the kth smallest number from n sorted arrays

Problem description

So, you have n sorted arrays (not necessarily of equal length), and you are to return the kth smallest element in the combined array (i.e., the array formed by merging all n sorted arrays).
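To pin down what "kth smallest in the combined array" means, here is a minimal brute-force reference in Python. It is only a sketch of the problem statement, not the logarithmic solution being asked for: it lazily merges the arrays with the standard library's `heapq.merge` and stops at the kth value, which costs roughly O(k log n). The function name `kth_smallest_by_merge` and the 1-based convention for k are assumptions made for illustration.

```python
import heapq
from itertools import islice


def kth_smallest_by_merge(arrays, k):
    """Return the k-th smallest value (1-based) across sorted lists.

    Reference implementation only: it walks the lazily merged
    sequence until it reaches position k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    merged = heapq.merge(*arrays)  # lazy k-way merge of sorted inputs
    for value in islice(merged, k - 1, k):
        return value
    raise ValueError("k exceeds the total number of elements")
```

Such a reference is mainly useful as an oracle for checking faster approaches against small inputs.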

I have been trying this and its variants for quite a while now, and so far I only feel comfortable with the case where there are two sorted arrays of equal length and one has to return the median of the two. This has logarithmic time complexity.

After this I tried to generalize it to finding the kth smallest element among two sorted arrays. Here is the question on SO. Even there, the solution given is not obvious to me. But even if I somehow manage to convince myself of that solution, I am still curious as to how to solve the fully general case (which is my question).

Can somebody explain a step-by-step solution to me (which, in my opinion, should again take logarithmic time, i.e. O(log(n1) + log(n2) + ... + log(nN)), where n1, n2, ..., nN are the lengths of the n arrays), starting from the more specific cases and moving on to the more general one?

I know similar questions for more specific cases are there all over the internet, but I haven't found a convincing and clear answer.

Here is a link to a question (and its answer) on SO which deals with 5 sorted arrays and finding the median of the combined array. The answer just gets too complicated for me to be able to generalize it.

Even clean approaches for the more specific cases (as I mentioned during the post) are welcome.

PS: Do you think this can be further generalized to the case of unsorted arrays?

PPS: It's not a homework problem, I am just preparing for interviews.

Recommended answer

This doesn't generalize the links, but does solve the problem:

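  1. Go through all the arrays and truncate any that are longer than k to length k (this is silly, but we'll mess with k later, so do it anyway).
  2. Identify the largest remaining array A. If there is more than one, pick any of them.
  3. Pick the middle element M of the largest array A.
  4. Use a binary search on the remaining arrays to find the same element (or the largest element <= M).
  5. Based on the indices of the various elements, calculate the total number of elements <= M and > M. This should give you two numbers: L, the number of elements <= M, and G, the number of elements > M.
  6. If k <= L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the bottom halves).
  7. If k > L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the top halves, and search for element (k-L)).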
When you get to the point where you only have one element per array (or 0), make a new array of size n with those data, sort, and pick the kth element.
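The steps above can be sketched in Python roughly as follows. This is one possible reading, not a definitive implementation: the name `kth_smallest`, the 1-based k, and the use of index windows instead of physically truncating lists are all assumptions. It also deviates from the steps in one place: it counts elements strictly less than M separately from elements equal to M, and returns M directly when k falls among the copies of M. Without that three-way split, inputs with many duplicates (for example `[5, 5]`) might never shrink.

```python
from bisect import bisect_left, bisect_right


def kth_smallest(arrays, k):
    """Return the k-th smallest value (1-based) across sorted lists."""
    total = sum(len(a) for a in arrays)
    if not 1 <= k <= total:
        raise ValueError("k must be between 1 and the total element count")

    # Step 1: each array is viewed through a window [lo, hi),
    # initially truncated to at most k elements.
    lo = [0] * len(arrays)
    hi = [min(len(a), k) for a in arrays]

    while True:
        sizes = [h - l for l, h in zip(lo, hi)]

        # Terminal case: at most one element left per array.
        if max(sizes) <= 1:
            rest = sorted(a[l] for a, l, h in zip(arrays, lo, hi) if h > l)
            return rest[k - 1]

        # Steps 2-3: middle element M of the largest remaining window.
        i = sizes.index(max(sizes))
        m = arrays[i][lo[i] + (sizes[i] - 1) // 2]

        # Steps 4-5: binary-search every window for its split points.
        lefts = [bisect_left(a, m, l, h) for a, l, h in zip(arrays, lo, hi)]
        rights = [bisect_right(a, m, l, h) for a, l, h in zip(arrays, lo, hi)]
        less = sum(c - l for c, l in zip(lefts, lo))         # elements < M
        less_equal = sum(c - l for c, l in zip(rights, lo))  # elements <= M

        if k <= less:
            hi = lefts            # Step 6: keep the bottom parts
        elif k > less_equal:
            lo = rights           # Step 7: keep the top parts
            k -= less_equal
        else:
            return m              # k lands on a copy of M
```

For example, `kth_smallest([[1, 4, 7, 10], [2, 5, 8], [3, 6, 9, 11, 12]], 5)` returns `5`. Because M itself is always excluded from the surviving window of the largest array, that window shrinks on every pass, so the loop terminates.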

Because you're always guaranteed to remove at least half of one array, in N iterations you'll get rid of half the elements. That means there are N log k iterations. Each iteration is of order N log k (due to the binary searches), so the whole thing is N^2 (log k)^2. That's all worst case, of course, based on the assumption that each iteration only gets rid of half of the largest array and nothing from the other arrays. In practice, I imagine the typical performance would be quite a bit better than the worst case.
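Spelled out, the worst-case bound is the iteration count multiplied by the per-iteration cost. The sketch below assumes every array holds at most k elements after truncation, so each of the N arrays can be halved at most about log2(k) times, and each iteration runs one binary search of cost O(log k) per array:

```latex
\[
T(N, k)
  \;\le\;
  \underbrace{N \lceil \log_2 k \rceil}_{\text{iterations}}
  \cdot
  \underbrace{O(N \log k)}_{\text{binary searches per iteration}}
  \;=\;
  O\!\left(N^{2} (\log k)^{2}\right)
\]
```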
