N路合并算法 [英] Algorithm for N-way merge

查看:33
本文介绍了N路合并算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

2 路归并作为归并排序算法的一部分被广泛研究.但我有兴趣找出执行 N 路合并的最佳方式?

A 2-way merge is widely studied as a part of Mergesort algorithm. But I am interested to find out the best way one can perform an N-way merge?

比方说,我有 N 个文件,每个文件都对 100 万个整数进行了排序.我必须将它们合并到 1 个包含 1 亿个排序整数的文件中.

Lets say, I have N files which have sorted 1 million integers each. I have to merge them into 1 single file which will have those 100 million sorted integers.

请记住,此问题的用例实际上是基于磁盘的外部排序.因此,在实际场景中也会存在内存限制.因此,一次(99 次)合并 2 个文件的幼稚方法是行不通的.假设我们只有一个小的内存滑动窗口可用于每个数组.

Please keep in mind that use case for this problem is actually external sorting which is disk based. Therefore, in real scenarios there would be memory limitation as well. So a naive approach of merging 2 files at a time (99 times) won't work. Lets say we have only a small sliding window of memory available for each array.

我不确定是否已经有针对此 N 路合并的标准化解决方案.(谷歌搜索并没有告诉我太多).

I am not sure if there is already a standardized solution to this N-way merge. (Googling didn't tell me much).

但是如果你知道一个好的n-way合并算法,请发布algo/link.

But if you know if a good n-way merge algorithm, please post algo/link.

时间复杂度:如果我们大大增加要合并的文件数量(N),这将如何影响算法的时间复杂度?

Time complexity: If we greatly increase the number of files (N) to be merged, how would that affect the time complexity of your algorithm?

感谢您的回答.

我在任何地方都没有被问到这个问题,但我觉得这可能是一个有趣的面试问题.因此标记.

推荐答案

下面的想法怎么样:

  1. 创建优先队列

  2. 遍历每个文件f
  1. 使用第一个值作为优先键对 (nextNumberIn(f​​), f) 入队

  • 虽然队列不为空

  • While queue not empty

    1. 出队头(m, f)
    2. 输出m
    3. 如果 f 没有耗尽
    1. 入队(nextNumberIn(f​​), f)

  • 由于向优先队列添加元素可以在对数时间内完成,因此第 2 项是 O(N × log N).由于(几乎所有)while 循环迭代添加了一个元素,整个while 循环是O(M × log N) 其中M 是数字的总数排序.

    Since adding elements to a priority queue can be done in logarithmic time, item 2 is O(N × log N). Since (almost all) iterations of the while loop adds an element, the whole while-loop is O(M × log N) where M is the total number of numbers to sort.

    假设所有文件都有一个非空的数字序列,我们有 M > N,因此整个算法应该是 O(M × log N).

    Assuming all files have a non-empty sequence of numbers, we have M > N and thus the whole algorithm should be O(M × log N).

    这篇关于N路合并算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆