有人可以解释一下桶分类的实现原理吗? [英] Can someone please explain how this implementation of bucket sort works?

查看:98
本文介绍了有人可以解释一下桶分类的实现原理吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我很难理解存储桶排序的基本概念,希望有人能向我明确说明排序算法的作用以及它如何在O(N)中完成所需的结果(对内部容器进行排序) ) 时间.而且,由于这似乎很快,其他排序算法(例如冒泡,插入或选择)还有什么优势,可以说服人们在存储桶排序中使用它们?

I'm having a hard time understanding the underlying concepts of bucket sort, and I was hoping someone could clarify to me exactly what the sorting algorithm does and how it accomplishes the desired result (sorting an internal container) in O(N) time. Also, as this seems to be quite fast, what advantages do other sorting algorithms (like bubble, insertion, or selection) have that would persuade one to use them over bucket sort?

这是我在网上找到的算法的实现.如果有人可以在解释中引用此信息,我将不胜感激.

Here is an implementation of the algorithm I found online. If someone could reference this in their explanation I would greatly appreciate it.

void binsort(std::vector<std::size_t>& A){
    std::vector<std::vector<std::size_t>> B(MAX + 1);
    for(std::size_t i = 0; i < SIZE; ++i){
        B[A[i]].push_back(A[i]);
    }
    std::size_t current = 0;
    for(std::size_t i = 0; i < MAX; ++i){
        for(auto item : B[i]){
            A[current++] = item;
        }
    }
}

感谢您的帮助.

推荐答案

我将尝试解释这个特定的实现,这是您可能会看到的最简单的实现之一.巧合的是,它在输入域中也对​​允许的数字也有严格的限制(用值MAX表示).

I'll try and explain this specific implementation, which is one of the simplest you're likely to see. It not-coincidentally also has a hard restriction of allowable numbers in input domain as well (represented by the value MAX).

假设我们有10个数字的集合.他们共享的一个属性是,它们都在[0..5]域中

Suppose we have a collection of 10 numbers. the one attribute they share is that they are all in the domain [0..5]

{ 3, 2, 2, 5, 1, 4, 0, 2, 5, 4 }

现在,我们创建一个存储桶列表",其中每个存储桶代表来自域的值的集合; 不是输入数组.我们的域允许使用6个可能的值,因此我们创建了6个存储桶(如果您不注意,则按顺序"排列):

Now, we create a "list" of buckets, where each bucket is represents a collection of values from the domain; not the input array. Our domain allows for 6 possible values, so we create six buckets (which are in "order", in case you don't notice that):

 0: {}
 1: {}
 2: {}
 3: {}
 4: {}
 5: {}

现在遍历输入列表,将每个值放入存储桶中.从概念上讲,完成后看起来像这样:

Now walk the input list, dropping each value in it's bucket. Conceptually it looks like this when finished:

 0: {0}
 1: {1}
 2: {2,2,2}
 3: {3}
 4: {4,4}
 5: {5,5}

现在,只需遍历我们的存储桶列表,然后将每个存储桶中的内容倒回原始容器中,替换其中的任何物品即可.

Now, just walk our list of buckets, and dump the contents in each back into the original container, replacing whatever item was there.

{ 0, 1, 2, 2, 2, 3, 4, 4, 5, 5}

看起来很简单,是吗?那么,为什么每个人都不尽其所能呢?好吧,考虑我们扩大了问题.代替由6个可能值组成的MAX,我们使值"以1048576为界(如果您想知道,则为2 20 ),但将排序的项目数保持为只需10 .

Seems simple, yes? So why isn't everyone doing this for all their sorts? Well, consider we expand the problem. Instead of a MAX of 6 possible values, we make the 'value' bounded by 1048576 (220 in case you were wondering), but keep the number of items sorted to just 10.

现在,给出以下列表:

{ 3, 2, 2, 1048576, 1, 4, 0, 2, 5, 4 }

我们的存储桶"列表如下所示:

Our "bucket" list looks like this:

 0: {0}
 1: {1}
 2: {2,2,2}
 3: {3}
 4: {4,4}
 5: {5}
 6: {}
 7: {}
 .....
 1048575: {}
 1048576: {1048576}

是的,超过一百万个水桶可以对十个数字进行排序,所有这些都是因为这是我们问题域中允许的最大值.显然,这对于大的MAX天花板是不可行的.将输入范围细分为可管理的集合将是一个可行的解决方案(实际上,本质上是基数排序的工作方式.

Yeah, over a million buckets to sort ten numbers, all because that is the allowable max in our problem domain. Obviously this would not be feasible for large MAX ceilings. Sub-splitting the input range into manageable sets would be a viable solution to this (and in fact, is essentially how a radix sort works.

要回答最后一组问题,显然,如果您的输入域很小,那么您就很难在排序速度上胜过这个.例如,如果我们有上千个数字,而所有所有都在[0..9]中,那么这将是快速的咆哮.再加上几个数量级,就根本无法进行比较.但是,您支付的价格 价格是受限制的输入域.随着域大小的增加,您必须从存储桶拆分算法的角度来看待它,并且这样做的话,您会从通往O(NlogN)的路径开始.鉴于此,有很多算法(堆排序,合并排序,快速排序等),带有值得考虑的警告事项.

To answer your final set of questions, obviously if you had a reasonably small input domain, you would be hard to beat this for sort speed. For example, if we had a set of a thousand numbers, all of which were in [0..9], this would be roaring-quick. Add a few orders of magnitude to that and it would be no comparison at all. However, the price, the heavy price, that you pay, is a restricted input domain. As the domain size rises, you have to approach it from a bucket-splitting algorithm perspective, and as you do so, you start down the path toward O(NlogN). Given that, there are plenty of algorithms (heap-sort, merge-sort, quick-sort, etc..) with their own set of caveats worth considering.

一个明显的胜利"的地方:假设您必须对一百万个8位字符(根据定义在[0..255]中的值)进行排序,那么您将找不到更快的算法来做到这一点.该域定义明确,易于管理,如果使用了适当的存储桶"表(从字面上看是计数器表),我看不到它被击败了.

A place where it would be an obvious "win": Suppose you have to sort a million 8-bit characters (by definition a value in [0..255] ), you will not find a faster algorithm to do it. The domain is well-defined, very manageable, and if a proper table of "buckets" were utilized (literally a table of counters), I can't see it being beat.

这篇关于有人可以解释一下桶分类的实现原理吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆