Sorting algorithms for data of known statistical distribution?


Question

It just occurred to me that if you know something about the distribution (in the statistical sense) of the data to sort, a sorting algorithm's performance might benefit from taking that information into account.

So my question is: are there any sorting algorithms that take that kind of information into account? How good are they?

Edit: an example to clarify. If you know the distribution of your data to be Gaussian, you could estimate the mean and variance on the fly as you process the data. This would give you an estimate of the final position of each number, which you could use to place it close to its final position.
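A minimal sketch of the idea in this edit, assuming Gaussian data: Welford's online algorithm tracks the running mean and variance, and the normal CDF turns a value into an estimated final index. The names `RunningStats` and `estimated_index` are hypothetical, chosen for illustration only, and not taken from any of the sources mentioned in this thread.

```python
import math
from statistics import NormalDist


class RunningStats:
    """Running mean and variance via Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stdev(self):
        # Sample standard deviation; 0.0 until at least two values are seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def estimated_index(x, stats, total):
    """Estimate where x would land in a sorted array of `total` items."""
    if stats.stdev == 0.0:
        return 0  # no spread observed yet, so no usable estimate
    cdf = NormalDist(stats.mean, stats.stdev).cdf(x)
    return min(total - 1, int(cdf * total))
```

The estimate is only a starting point: values still need a final local sort (for example an insertion sort pass) because neighbouring elements can land on the same or swapped indices.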

Edit #2: I'm pretty surprised the answer isn't a wiki link to a thorough page discussing this issue. Isn't this a very common case (the Gaussian case, for example)?

Edit #3: I'm adding a bounty to this question because I'm looking for definite answers with sources, not speculation. Something like "in the case of Gaussian-distributed data, XYZ algorithm is the fastest on average, as was proved by Smith et al. [1]". However, any additional information is welcome.

Note: I will award the bounty to the highest-voted answer. Vote wisely!

Recommended answer

If the data you are sorting has a known distribution, I would use a Bucket Sort algorithm. You could add some extra logic to it so that the size and/or positions of the various buckets are calculated from properties of the distribution (e.g. for a Gaussian, you might have a bucket every (sigma/k) away from the mean, where sigma is the standard deviation of the distribution).
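A rough sketch of that bucket layout, assuming the mean and sigma are known in advance and all values are finite. The function name `gaussian_bucket_sort` and the `span` cutoff (how many standard deviations the buckets cover before outliers are clamped into the end buckets) are illustrative assumptions, not part of any standard algorithm description.

```python
import math


def gaussian_bucket_sort(values, mean, sigma, k=4, span=4.0):
    """Bucket sort with buckets of width sigma/k centred on the mean.

    Values further than `span` standard deviations from the mean are
    collected in the first or last bucket.
    """
    if sigma <= 0 or len(values) < 2:
        return sorted(values)
    width = sigma / k
    half = int(span * k)  # number of buckets on each side of the mean
    buckets = [[] for _ in range(2 * half)]
    for v in values:
        i = math.floor((v - mean) / width) + half
        # Clamp outliers into the end buckets; clamping preserves order.
        buckets[max(0, min(2 * half - 1, i))].append(v)
    result = []
    for bucket in buckets:
        bucket.sort()  # buckets stay small when the data fits the model
        result.extend(bucket)
    return result
```

With fixed-width buckets the ones near the mean receive the most elements, so a larger `k` (narrower buckets) may be needed to keep the per-bucket sorts cheap.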

By having a known distribution and modifying the standard Bucket Sort algorithm in this way, you would probably get the Histogram Sort algorithm or something close to it. Of course, your algorithm would be computationally faster than the Histogram Sort algorithm, because there would probably be no need to do the first pass (described in the link), since you already know the distribution.

Edit: given the new criteria in your question (though my previous answer concerning Histogram Sort links to the respectable NIST and contains performance information), here is a peer-reviewed paper from the International Conference on Parallel Processing:

Adaptive Data Partition for Sorting Using Probability Distribution

The authors claim this algorithm performs better (up to 30% better) than the popular Quick-Sort algorithm.
