排序算法,其中两两对比可以比-1返回更多信息,0,+ 1 [英] sorting algorithm where pairwise-comparison can return more information than -1, 0, +1

查看:215
本文介绍了排序算法,其中两两对比可以比-1返回更多信息,0,+ 1的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

大多数排序算法依赖于两两对比的判断A< B,A = B或A> B。

Most sort algorithms rely on a pairwise-comparison the determines whether A < B, A = B or A > B.

我在寻找的算法(和奖励积分,code在Python),占据了两两对比功能,可从多一点区别就少了很多,从少或有更多的优势。因此,或许不是返回{-1,0,1}的比较函数返回{-2,-1,0,1,2}或{-5,-4,-3,-2,-1,0,1 ,2,3,4,5},甚至是真实的间隔数(-1,1)。

I'm looking for algorithms (and for bonus points, code in Python) that take advantage of a pairwise-comparison function that can distinguish a lot less from a little less or a lot more from a little more. So perhaps instead of returning {-1, 0, 1} the comparison function returns {-2, -1, 0, 1, 2} or {-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5} or even a real number on the interval (-1, 1).

有关某些应用(如靠近排序或近似排序)这将使一个合理排序也可以用较少的比较来确定。

For some applications (such as near sorting or approximate sorting) this would enable a reasonable sort to be determined with less comparisons.

推荐答案

的额外的信息确实可以用于最小化比较的总数。调用super_comparison函数可以被用来制造扣除相当于调用一定期comparsion功能的一个很大的数字。例如,更少,低于B ç有点少,低于B 意味着 A&LT; C&LT; b

The extra information can indeed be used to minimize the total number of comparisons. Calls to the super_comparison function can be used to make deductions equivalent to a great number of calls to a regular comparsion function. For example, a much-less-than b and c little-less-than b implies a < c < b.

在扣除罐被组织成箱或分区而每个都可以单独排序。有效地,这等同于快速排序与n路分区。下面是用Python实现:

The deductions cans be organized into bins or partitions which can each be sorted separately. Effectively, this is equivalent to QuickSort with n-way partition. Here's an implementation in Python:

from collections import defaultdict
from random import choice

def quicksort(seq, compare):
    'Stable in-place sort using a 3-or-more-way comparison function'
    # Make an n-way partition on a random pivot value
    segments = defaultdict(list)
    pivot = choice(seq)
    for x in seq:
        ranking = 0 if x is pivot else compare(x, pivot)
        segments[ranking].append(x)
    seq.clear()

    # Recursively sort each segment and store it in the sequence
    for ranking, segment in sorted(segments.items()):
        if ranking and len(segment) > 1:
            quicksort(segment, compare)
        seq += segment

if __name__ == '__main__':
    from random import randrange
    from math import log10

    def super_compare(a, b):
        'Compare with extra logarithmic near/far information'
        c = -1 if a < b else 1 if a > b else 0
        return c * (int(log10(max(abs(a - b), 1.0))) + 1)

    n = 10000
    data = [randrange(4*n) for i in range(n)]
    goal = sorted(data)
    quicksort(data, super_compare)
    print(data == goal)

通过插装这个code用的跟踪的模块,它可以测量的性能提升。另外,在上述code,常规三通比较使用133000比较而超级比较功能降低的呼叫的数目85000

By instrumenting this code with the trace module, it is possible to measure the performance gain. In the above code, a regular three-way compare uses 133,000 comparisons while a super comparison function reduces the number of calls to 85,000.

在code也可以很容易地尝试各种比较函数。这将显示天真n路比较函数能做的非常有限,以帮助排序。例如,如果比较函数返回+/- 2为用于区别四个或更少的差异大于四和+/- 1,只有在比较次数适度减少5%。根本原因是,仅在开始使用的粗粒度分区有极少数近场和一切落在远场。

The code also makes it easy to experiment with a variety comparison functions. This will show that naïve n-way comparison functions do very little to help the sort. For example, if the comparison function returns +/-2 for differences greater than four and +/-1 for differences four or less, there is only a modest 5% reduction in the number of comparisons. The root cause is that the course grained partitions used in the beginning only have a handful of "near matches" and everything else falls in "far matches".

这是改进了超比较,以涵盖对数范围(即+/- 1,如果在十+/- 2如果在百+/-如果在一千年。

An improvement to the super comparison is to covers logarithmic ranges (i.e. +/-1 if within ten, +/-2 if within a hundred, +/- if within a thousand.

这是理想的比较函数是自适应的。对于任何给定序列的大小,比较函数应努力细分序列分成大致相等大小的分区。信息理论告诉我们,这将最大化的每比较信息的比特数。

An ideal comparison function would be adaptive. For any given sequence size, the comparison function should strive to subdivide the sequence into partitions of roughly equal size. Information theory tells us that this will maximize the number of bits of information per comparison.

的自适应方法是一个很好的直觉,以及。人们应该首先被划分成的的VS的喜欢的制作更精致的区分,如爱一个,很多VS爱情-A-有点过了。进一步划分通行证建议每个人都越来越精细的区分。

The adaptive approach makes good intuitive sense as well. People should first be partitioned into love vs like before making more refined distinctions such as love-a-lot vs love-a-little. Further partitioning passes should each make finer and finer distinctions.

这篇关于排序算法,其中两两对比可以比-1返回更多信息,0,+ 1的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆