二进制分区n个排序的数据指向`O(b.log(n))`中的b个桶 [英] Binary Partitioning n sorted data points in to b Buckets in `O(b.log(n))`

查看:59
本文介绍了二进制分区n个排序的数据指向`O(b.log(n))`中的b个桶的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出大小为n的经过排序的可随机访问的输入数据(排序的数组),我想使用参数化的分区函数将其划分为存储桶.我希望将结果作为指向铲斗边缘的索引数组返回.

Given sorted randomly accessible input data (a sorted array) of size n, I'd like to partition this in to buckets using a parameterised partitioning function. I want the result back as an array of indexes to the bucket edges.

分区功能返回一个布尔值,以指示值是否也应位于同一分区中.

The partitioning function returns a bool to indicate if too values should be in the same partition or not.

请注意,在进行分区之前,我们不知道需要多少个存储桶:

Note that prior to partitioning, we don't know how many buckets will be required:

  • 可能所有输入都应放在单个存储桶中.
  • 可能是每个输入都需要自己的存储桶的情况.
  • 也许第一个值需要一个存储桶,而数组的其余部分在第二个存储桶中.

...这在运行分区算法之前是未知的.

… this is not know before running the partitioning algorithm.

作为一个具体示例,可以说我们具有分区功能:

As a concrete example, lets say we have the partitioning function:

sameBucket(a, b) = (a/10 == b/10)

其中/是整数除法(四舍五入).所以

Where / is integer division (it rounds down). So

sameBucket(0,1) == yes
sameBucket(1,2) == yes
sameBucket(0,9) == yes
sameBucket(0,10) == no

分区功能告诉我们 0 10 不应放置在同一存储桶中.

The partition function tells us 0 and 10 should not be placed in to the same bucket.

为清楚起见,请考虑使用下面显示其索引的输入数组(我假设有一个名为"end the end"的索引,称为end):

Consider this input array shown with its indexes underneath for clarity (I assume a "past the end" index called end):

[1, 3, 7, 14, 90, 91, 92, 93, 95, 99]
 0  1  2  3   4   5   6   7   8   9   end

对于此数据,用 ^ 表示属于新存储桶的元素:

For this data, the elements that are members of a new bucket are indicated with a ^:

[1, 3, 7, 14, 90, 91, 92, 93, 95, 99]
 0  1  2  3   4   5   6   7   8   9   end
 ^        ^   ^                       ^

如果我使用上面的分区功能,我将只获得开始一个新存储桶的索引:

If I use the partition function above, I'd get back just the indexes that begin a new bucket:

[0, 3, 4, end]

结果数组中的每个索引代表存储桶函数表示的第一个元素与其之前的元素不在同一个存储桶中.

Each index in the result array represents the first element that the bucket function says is not in the same bucket as the element before it.

  • 整个数据由范围 0…< end 表示.
  • 0…< 3 的范围是数字 [1、3、7] ,除以它们均是 0 10 .
  • 范围 3…< 4 是单个数字 14 ,当除以 10 1 >.
  • 范围 4…< end 是数字 [90、91、92、93、95、99] ,它们都是 0 除以 10 .
  • The whole data is represented by the range 0…<end.
  • The range 0…<3 are the numbers [1, 3, 7], which are all 0, when divided by 10.
  • The range 3…<4 is the single number 14, which is 1 when divided by 10.
  • The range 4…<end are the numbers [90, 91, 92, 93, 95, 99], which are all 0 when divided by 10.

我相信,经过修改的二进制搜索应该能够有效地执行此分区.对于 n 输入值和 b 输出存储桶,运行时间应为最差的 O(b.log(n)).有没有人有一种算法,甚至只是名称,我都可以查找吗?

I believe a modified binary search should be able to perform this partitioning efficiently. For n input values, and b output buckets, the run time should be at worst O(b.log(n)). Does any one have an algorithm for it, even just the name, so I can look it up?

推荐答案

要有效解决此问题,需要假设一个假设,即根据 sameBucket(左,右),已排序范围两端的元素属于同一存储桶,那么 left right 之间的所有值也必须属于同一存储桶.

Solving this problem efficiently requires an assumption that if elements at two ends of a sorted range belong to the same bucket according to sameBucket(left, right), then all values between left and right must belong to the same bucket as well.

我相信,经过修改的二进制搜索应该能够有效地执行此分区

I believe a modified binary search should be able to perform this partitioning efficiently

是的,您可以运行二进制搜索,如下所示:

Yes, you can run a binary search, like this:

  1. nextBucket 设置为零
  2. left 设置为 nextBucket ,并将 right 设置为输入数组的末尾
  3. mid 设置为 left right
  4. 之间的中点
  5. 如果 sameBucket(nextBucket,mid) true ,则将 left 移到 mid ;否则,将 right 移至 mid
  6. 如果 left == right ,则退出循环;否则,请返回步骤3
  7. 循环完成后, left 是下一个分区索引. nextBucket left 之间的所有项目都在同一个存储桶中.
  8. nextBucket 设置为 left + 1
  9. 如果 nextBucket 等于 n ,那么您完成了;否则,请返回步骤2.
  1. set nextBucket to zero
  2. set left to nextBucket and right to the end of the input array
  3. set mid to midpoint between left and right
  4. if sameBucket(nextBucket, mid) is true, move left to mid; otherwise, move right to mid
  5. if left == right, exit the loop; otherwise, go back to step 3
  6. Once the loop is done, left is the next partition index. All items between nextBucket and left are in the same bucket.
  7. Set nextBucket to left+1
  8. If nextBucket is equal to n, you are done; otherwise, go back to step 2.

我不认为此算法有一个特殊的名称-这是伪装得不好的二进制搜索.

I don't think this algorithm has a special name - it is a poorly disguised binary search.

这篇关于二进制分区n个排序的数据指向`O(b.log(n))`中的b个桶的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆