选择随机子集的通用算法实现 [英] Generic algorithm implementation to select a random subset

查看:74
本文介绍了选择随机子集的通用算法实现的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我们要从大小 n 的总集合中选择一个大小为 m 的随机子集.由于可以使用 S = {0, 1, 2, ..., (n - 1)} 中的唯一索引来标识总集中的每个元素.该问题相当于从 S 中随机选择 m 个不同的元素.

Suppose we are to select a random subset of size m from a total set of size n. Since each element in the total set can be identified using a unique index from S = {0, 1, 2, ..., (n - 1)}. The problem is equivalent to randomly select m distinct elements from S.

一个简单的算法是重复调用伪随机数生成器 randS 生成随机数.如果之前已经生成过数字,请再试一次.算法终止直到生成 m 个不同的数字.该算法的最优空间复杂度为O(1),但可能调用rand超过m次.

A trivial algorithm would be repetitively invoking a pseudo-random number generator rand to generate random numbers from S. If a number has been generated before, just try again. The algorithm terminates until m distinct numbers are generated. This algorithm has an optimal space complexity of O(1), but may invoke rand more than m times.

我更关心时间复杂度而不是空间复杂度,如果合理,我会很乐意用空间换时间.所以我实现了以下算法.它调用 rand 正好 min{m, (n - m)} 次,但代价是增加了 O(n).(可以在此处找到原始代码)

I'm more concerning about the time complexity instead of space complexity, and would happily trade space for time if reasonable. So I implemented the following algorithm. It invokes rand exactly min{m, (n - m)} times, but at the price of an increased space complexity of O(n). (original code can be found here)

template <typename Clock = std::chrono::high_resolution_clock>
auto tick_count() {
  return Clock::now().time_since_epoch().count();
}

template <typename OutIt, typename RAND = std::minstd_rand,
          typename Uint = typename RAND::result_type>
void random_subset(std::size_t m, std::size_t n, OutIt it, RAND&& rand =
                   RAND(static_cast<Uint>(tick_count()))) {
  assert(n - 1 <= rand.max());
  assert(m <= n);
  if (m == 0) return;
  auto swapped = false;
  auto tmp = n - m;
  if (tmp < m) {
    m = tmp;
    swapped = true;
  }
  std::vector<std::size_t> indices(n);
  std::iota(indices.begin(), indices.end(), static_cast<std::size_t>(0));
  auto back_it = indices.end();
  for (std::size_t i = 0; i < m; ++i) {
    auto idx = rand() % (n - i);
    std::swap(indices[idx], *--back_it);
  }
  swapped ? std::copy(indices.begin(), back_it, it) :
            std::copy(back_it, indices.end(), it);
}

我想知道算法是否可以在性能方面进一步改进.也欢迎对通用实现进行改进.

I'm wondering whether the algorithm can be further improved in terms of performance. Improvements to the generic implementation are also welcome.

推荐答案

也许你可以使用 Fisher-Yates 算法,用于随机洗牌,特别是 德斯滕菲尔德版本的第二个变种:

Perhaps you could use a very minor variant of the Fisher-Yates algorithm for random shuffling, specifically the second variant of the Durstendfeld version:

-- To shuffle an array a of n elements (indices 0..n-1):
for i from 0 to n−2 do
     j ← random integer such that 0 ≤ j < n-i
     exchange a[i] and a[i+j]

只需将循环终止从 n - 2 更改为您需要的.

Just change the loop termination from n - 2 to what you need.

在证明中,循环不变式是一旦一个索引 i 被传递,直到它的数组是一个随机洗牌.因此,您可能会提前终止并获得所需的结果.

In the proof, the loop invariant is that once an index i has been passed, the array up to it is a random shuffle. Consequently, you may terminate early with your required result.

这篇关于选择随机子集的通用算法实现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆