C++ randomly sample k numbers from range 0:n-1 (n > k) without replacement

Question

I'm working on porting a MATLAB simulation into C++. To do this, I am trying to replicate MATLAB's randsample() function. I haven't figured out an efficient way to do this yet.

So I ask you all, how do I best randomly sample k numbers from a range 0:n-1 (for n > k) without replacement in C++?

I've considered the following pseudocode (inspired by the third example on cppreference.com), but I feel like it's a bit hacky:

initialize vect<int> v of size n
for i = 0 to n-1
    v[i] = i
shuffle v
return v[0 to k-1]
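For reference, a direct C++ rendering of the pseudocode above might look like the following minimal sketch. The function name `sampleByShuffle` is hypothetical, and passing the generator in from the caller is an assumption rather than part of the original pseudocode. It uses `std::iota` to fill `0..n-1`, shuffles the whole vector, and keeps the first `k` entries, so it always costs O(n) time and memory regardless of `k`.

```cpp
#include <algorithm>  // std::shuffle
#include <cassert>
#include <numeric>    // std::iota
#include <random>
#include <vector>

// Hypothetical helper: a literal translation of the shuffle-based pseudocode.
// Builds {0, 1, ..., n-1}, shuffles all of it, then keeps the first k values.
std::vector<int> sampleByShuffle(int n, int k, std::mt19937& gen) {
    assert(0 <= k && k <= n);
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);       // v[i] = i
    std::shuffle(v.begin(), v.end(), gen);  // uniform random permutation
    v.resize(k);                            // return v[0 .. k-1]
    return v;
}
```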

Another drawback is that this approach first has to build a massive array of size n. That seems like slow, clunky overkill.

I would love some direction here if you can help. I'm less interested in the theory (algorithms are interesting but not relevant to my needs now) than in the best way to implement this in C++.

Thanks in advance!

Answer

If N is large but k is not:

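#include <algorithm>
#include <random>
#include <unordered_set>
#include <vector>

// Defined below.
std::unordered_set<int> pickSet(int N, int k, std::mt19937& gen);

std::vector<int> pick(int N, int k) {
    std::random_device rd;
    std::mt19937 gen(rd());

    std::unordered_set<int> elems = pickSet(N, k, gen);

    // ok, now we have a set of k elements, but iterating an
    // unordered_set yields them in an unspecified,
    // implementation-dependent order (not a random one),
    // so we have to shuffle it:

    std::vector<int> result(elems.begin(), elems.end());
    std::shuffle(result.begin(), result.end(), gen);
    return result;
}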

Now the naive approach of implementing pickSet is:

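std::unordered_set<int> pickSet(int N, int k, std::mt19937& gen)
{
    // uniform over the requested range 0..N-1 (both ends inclusive)
    std::uniform_int_distribution<> dis(0, N - 1);
    std::unordered_set<int> elems;

    // keep drawing until k distinct values are collected; duplicates
    // are rejected by the set (requires k <= N, or this never ends)
    while (elems.size() < static_cast<std::size_t>(k)) {
        elems.insert(dis(gen));
    }

    return elems;
}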
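The collision cost of this naive loop can be made concrete, assuming each draw is uniform over N values: it is a coupon-collector process. When the set already holds i distinct values, a draw is new with probability (N - i)/N, so on average N/(N - i) draws are needed to get the next new value. Summing over i gives the expected total number of draws, where H_m = 1 + 1/2 + ... + 1/m is the m-th harmonic number:

$$
\mathbb{E}[\text{draws}] \;=\; \sum_{i=0}^{k-1} \frac{N}{N-i} \;=\; N\,\bigl(H_N - H_{N-k}\bigr)
$$

For k much smaller than N every term is close to 1, so the loop needs roughly k draws; for k = N it needs N H_N, which grows like N ln N.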

But if k is large relative to N, this loop runs into many collisions and can become slow. We can do better with an algorithm that is guaranteed to add exactly one new element per iteration (due to Robert Floyd):

std::unordered_set<int> pickSet(int N, int k, std::mt19937& gen)
{
    std::unordered_set<int> elems;
    for (int r = N - k; r < N; ++r) {
        // draw from 0..r inclusive; the range grows by one each iteration
        int v = std::uniform_int_distribution<>(0, r)(gen);

        // there are two cases.
        // v is not in elems ==> add it
        // v is in elems ==> well, r is definitely not, because
        // this is the first iteration in the loop that we could've
        // picked something that big, so add r instead.

        if (!elems.insert(v).second) {
            elems.insert(r);
        }
    }
    return elems;
}
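Putting the pieces together, here is a self-contained sketch of the answer's approach (Floyd's algorithm followed by a shuffle) that samples from 0..N-1. One deviation is an assumption of this sketch, not part of the answer: `pick` takes the generator as a parameter instead of constructing a `std::random_device` and `std::mt19937` on every call. Floyd's loop makes exactly k calls to the distribution and keeps at most k elements, so expected time and memory are O(k) regardless of N.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <unordered_set>
#include <vector>

// Floyd's algorithm: k distinct values from [0, N-1], one new value per iteration.
std::unordered_set<int> pickSet(int N, int k, std::mt19937& gen) {
    assert(0 <= k && k <= N);
    std::unordered_set<int> elems;
    elems.reserve(k);
    for (int r = N - k; r < N; ++r) {
        int v = std::uniform_int_distribution<>(0, r)(gen);  // v in [0, r]
        // If v was already chosen, r cannot have been: r only now entered the range.
        if (!elems.insert(v).second) {
            elems.insert(r);
        }
    }
    return elems;
}

// Sketch variant of pick(): the caller owns and seeds the generator.
std::vector<int> pick(int N, int k, std::mt19937& gen) {
    std::unordered_set<int> elems = pickSet(N, k, gen);
    std::vector<int> result(elems.begin(), elems.end());
    // unordered_set iteration order is unspecified, so shuffle for a random order
    std::shuffle(result.begin(), result.end(), gen);
    return result;
}
```

Passing the generator in lets a simulation seed one `std::mt19937` once (for example from `std::random_device`) and reuse it across calls, which also makes runs reproducible when a fixed seed is used.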
