Can I randomly sample from a HashSet efficiently?

Question

I have a std::collections::HashSet, and I want to sample and remove a uniformly random element.

Currently, I sample a random index with gen_range from the rand crate, iterate over the HashSet up to that index to get the element, and then remove the selected element. This works, but it is not efficient: reaching the element takes time linear in the size of the set. Is there an efficient way to randomly sample an element?

Here's a stripped-down version of my code:

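use std::collections::HashSet;

extern crate rand;
use rand::thread_rng;
use rand::Rng;

let mut hash_set = HashSet::new();

// ... Fill up hash_set ...

// Pick a random position; gen_range panics if the set is empty.
// (rand 0.8 and later take a range instead: gen_range(0..hash_set.len()).)
let index = thread_rng().gen_range(0, hash_set.len());
// Walking the iterator to that position is O(n).
let element = hash_set.iter().nth(index).unwrap().clone();
hash_set.remove(&element);

// ... Use element ...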

Recommended answer

The only data structures that allow uniform sampling in constant time are those with constant-time index access. HashSet does not provide indexing, so you can't generate random samples from it in constant time.

I suggest converting your hash set to a Vec first and then sampling from the vector. To remove an element, simply move the last element into its place; the order of the elements in the vector is immaterial anyway.
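If membership tests are still needed after the conversion, one possible approach is to keep a Vec for indexing alongside a HashMap from element to position. The sketch below is an illustration of that idea, not part of the answer itself: the type SampleSet and its methods are hypothetical names, and the caller is assumed to supply the random index (for example from gen_range).

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical set with O(1) insert, membership test, and removal
/// by position, built from a Vec plus an element -> index map.
struct SampleSet<T> {
    items: Vec<T>,
    index_of: HashMap<T, usize>,
}

impl<T: Clone + Eq + Hash> SampleSet<T> {
    fn new() -> Self {
        SampleSet { items: Vec::new(), index_of: HashMap::new() }
    }

    fn len(&self) -> usize {
        self.items.len()
    }

    fn contains(&self, value: &T) -> bool {
        self.index_of.contains_key(value)
    }

    /// Returns false if the value was already present.
    fn insert(&mut self, value: T) -> bool {
        if self.index_of.contains_key(&value) {
            return false;
        }
        self.index_of.insert(value.clone(), self.items.len());
        self.items.push(value);
        true
    }

    /// Removes the element at `index`; pass a random index in
    /// `0..len()` to remove a uniformly random element.
    fn remove_at(&mut self, index: usize) -> Option<T> {
        if index >= self.items.len() {
            return None;
        }
        let removed = self.items.swap_remove(index);
        self.index_of.remove(&removed);
        // The former last element (if any) now lives at `index`,
        // so its recorded position has to be updated.
        if let Some(moved) = self.items.get(index) {
            self.index_of.insert(moved.clone(), index);
        }
        Some(removed)
    }
}
```

The trade-off is that every element is stored twice (hence the Clone bound), in exchange for keeping constant-time lookups next to constant-time random removal.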

If you want to consume all elements from the set in random order, you can also shuffle the vector once and then iterate over it.
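With the rand crate, the shuffle method of rand::seq::SliceRandom should already cover this. To show what a single shuffle does, here is a minimal dependency-free sketch of the Fisher-Yates algorithm. The function name shuffle_with and the pick closure are assumptions made for illustration: pick(n) is expected to return a uniformly random index in 0..n.

```rust
/// Fisher-Yates shuffle. `pick(n)` must return a uniformly random
/// index in `0..n`; it is a hypothetical stand-in for an RNG call.
fn shuffle_with<T>(items: &mut [T], mut pick: impl FnMut(usize) -> usize) {
    // Walk from the back, swapping each slot with a randomly chosen
    // slot at or before it. Every permutation is equally likely if
    // `pick` is uniform.
    for i in (1..items.len()).rev() {
        let j = pick(i + 1); // j is in 0..=i
        items.swap(i, j);
    }
}
```

With rand 0.8, the closure could be something like |n| rng.gen_range(0..n). After the shuffle, a plain for loop over the vector visits the elements in random order, at O(n) total cost.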

Here is an example implementation for removing a random element from a Vec in constant time:

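use rand::Rng;

pub trait RemoveRandom {
    type Item;

    /// Removes and returns a uniformly random element, or None if empty.
    fn remove_random<R: Rng>(&mut self, rng: &mut R) -> Option<Self::Item>;
}

impl<T> RemoveRandom for Vec<T> {
    type Item = T;

    fn remove_random<R: Rng>(&mut self, rng: &mut R) -> Option<Self::Item> {
        if self.is_empty() {
            None
        } else {
            // rand 0.8 and later take a range instead: gen_range(0..self.len()).
            let index = rng.gen_range(0, self.len());
            // swap_remove fills the gap with the last element, so it is O(1).
            Some(self.swap_remove(index))
        }
    }
}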
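To tie this back to the original question, the following is a minimal sketch of the whole flow: convert the HashSet to a Vec once, then draw elements one at a time with swap_remove. It is written without the rand crate so that it does not depend on a particular rand version. The names remove_random_with and drain_in_random_order and the pick closure are assumptions for illustration, with pick(n) expected to return a uniformly random index in 0..n.

```rust
use std::collections::HashSet;

/// Removes and returns the element at a position chosen by `pick`,
/// or `None` if the vector is empty. Runs in O(1).
fn remove_random_with<T>(
    items: &mut Vec<T>,
    mut pick: impl FnMut(usize) -> usize,
) -> Option<T> {
    if items.is_empty() {
        None
    } else {
        let index = pick(items.len());
        // The last element is moved into the freed slot.
        Some(items.swap_remove(index))
    }
}

/// Converts the set to a Vec once (O(n)), then draws every element
/// in the order determined by `pick`.
fn drain_in_random_order<T>(
    set: HashSet<T>,
    mut pick: impl FnMut(usize) -> usize,
) -> Vec<T> {
    let mut items: Vec<T> = set.into_iter().collect();
    let mut drawn = Vec::with_capacity(items.len());
    while let Some(element) = remove_random_with(&mut items, &mut pick) {
        drawn.push(element);
    }
    drawn
}
```

The one-time conversion costs O(n), after which each draw is O(1), compared with O(n) per draw when walking the HashSet iterator to a random position.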

