从集合中选择随机元素,比线性时间快(Haskell) [英] Select random element from a set, faster than linear time (Haskell)

查看:108
本文介绍了从集合中选择随机元素,比线性时间快(Haskell)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想创建这个函数,它从 Set

  randElem ::(RandomGen g) =>设置 - > g  - > (a,g)

可以编写简单的listy实现。例如(代码更新,验证工作):

  import Data.Set as Set 
import System.Random(getStdGen ,randomR,RandomGen)

randElem ::(RandomGen g)=>设置 - > g - > (a,g)
randElem sg =(Set.toList s !! n,g')
其中(n,g')= randomR(0,Set.size s - 1)g

- 简单试驾
main =做g < - getStdGen
打印。 fst $ randElem sg
其中s = Set.fromList [1,3,5,7,9]

但对于大型(随机选择的) n ,使用 !! 会产生线性查找代价。有没有更快的方法来选择一个集合中的随机元素?理想情况下,重复的随机选择应该在所有选项上产生均匀的分布,这意味着它不喜欢某些元素而不是其他元素。






编辑:在答案中出现了一些很棒的想法,所以我只想对我正在寻找的内容进行更多的说明。我将这个问题用Sets作为这种情况的解决方案。我更喜欢这样的答案,即避免在Set内部使用任何外部功能簿记,以及
$ b



  1. 保持良好的性能(平均优于 O(n)),即使该函数仅在每个唯一集合中使用一次。

我也有这种对工作代码的热爱,所以如果您的答案包含工作解决方案,那么期望(至少)+1。

解决方案

这是一个想法:您可以进行间隔平分。


  1. size s 是不变的时间。使用 randomR 可以获得您选择的集合的多少。 split findMin 和 findMax 之间添加各种值,直到获得所需位置的元素。如果你真的担心这个集合是由reals组成的并且是非常紧密集群的,那么你可以重新计算 findMin findMax 每次保证每次敲掉一些元素。

性能为O(n log n),基本上不会比你当前的解决方案,但只有很弱的条件才能使该集合不完全聚集在某个聚集点上,所以平均性能应该是〜((logn)^ 2),这是相当恒定的。如果它是一组整数,则得到O(log n * log m),其中m是该组的初始范围;它只是一些可能导致间隔平分(或其他数据类型的订单类型具有积累点)的严重表现的实体。这产生了一个完美均匀的分布,只要注意旁边的,以确保有可能获得顶部和底部的元素。

编辑:添加'code'



一些不雅观,未经检查的(伪?)代码。没有编译器在我的当前机器上进行抽烟测试,可能会发生错误,如果 s可能用更少的来完成。一件事:看看如何生成 mid ;它将需要一些调整,这取决于你是否在寻找一些适用于int或reals集合的东西(间隔平分本质上是拓扑结构,不应该对具有不同拓扑结构的集合有相同的工作量)。

b
$ b

  import Data.Set as Set 
import System.Random(getStdGen,randomR,RandomGen)

getNth如果n = 0,则(Set.findMin s)否则如果n + 1 = Set.size s则Set.findMax s
否则如果n < set.size bott然后getNth(bott,n)否则如果pres和Set.size bott = n那么n
否则如果pres然后getNth(top,n - Set.size bott - 1)else getNth(top,n - Set.size)
where mid =((Set.findMax s) - (Set.findMin s))/ 2 +(Set.findMin s)
(bott,pres,top)=(splitMember (n,g')= randomR(0,Set.size s-1)g $($ s
$ b $ randElem sg =(getNth(s,n),g')
g


I'd like to create this function, which selects a random element from a Set:

randElem :: (RandomGen g) => Set a -> g -> (a, g)

Simple listy implementations can be written. For example (code updated, verified working):

import Data.Set as Set
import System.Random (getStdGen, randomR, RandomGen)

randElem :: (RandomGen g) => Set a -> g -> (a, g)
randElem s g = (Set.toList s !! n, g')
    where (n, g') = randomR (0, Set.size s - 1) g

-- simple test drive
main = do g <- getStdGen
          print . fst $ randElem s g
    where s = Set.fromList [1,3,5,7,9]

But using !! incurs a linear lookup cost for large (randomly selected) n. Is there a faster way to select a random element in a Set? Ideally, repeated random selections should produce a uniform distribution over all options, meaning it does not prefer some elements over others.


Edit: some great ideas are popping up in the answers, so I just wanted to throw a couple more clarifications on what exactly I'm looking for. I asked this question with Sets as the solution to this situation in mind. I'll prefer answers that both

  1. avoid using any outside-the-function bookkeeping beyond the Set's internals, and
  2. maintain good performance (better than O(n) on average) even though the function is only used once per unique set.

I also have this love of working code, so expect (at minimum) a +1 from me if your answer includes a working solution.

解决方案

Here's an idea: You could do interval bisection.

  1. size s is constant time. Use randomR to get how far into the set you are selecting.
  2. Do split with various values between the original findMin and findMax until you get the element at the position you want. If you really fear that the set is made up say of reals and is extremely tightly clustered, you can recompute findMin and findMax each time to guarantee knocking off some elements each time.

The performance would be O(n log n), basically no worse than your current solution, but with only rather weak conditions to the effect that the set not be entirely clustered round some accumulation point, the average performance should be ~((logn)^2), which is fairly constant. If it's a set of integers, you get O(log n * log m), where m is the initial range of the set; it's only reals that could cause really nasty performance in an interval bisection (or other data types whose order-type has accumulation points).

PS. This produces a perfectly even distribution, as long as watching for off-by-ones to make sure it's possible to get the elements at the top and bottom.

Edit: added 'code'

Some inelegant, unchecked (pseudo?) code. No compiler on my current machine to smoke test, possibility of off-by-ones, and could probably be done with fewer ifs. One thing: check out how mid is generated; it'll need some tweaking depending on whether you are looking for something that works with sets of ints or reals (interval bisection is inherently topological, and oughtn't to work quite the same for sets with different topologies).

import Data.Set as Set
import System.Random (getStdGen, randomR, RandomGen)

getNth (s, n) = if n = 0 then (Set.findMin s) else if n + 1 = Set.size s then Set.findMax s
    else if n < Set.size bott then getNth (bott, n) else if pres and Set.size bott = n then n
    else if pres then getNth (top, n - Set.size bott - 1) else getNth (top, n - Set.size)
    where mid = ((Set.findMax s) - (Set.findMin s)) /2 + (Set.findMin s)
          (bott, pres, top) = (splitMember mid s)

randElem s g = (getNth(s, n), g')
    where (n, g') = randomR (0, Set.size s - 1) g

这篇关于从集合中选择随机元素,比线性时间快(Haskell)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆