Clojure的独特+随机生成的流的复杂性 [英] Complexity of Clojure's distinct + randomly generated stream

查看:132
本文介绍了Clojure的独特+随机生成的流的复杂性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

表达式的时间复杂度

 (doall(take n(distinct stream)))

其中 stream 是一个延迟生成重复?



我想这部分取决于 stream 中重复的金额或机会?如果 stream (反复#(rand-int m)))其中 m& ; = n



我的估计:



列表中必须有至少一个从流实现的元素。如果流具有重复,则为多个。对于每次迭代,都有一个集合查找和/或插入,但由于这些接近恒定时间,我们至少得到: O(n *〜1)= O(n)然后是复制的一些复杂性。我的直觉是,复制品的复杂性也可以忽略,但我不知道如何正式化这。例如,我们不能只是说 O(n * k *〜1) = O(n)一些常量 k ,因为在流中没有明显的最大数量 k >

让我演示一些数据的问题:

 (defn stream [upper distinct-n] 
(let [counter(volatile!0)]
(doall(take distinct-n
(distinct
vswap!counter inc)
(rand-int upper))))))
@counter))

(defn sample [times-n upper distinct-n]
( - >>(重复次数-n
#(stream upper distinct-n))
频率
(按val排序)
reverse))

(sample 10000 5 1);; ([1 10000])
(sample 10000 5 2);; ([2 8024] [3 1562] [4 334] [5 66] [6 12] [8 1] [7 1])$ ​​b $ b(sample 10000 5 3);; ([3 4799] [4 2898] [5 1324] [6 578] [7 236] [8 87] [9 48] [10 14] [11 10] [14 3] [12 2] [13 1]
(sample 10000 5 3);; ([3 4881] [4 2787] [5 1359] [6 582] [7 221] [8 107] [9 39] [10 12] [11 9] [12 1] [17 1] [13 1]
(sample 10000 5 4);; ([5 2258] [6 1912] [4 1909] [7 1420] [8 985] [9 565] [10 374] [11 226] [12 138] [13 89] [14 50] [15 33] 16 16] [17 9] [18 8] [20 5] [19 1] [23 1] [21 1])$ ​​b $ b(sample 10000 5 5);; [8 1082] [9 1055] [7 1012] [10 952] [11 805] [6 778] [12 689] [13 558] [14 505] [5 415] [15 387] [16 338] 17 295] [18 203] [19 198] [20 148] [21 100] [22 96] [23 72] [24 53] [25 44] [26 40] [28 35] [27 31] [29 19 ] [30 16] [31 15] [32 13] [35 10] [34 6] [33 6] [42 3] [38 3] [45 3] [36 3] [37 2] [39 2] 52 1] [66 1] [51 1] [44 1] [41 1] [50 1] [60 1] [58 1])$ ​​b $ b

请注意,对于最后一个示例,迭代次数 distinct 可以达到 66 ,虽然机会很小。
还要注意,对于(样例10000 nn)中增加 n



此图表说明了 n m的各种数字(> doall(take n)(重复使用#(rand-int m)))





为了完整性,这里是我用来生成图表的代码:

 (require'[com.hypirion.clj-xchart:as c] )

(defn most-common [times-n upper distinct-n]
( - >重复次数-n
# )
frequencies
(sort#by - (val%)))
ffirst))

(defn series [m]
{ strm =m
(let [x(range 1(inc m))]
{:xx
:y $ bx)})})

(c / view
(c / xy-chart
(合并(系列10)
b(系列50)
(系列100))
{:x轴{:titlen}
:y轴{:titlerealized}}))


解决方案

您的问题称为优惠券收藏家问题和预期的元素数量是通过将m / m + m /(m-1) )...,直到你有n个项目:

 (defn general-coupon-collector- expect 
n :结果集的基数(要收集的uniuque优惠券的数量)
m:从(#coupons存在)绘制的集合的基数
[nm]
{:pre [(< nm)]}
(double(apply +(mapv /(repeat m)(range m( - mn)-1))))
(general-coupon-collector- expect 25 25)
;; => 95
;;这将生成你绘图的数据:
(for [x(range 10 101 5)]
(general-coupon-collector-expect x 100))

最差的情况将是无限的。最佳情况将只是N.平均情况是O(N log N)。这忽略了检查元素是否已经绘制的复杂性。在实践中,它是用于clojure集的Log_32 N(在 distinct 中使用)。


What is the time complexity of an expression

(doall (take n (distinct stream)))

where stream is a lazily generated (possibly infinite) collection with duplicates?

I guess this partially depends on the amount or chance of duplicates in stream? What if stream is (repeatedly #(rand-int m))) where m >= n?

My estimation:

For every element in the resulting list there has to be at least one element realized from the stream. Multiple if the stream has duplicates. For every iteration there is a set lookup and/or insert, but since those are near constant time we get at least: O(n*~1) = O(n) and then some complexity for the duplicates. My intuition is that the complexity for the duplicates can be neglected too, but I'm not sure how to formalize this. For example, we cannot just say it is O(n*k*~1) = O(n) for some constant k since there is not an obvious maximum number k of duplicates we could encounter in the stream.

Let me demonstrate the problem with some data:

(defn stream [upper distinct-n]
  (let [counter (volatile! 0)]
    (doall (take distinct-n
                 (distinct
                  (repeatedly (fn []
                                (vswap! counter inc)
                                (rand-int upper))))))
    @counter))

(defn sample [times-n upper distinct-n]
  (->> (repeatedly times-n
                  #(stream upper distinct-n))
       frequencies
       (sort-by val)
       reverse))

(sample 10000 5 1) ;; ([1 10000])
(sample 10000 5 2) ;; ([2 8024] [3 1562] [4 334] [5 66] [6 12] [8 1] [7 1])
(sample 10000 5 3) ;; ([3 4799] [4 2898] [5 1324] [6 578] [7 236] [8 87] [9 48] [10 14] [11 10] [14 3] [12 2] [13 1])
(sample 10000 5 3) ;; ([3 4881] [4 2787] [5 1359] [6 582] [7 221] [8 107] [9 39] [10 12] [11 9] [12 1] [17 1] [13 1])
(sample 10000 5 4) ;; ([5 2258] [6 1912] [4 1909] [7 1420] [8 985] [9 565] [10 374] [11 226] [12 138] [13 89] [14 50] [15 33] [16 16] [17 9] [18 8] [20 5] [19 1] [23 1] [21 1])
(sample 10000 5 5) ;; ([8 1082] [9 1055] [7 1012] [10 952] [11 805] [6 778] [12 689] [13 558] [14 505] [5 415] [15 387] [16 338] [17 295] [18 203] [19 198] [20 148] [21 100] [22 96] [23 72] [24 53] [25 44] [26 40] [28 35] [27 31] [29 19] [30 16] [31 15] [32 13] [35 10] [34 6] [33 6] [42 3] [38 3] [45 3] [36 3] [37 2] [39 2] [52 1] [66 1] [51 1] [44 1] [41 1] [50 1] [60 1] [58 1])

Note that for the last sample the number of iterations distinct can go up to 66, although the chance is small. Also notice that for increasing n in (sample 10000 n n) the most likely number of realized elements from the stream seems to go up more than linearly.

This chart illustrates the number of realized elements from the input (most common occurance from 10000 samples) in (doall (take n (repeatedly #(rand-int m))) for various numbers of n and m.

For completeness, here is the code I used to generate the chart:

(require '[com.hypirion.clj-xchart :as c])

(defn most-common [times-n upper distinct-n]
  (->> (repeatedly times-n
                   #(stream upper distinct-n))
       frequencies
       (sort-by #(- (val %)))
       ffirst))

(defn series [m]
  {(str "m = " m)
   (let [x (range 1 (inc m))]
     {:x x
      :y (map #(most-common 10000 m %)
              x)})})

(c/view
 (c/xy-chart
  (merge (series 10)
         (series 25)
         (series 50)
         (series 100))
  {:x-axis {:title "n"}
   :y-axis {:title "realized"}}))

解决方案

Your problem is known as the Coupon collectors problem and the expected number of elements is given by just summing up m/m + m/(m-1) ... until you have your n items:

(defn general-coupon-collector-expect
  "n: Cardinality of resulting set (# of uniuque coupons to collect)
   m: Cardinality of set to draw from (#coupons that exist)"
  [n m]
  {:pre [(<= n m)]}
  (double (apply + (mapv / (repeat m) (range m (- m n) -1)))))
(general-coupon-collector-expect 25 25)
;; => 95
;; This generates the data for you plot:
(for [x (range 10 101 5)]
  (general-coupon-collector-expect x 100))

Worst case will be infinite. Best case will be just N. Average case is O(N log N). This ignores the complexity of checking if an element has already been drawn. In practice it is Log_32 N for clojure sets (which is used in distinct).

这篇关于Clojure的独特+随机生成的流的复杂性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆