Idiomatic/Efficient Clojure way to intersect two a priori sorted vectors?


Problem description

I have a pair of vectors x and y of unique items, each of which I know to be sorted. I wish to have the intersection of the two, maintaining sort order. The result ideally would be another vector, for fast random access.

The generation below is merely for the sake of example; my x and y will come presorted and pre-distinct (they are in fact time samples).

(defn gen-example [c] (-> (repeatedly c #(-> c rand int)) distinct sort vec))

user=> (def x (gen-example 100000)) (count x)
#'user/x
63161
user=> (def y (gen-example 100000)) (count y)
#'user/y
63224

I know Clojure has clojure.set/intersection which can work on a sorted-set. My x and y have the same properties (sorted distinct elements) but are not the same type.

Question 1: Is there a better/faster way to convert x and y to sorted-sets than (apply sorted-set x) given that they are already distinct and sorted?

user=> (time (def ssx (apply sorted-set x)))
"Elapsed time: 607.642592 msecs"
user=> (time (def ssy (apply sorted-set y)))
"Elapsed time: 617.046022 msecs"

Now I am ready to perform my intersection:

user=> (time (count (clojure.set/intersection ssx ssy)))
"Elapsed time: 355.42534 msecs"
39992

This is somewhat disappointing performance, and a cursory look at (source clojure.set/intersection) does not seem to show any special treatment for the fact that these sets are sorted.

Question 2: Is there a better/faster way to perform the intersection of sorted-sets than clojure.set/intersection?

(defn intersect-sorted-vector [x y]
  (loop [x (seq x) y (seq y) acc []]
    (if (and x y)
      (let [x1 (first x)
            y1 (first y)]
        (cond
          (< x1 y1) (recur (next x) y acc)
          (> x1 y1) (recur x (next y) acc)
          :else (recur (next x) (next y) (conj acc x1))))
      acc)))

This turns out to be a good deal (nearly 10x) faster.

user=> (time (count (intersect-sorted-vector x y)))
"Elapsed time: 40.142532 msecs"
39992

But, I can't help but feel that my code is unduly procedural/iterative.

Question 3: Could anyone kindly suggest a more idiomatic way to process a pair of vectors in Clojure?

Solution

It is often the case that fast Clojure code looks a bit imperative. Functional code is often elegant, but it comes with performance costs that you have to pay for (laziness, extra GC pressure from discarded immutable objects, etc.).
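For contrast, here is a minimal sketch of what a more functional, lazy version of the same merge could look like. The name lazy-intersect-sorted is made up for illustration, and the sketch assumes both inputs are sorted, duplicate-free collections of numbers. It will likely be slower than a loop/recur version, because every step allocates seq and lazy-seq objects.

```clojure
(defn lazy-intersect-sorted
  "Hypothetical helper: lazily intersects two sorted, duplicate-free
  collections of numbers, preserving sort order."
  [xs ys]
  (lazy-seq
    ;; stop as soon as either input is exhausted
    (when (and (seq xs) (seq ys))
      (let [x (first xs), y (first ys)]
        (cond
          (< x y) (lazy-intersect-sorted (rest xs) ys)   ; x has no match, skip it
          (> x y) (lazy-intersect-sorted xs (rest ys))   ; y has no match, skip it
          :else   (cons x (lazy-intersect-sorted (rest xs) (rest ys))))))))

;; Realise the lazy result into a vector for fast random access.
(vec (lazy-intersect-sorted [1 2 4 7] [2 3 4 8]))
;; => [2 4]
```

Wrapping the result in vec gives the vector asked for in the question, at the cost of realising the whole sequence.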

Also, converting into sets is always going to be more expensive. Building a sorted set is an O(n log n) operation in itself, but you can exploit the fact that the vectors are already sorted to implement the intersection in O(n) time.

Your code is already pretty good, but there are still a couple more optimisations you can do:

  • Use a transient vector to collect the results. These are a bit faster than regular persistent vectors for lots of sequential conj operations.
  • Use indexed access with primitives into the vectors rather than traversing a sequence with first/next. This avoids creating temporary seq objects (and the related GC).
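The two techniques in the list above can be seen in isolation in a small sketch. The function keep-evens is a hypothetical example, not part of the original question; it only illustrates the pattern of walking a vector by index with nth and accumulating into a transient.

```clojure
(defn keep-evens
  "Hypothetical helper: returns a vector of the even numbers in vector v."
  [v]
  (let [n (count v)]
    (loop [i 0, acc (transient [])]
      (if (< i n)
        (let [e (nth v i)]                       ; indexed access, no seq objects
          (recur (inc i)
                 (if (even? e) (conj! acc e) acc))) ; conj! returns the transient to keep using
        (persistent! acc)))))                    ; freeze into a regular persistent vector

(keep-evens [3 4 7 10 12])
;; => [4 10 12]
```

Note that the return value of conj! is what gets passed to the next iteration; the transient should not be treated as a mutable variable that is updated in place.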

Resulting code might look something like:

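(defn intersect-sorted-vector [x y]
  (loop [i (long 0), j (long 0), r (transient [])]
    ;; nth with a nil default yields nil once an index runs past the end
    (let [xi (nth x i nil), yj (nth y j nil)]
      (cond
        ;; stop as soon as either vector is exhausted; comparing against
        ;; nil with < or > would otherwise throw
        (not (and xi yj)) (persistent! r)
        (< xi yj) (recur (inc i) j r)
        (> xi yj) (recur i (inc j) r)
        :else (recur (inc i) (inc j) (conj! r xi))))))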

(time (count (intersect-sorted-vector x y)))
=> "Elapsed time: 5.143687 msecs"
=> 40258

So as you can see, this probably gives you an extra 6-8x speedup or thereabouts.
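When experimenting with faster variants, it can help to check them against a simple reference on small inputs. The sketch below is one possible reference, with the hypothetical name reference-intersection. It assumes the elements are numbers (a set used as a predicate would wrongly drop nil or false elements), and it builds a hash set of y, so it is not expected to match the index-based loop on speed.

```clojure
(defn reference-intersection
  "Hypothetical reference implementation: keeps the elements of x that
  also occur in y. The result preserves the order of x, so it is sorted
  whenever x is sorted."
  [x y]
  (filterv (set y) x))   ; a set acts as a membership predicate

(reference-intersection [1 2 4 7] [2 3 4 8])
;; => [2 4]
```

Comparing the output of an optimised function with this reference on a few hand-written vectors, including empty vectors and vectors of different lengths, is a cheap way to catch termination bugs.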
