昂贵的算法的Clojure性能 [英] Clojure Performance For Expensive Algorithms

查看:93
本文介绍了昂贵的算法的Clojure性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经实现了一个算法来计算最长的连续公共子序列(不要与最长的公共子序列混淆,但对这个问题不重要)。我需要从这里挤压最大的性能,因为我会称它很多。我在Clojure和Java中实现了相同的算法,以便比较性能。 Java版本运行速度明显更快。 我的问题是,是否有任何我可以做的Clojure版本,以加快到Java的水平。



这里是Java代码:

  public static int lcs(String [] a1,String [] a2){
if(a1 == null || a2 == null){
return 0;
}

int matchLen = 0;
int maxLen = 0;

int a1Len = a1.length;
int a2Len = a2.length;
int [] prev = new int [a2Len + 1]; //保存来自内部for循环的上一次迭代的数据
int [] curr = new int [a2Len + 1]; //用于内循环的'current'迭代

for(int i = 0; i for(int j = 0; j < a2Len; ++ j){
if(a1 [i] .equals(a2 [j])){
matchLen = prev [j] + 1; // curr和prev被填充1,以允许在j = 0时进行此赋值
}
else {
matchLen = 0;
}
curr [j + 1] = matchLen;

if(matchLen> maxLen){
maxLen = matchLen;
}
}

int [] swap = prev;
prev = curr;
curr = swap;
}

return maxLen;
}

这里是Clojure版本的相同:

 (defn lcs 
[#^[Ljava.lang.String;a1#^[Ljava.lang.String;a2]
(let [a1-len(alength a1)
a2-len(length a2)
prev(int-array(inc a2-len))
curr (inc a2-len))]
(循环[i 0 max-len 0 prev prev curr curr]
(if(< i a1-len)

(loop [j 0 max-len max-len]
(if(< j a2-len)
(if(=(aget a1 i)(aget a2 j))
(let [match-len(inc(aget prev j))]
(do
(aset-int curr(inc j)match-len)
))))
(do
(aset-int curr(inc j)0)
(recur(inc j)max-len)
max-len))
curr
prev)
max-len))))

现在让我们在我的机器上测试一下:

  b $ b(defn get-random-id [n](apply str(repeated n#(rand-nth pool)))
(def a1 -id 5))))
(def a2(into-array(take 10000(repeated#(get-random-id 5)))))
pre>

Java:

 (时间(Ratcliff / lcs a1 a2 ))
已用时间:1521.455 msecs

Clojure:

 (time(lcs a1 a2))
已用时间:19863.633 msecs
pre>

Clojure很快,但仍然比Java慢一个数量级。有什么我可以做的来弥补这个差距吗?或者有我最大化它,一个数量级是最小的Clojure开销。



正如你可以看到,我已经使用低级 ,我使用本机Java数组,我有类型提示的参数,以避免反射。



有一些算法优化可能,但我不想去那里现在。我很好奇我可以得到多么接近Java性能。如果我不能缩小差距,我将使用Java代码。

解决方案

编辑这个项目的其他部分在Clojure,



以下是我的收据:

 (defn my-lcs [^ objects a1 ^ objects a2] 
(first
(let [n(inc(alength a1))]
a1 i
[max-len ^ ints prev ^ ints curr] [0(int-array n)(int-array n)]
[aduce a2 j max-len(unchecked-long max- len)
(let [match-len
(if(.equals(aget a1 i)(aget a2 j))
(unchecked-inc(aget prev j))
0)]
(aset curr(unchecked-inc j)match-len)
(if(> match-len max-len)
match-len
max-len )))
curr prev]))))

与yours的主要区别: a [gs] et vs a [gs] et-int ,使用 code> ops(隐式通过 areduce ),使用向量作为返回值(和交换机制),max-len在内循环(原始值循环是有问题的,从1.5RC2开始稍微小一些,但是支持还不完美,但 * warn-on-reflection * 不是沉默的)。



我切换到 .equals 而不是 = 避免Clojure的等价逻辑。



编辑:让我们变得丑陋并恢复数组swap技巧:

 (deftype F [^:unsynchronized-mutable ^ ints curr 
^:unsynchronized-mutable ^ ints prev]
clojure.lang.IFn
(invoke [_ a1 a2]
(let [^ objects a1 a1
^ objects a2 a2]
(areduce a1 i max-len 0
(let [ m(arebece a2 j max-len(unchecked-long max-len)
(let [match-len
(if(.equals(aget a1 i)(aget a2 j))
(unchecked-int(aget prev j))
0)]
(aset curr(unchecked-inc j)(unchecked-int match-len))
(if(> match- len max-len)
match-len
max-len)))
bak curr]
(set! curr prev)
(set!prev bak)
m)))))

(defn my-lcs2 [^ objects a1 a2]
(let [n inc(alength a1))
f(F.(int array n)(int array n))]
(f a1 a2)))
/ pre>

在我的盒子上,它的速度快了30%。


I have implemented an algorithm to calculate the longest contiguous common subsequence (not to be confused with longest common subsequence, though not important for this questions). I need to squeeze maximum performance from this because I'll be calling it a lot. I have implemented the same algorithm in Clojure and Java in order to compare performance. The Java version runs significantly faster. My question is whether there is anything I can do to the Clojure version to speed it up to the level of Java.

Here's the Java code:

public static int lcs(String[] a1, String[] a2) {
    if (a1 == null || a2 == null) {
        return 0;
    }

    int matchLen = 0;
    int maxLen = 0;

    int a1Len = a1.length;
    int a2Len = a2.length;
    int[] prev = new int[a2Len + 1]; // holds data from previous iteration of inner for loop
    int[] curr = new int[a2Len + 1]; // used for the 'current' iteration of inner for loop

    for (int i = 0; i < a1Len; ++i) {
        for (int j = 0; j < a2Len; ++j) {
            if (a1[i].equals(a2[j])) {
                matchLen = prev[j] + 1; // curr and prev are padded by 1 to allow for this assignment when j=0
            }
            else {
                matchLen = 0;
            }
            curr[j+1] = matchLen;

            if (matchLen > maxLen) {
                maxLen = matchLen;
            }
        }

        int[] swap = prev;
        prev = curr;
        curr = swap;
    }

    return maxLen;
}

Here is the Clojure version of the same:

(defn lcs
  [#^"[Ljava.lang.String;" a1 #^"[Ljava.lang.String;" a2]
  (let [a1-len (alength a1)
        a2-len (alength a2)
        prev (int-array (inc a2-len))
        curr (int-array (inc a2-len))]
    (loop [i 0 max-len 0 prev prev curr curr]
      (if (< i a1-len)
        (recur (inc i)
               (loop [j 0 max-len max-len]
                 (if (< j a2-len)
                   (if (= (aget a1 i) (aget a2 j))
                     (let [match-len (inc (aget prev j))]
                       (do
                         (aset-int curr (inc j) match-len)
                         (recur (inc j) (max max-len match-len))))
                     (do
                       (aset-int curr (inc j) 0)
                       (recur (inc j) max-len)))
                   max-len))
               curr
               prev)
        max-len))))

Now let's test these on my machine:

(def pool "ABC")
(defn get-random-id [n] (apply str (repeatedly n #(rand-nth pool))))
(def a1 (into-array (take 10000 (repeatedly #(get-random-id 5)))))
(def a2 (into-array (take 10000 (repeatedly #(get-random-id 5)))))

Java:

(time (Ratcliff/lcs a1 a2))
"Elapsed time: 1521.455 msecs"

Clojure:

(time (lcs a1 a2))
"Elapsed time: 19863.633 msecs"

Clojure is quick but still an order of magnitude slower than Java. Is there anything I can do to close this gap? Or have I maxed it out and one order of magnitude is the "minimal Clojure overhead."

As you can see I am already using the "low level" construct of loop, I am using native Java arrays and I have type-hinted the parameters to avoid reflection.

There some algorithm optimizations possible, but I don't want to go there right now. I am curious how close to Java performance I can get. If I can't close the gap I'll just go with the Java code. The rest of this project is in Clojure, but perhaps sometimes dropping down to Java for performance is necessary.

解决方案

EDIT: Added a faster uglier version below the first one.

Here is my take:

(defn my-lcs [^objects a1 ^objects a2]
  (first
    (let [n (inc (alength a1))]
      (areduce a1 i 
        [max-len ^ints prev ^ints curr] [0 (int-array n) (int-array n)]
        [(areduce a2 j max-len (unchecked-long max-len)
           (let [match-len 
                 (if (.equals (aget a1 i) (aget a2 j))
                   (unchecked-inc (aget prev j))
                   0)]
             (aset curr (unchecked-inc j) match-len)
             (if (> match-len max-len)
               match-len
               max-len)))
         curr prev]))))

Main differences with yours: a[gs]et vs a[gs]et-int, use of unchecked- ops (implicitly through areduce), use of a vector as the return value (and "swap" mechanism) and max-len is coerced to primitive before the inner loop (primitive-valued loops are problematic, slightly less since 1.5RC2 but the support isn't perfect yet, however *warn-on-reflection* is not silent).

And I switched to .equals instead of = to avoid the logic in Clojure's equiv.

EDIT: let's get ugly and restore the arrays swap trick:

(deftype F [^:unsynchronized-mutable ^ints curr
            ^:unsynchronized-mutable ^ints prev]
  clojure.lang.IFn
  (invoke [_ a1 a2]
    (let [^objects a1 a1
          ^objects a2 a2]
      (areduce a1 i max-len 0
        (let [m (areduce a2 j max-len (unchecked-long max-len)
                  (let [match-len 
                        (if (.equals (aget a1 i) (aget a2 j))
                          (unchecked-inc (aget prev j))
                          0)]
                    (aset curr (unchecked-inc j) (unchecked-int match-len))
                    (if (> match-len max-len)
                      match-len
                      max-len)))
              bak curr]
          (set! curr prev)
          (set! prev bak)
          m)))))

(defn my-lcs2 [^objects a1 a2]
  (let [n (inc (alength a1))
        f (F. (int-array n) (int-array n))]
    (f a1 a2)))

On my box, it's 30% faster.

这篇关于昂贵的算法的Clojure性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆