昂贵的算法的Clojure性能 [英] Clojure Performance For Expensive Algorithms
问题描述
我已经实现了一个算法来计算最长的连续公共子序列(不要与最长的公共子序列混淆,但对这个问题不重要)。我需要从这里挤压最大的性能,因为我会称它很多。我在Clojure和Java中实现了相同的算法,以便比较性能。 Java版本运行速度明显更快。 我的问题是,是否有任何我可以做的Clojure版本,以加快到Java的水平。
这里是Java代码:
public static int lcs(String [] a1,String [] a2){
if(a1 == null || a2 == null){
return 0;
}
int matchLen = 0;
int maxLen = 0;
int a1Len = a1.length;
int a2Len = a2.length;
int [] prev = new int [a2Len + 1]; //保存来自内部for循环的上一次迭代的数据
int [] curr = new int [a2Len + 1]; //用于内循环的'current'迭代
for(int i = 0; i for(int j = 0; j < a2Len; ++ j){
if(a1 [i] .equals(a2 [j])){
matchLen = prev [j] + 1; // curr和prev被填充1,以允许在j = 0时进行此赋值
}
else {
matchLen = 0;
}
curr [j + 1] = matchLen;
if(matchLen> maxLen){
maxLen = matchLen;
}
}
int [] swap = prev;
prev = curr;
curr = swap;
}
return maxLen;
}
这里是Clojure版本的相同:
(defn lcs
[#^[Ljava.lang.String;a1#^[Ljava.lang.String;a2]
(let [a1-len(alength a1)
a2-len(length a2)
prev(int-array(inc a2-len))
curr (inc a2-len))]
(循环[i 0 max-len 0 prev prev curr curr]
(if(< i a1-len)
(loop [j 0 max-len max-len]
(if(< j a2-len)
(if(=(aget a1 i)(aget a2 j))
(let [match-len(inc(aget prev j))]
(do
(aset-int curr(inc j)match-len)
))))
(do
(aset-int curr(inc j)0)
(recur(inc j)max-len)
max-len))
curr
prev)
max-len))))
现在让我们在我的机器上测试一下:
b $ b(defn get-random-id [n](apply str(repeated n#(rand-nth pool)))
pre>
(def a1 -id 5))))
(def a2(into-array(take 10000(repeated#(get-random-id 5)))))
Java:
(时间(Ratcliff / lcs a1 a2 ))
已用时间:1521.455 msecs
Clojure:
(time(lcs a1 a2))
pre>
已用时间:19863.633 msecs
Clojure很快,但仍然比Java慢一个数量级。有什么我可以做的来弥补这个差距吗?或者有我最大化它,一个数量级是最小的Clojure开销。
正如你可以看到,我已经使用低级 ,我使用本机Java数组,我有类型提示的参数,以避免反射。
有一些算法优化可能,但我不想去那里现在。我很好奇我可以得到多么接近Java性能。如果我不能缩小差距,我将使用Java代码。
解决方案编辑这个项目的其他部分在Clojure, :
以下是我的收据:
(defn my-lcs [^ objects a1 ^ objects a2]
(first
(let [n(inc(alength a1))]
a1 i
[max-len ^ ints prev ^ ints curr] [0(int-array n)(int-array n)]
[aduce a2 j max-len(unchecked-long max- len)
(let [match-len
(if(.equals(aget a1 i)(aget a2 j))
(unchecked-inc(aget prev j))
0)]
(aset curr(unchecked-inc j)match-len)
(if(> match-len max-len)
match-len
max-len )))
curr prev]))))
与yours的主要区别:
a [gs] et
vsa [gs] et-int
,使用code> ops(隐式通过
areduce
),使用向量作为返回值(和交换机制),max-len在内循环(原始值循环是有问题的,从1.5RC2开始稍微小一些,但是支持还不完美,但* warn-on-reflection *
不是沉默的)。
我切换到
.equals
而不是=
避免Clojure的等价逻辑。
编辑:让我们变得丑陋并恢复数组swap技巧:
(deftype F [^:unsynchronized-mutable ^ ints curr
/ pre>
^:unsynchronized-mutable ^ ints prev]
clojure.lang.IFn
(invoke [_ a1 a2]
(let [^ objects a1 a1
^ objects a2 a2]
(areduce a1 i max-len 0
(let [ m(arebece a2 j max-len(unchecked-long max-len)
(let [match-len
(if(.equals(aget a1 i)(aget a2 j))
(unchecked-int(aget prev j))
0)]
(aset curr(unchecked-inc j)(unchecked-int match-len))
(if(> match- len max-len)
match-len
max-len)))
bak curr]
(set! curr prev)
(set!prev bak)
m)))))
(defn my-lcs2 [^ objects a1 a2]
(let [n inc(alength a1))
f(F.(int array n)(int array n))]
(f a1 a2)))
在我的盒子上,它的速度快了30%。
I have implemented an algorithm to calculate the longest contiguous common subsequence (not to be confused with longest common subsequence, though not important for this questions). I need to squeeze maximum performance from this because I'll be calling it a lot. I have implemented the same algorithm in Clojure and Java in order to compare performance. The Java version runs significantly faster. My question is whether there is anything I can do to the Clojure version to speed it up to the level of Java.
Here's the Java code:
public static int lcs(String[] a1, String[] a2) { if (a1 == null || a2 == null) { return 0; } int matchLen = 0; int maxLen = 0; int a1Len = a1.length; int a2Len = a2.length; int[] prev = new int[a2Len + 1]; // holds data from previous iteration of inner for loop int[] curr = new int[a2Len + 1]; // used for the 'current' iteration of inner for loop for (int i = 0; i < a1Len; ++i) { for (int j = 0; j < a2Len; ++j) { if (a1[i].equals(a2[j])) { matchLen = prev[j] + 1; // curr and prev are padded by 1 to allow for this assignment when j=0 } else { matchLen = 0; } curr[j+1] = matchLen; if (matchLen > maxLen) { maxLen = matchLen; } } int[] swap = prev; prev = curr; curr = swap; } return maxLen; }
Here is the Clojure version of the same:
(defn lcs [#^"[Ljava.lang.String;" a1 #^"[Ljava.lang.String;" a2] (let [a1-len (alength a1) a2-len (alength a2) prev (int-array (inc a2-len)) curr (int-array (inc a2-len))] (loop [i 0 max-len 0 prev prev curr curr] (if (< i a1-len) (recur (inc i) (loop [j 0 max-len max-len] (if (< j a2-len) (if (= (aget a1 i) (aget a2 j)) (let [match-len (inc (aget prev j))] (do (aset-int curr (inc j) match-len) (recur (inc j) (max max-len match-len)))) (do (aset-int curr (inc j) 0) (recur (inc j) max-len))) max-len)) curr prev) max-len))))
Now let's test these on my machine:
(def pool "ABC") (defn get-random-id [n] (apply str (repeatedly n #(rand-nth pool)))) (def a1 (into-array (take 10000 (repeatedly #(get-random-id 5))))) (def a2 (into-array (take 10000 (repeatedly #(get-random-id 5)))))
Java:
(time (Ratcliff/lcs a1 a2)) "Elapsed time: 1521.455 msecs"
Clojure:
(time (lcs a1 a2)) "Elapsed time: 19863.633 msecs"
Clojure is quick but still an order of magnitude slower than Java. Is there anything I can do to close this gap? Or have I maxed it out and one order of magnitude is the "minimal Clojure overhead."
As you can see I am already using the "low level" construct of loop, I am using native Java arrays and I have type-hinted the parameters to avoid reflection.
There some algorithm optimizations possible, but I don't want to go there right now. I am curious how close to Java performance I can get. If I can't close the gap I'll just go with the Java code. The rest of this project is in Clojure, but perhaps sometimes dropping down to Java for performance is necessary.
解决方案EDIT: Added a faster uglier version below the first one.
Here is my take:
(defn my-lcs [^objects a1 ^objects a2] (first (let [n (inc (alength a1))] (areduce a1 i [max-len ^ints prev ^ints curr] [0 (int-array n) (int-array n)] [(areduce a2 j max-len (unchecked-long max-len) (let [match-len (if (.equals (aget a1 i) (aget a2 j)) (unchecked-inc (aget prev j)) 0)] (aset curr (unchecked-inc j) match-len) (if (> match-len max-len) match-len max-len))) curr prev]))))
Main differences with yours:
a[gs]et
vsa[gs]et-int
, use ofunchecked-
ops (implicitly throughareduce
), use of a vector as the return value (and "swap" mechanism) and max-len is coerced to primitive before the inner loop (primitive-valued loops are problematic, slightly less since 1.5RC2 but the support isn't perfect yet, however*warn-on-reflection*
is not silent).And I switched to
.equals
instead of=
to avoid the logic in Clojure's equiv.EDIT: let's get ugly and restore the arrays swap trick:
(deftype F [^:unsynchronized-mutable ^ints curr ^:unsynchronized-mutable ^ints prev] clojure.lang.IFn (invoke [_ a1 a2] (let [^objects a1 a1 ^objects a2 a2] (areduce a1 i max-len 0 (let [m (areduce a2 j max-len (unchecked-long max-len) (let [match-len (if (.equals (aget a1 i) (aget a2 j)) (unchecked-inc (aget prev j)) 0)] (aset curr (unchecked-inc j) (unchecked-int match-len)) (if (> match-len max-len) match-len max-len))) bak curr] (set! curr prev) (set! prev bak) m))))) (defn my-lcs2 [^objects a1 a2] (let [n (inc (alength a1)) f (F. (int-array n) (int-array n))] (f a1 a2)))
On my box, it's 30% faster.
这篇关于昂贵的算法的Clojure性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!