Clojure 的 map 函数的高效仅副作用模拟 [英] Efficient side-effect-only analogue of Clojure's map function

查看:17
本文介绍了Clojure 的 map 函数的高效仅副作用模拟的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果 mapdoseq 有孩子会怎样?我正在尝试编写一个函数或宏,如 Common Lisp 的 mapc,但在 Clojure 中.这基本上做了map所做的,但有副作用,所以它不需要生成一系列结果,也不会偷懒.我知道可以使用 doseq 迭代单个序列,但 map 可以迭代多个序列,依次将函数应用于所有序列的每个元素.我也知道可以将 map 包裹在 dorun 中.(注意:此问题在经过多次评论和非常详尽的回答后已被广泛编辑.最初的问题侧重于宏,但这些宏问题被证明是外围问题.)

What if map and doseq had a baby? I'm trying to write a function or macro like Common Lisp's mapc, but in Clojure. This does essentially what map does, but only for side-effects, so it doesn't need to generate a sequence of results, and wouldn't be lazy. I know that one can iterate over a single sequence using doseq, but map can iterate over multiple sequences, applying a function to each element in turn of all of the sequences. I also know that one can wrap map in dorun. (Note: This question has been extensively edited after many comments and a very thorough answer. The original question focused on macros, but those macro issues turned out to be peripheral.)

这很快(根据标准):

(defn domap2
  [f coll]
  (dotimes [i (count coll)]
    (f (nth coll i))))

但它只接受一个集合.这接受任意集合:

but it only accepts one collection. This accepts arbitrary collections:

(defn domap3
  [f & colls]
  (dotimes [i (apply min (map count colls))]
    (apply f (map #(nth % i) colls))))

但相比之下,它非常慢.我也可以写一个像第一个一样的版本,但是有不同的参数情况 [f c1 c2], [f c1 c2 c3] 等,但最后,我需要一个处理任意数量集合的案例,就像最后一个例子一样,无论如何它更简单.我也尝试了许多其他解决方案.

but it's very slow by comparison. I could also write a version like the first, but with different parameter cases [f c1 c2], [f c1 c2 c3], etc., but in the end, I'll need a case that handles arbitrary numbers of collections, like the last example, which is simpler anyway. I've tried many other solutions as well.

由于第二个示例与第一个示例非常相似,除了在循环中使用了 applymap,我怀疑去掉它们会加快速度涨了很多.我试图通过将 domap2 编写为宏来实现这一点,但是处理 & 之后的所有变量的方式一直让我感到困惑,如上所示.

Since the second example is very much like the first except for the use of apply and the map inside the loop, I suspect that getting rid of them would speed things up a lot. I have tried to do this by writing domap2 as a macro, but the way that the catch-all variable after & is handled keeps tripping me up, as illustrated above.

其他示例(15 或 20 个不同版本中的)、基准代码以及几年前 Macbook Pro 上的时间(完整来源 这里):

Other examples (out of 15 or 20 different versions), benchmark code, and times on a Macbook Pro that's a few years old (full source here):

(defn domap1
  [f coll]
  (doseq [e coll] 
    (f e)))

(defn domap7
  [f coll]
  (dorun (map f coll)))

(defn domap18
  [f & colls]
  (dorun (apply map f colls)))

(defn domap15
  [f coll] 
  (when (seq coll)
    (f (first coll))
    (recur f (rest coll))))

(defn domap17
  [f & colls]
  (let [argvecs (apply (partial map vector) colls)] ; seq of ntuples of interleaved vals
    (doseq [args argvecs]
      (apply f args))))

我正在开发一个使用 core.matrix 矩阵和向量的应用程序,但可以在下面随意替换您自己的副作用函数.

I'm working on an application that uses core.matrix matrices and vectors, but feel free to substitute your own side-effecting functions below.

(ns tst
  (:use criterium.core
        [clojure.core.matrix :as mx]))

(def howmany 1000)
(def a-coll (vec (range howmany)))
(def maskvec (zero-vector :vectorz howmany))

(defn unmaskit!
  [idx]
  (mx/mset! maskvec idx 1.0)) ; sets element idx of maskvec to 1.0

(defn runbench
  [domapfn label]
  (print (str "
" label ":
"))
  (bench (def _ (domapfn unmaskit! a-coll))))

根据标准的平均执行时间,以微秒为单位:

Mean execution times according to Criterium, in microseconds:

domap1: 12.317551 [doseq]
domap2:19.065317 [dotimes]
domap3: 265.983779 [dotimes with apply, map]
domap7:53.263230 [带有 dorun 的地图]
domap18:54.456801 [带有 dorun 的地图,多个集合]
domap15:32.034993 [重复]
domap17:95.259984 [doseq,使用地图交错的多个集合]

domap1: 12.317551 [doseq]
domap2: 19.065317 [dotimes]
domap3: 265.983779 [dotimes with apply, map]
domap7: 53.263230 [map with dorun]
domap18: 54.456801 [map with dorun, multiple collections]
domap15: 32.034993 [recur]
domap17: 95.259984 [doseq, multiple collections interleaved using map]

对于多个大型惰性序列参数,dorun+map 可能是实现 domap 的最佳方式,但是 doseq 仍然是单个惰性序列的王者.执行与上述 unmask! 相同的操作,但通过 (mod idx 1000) 运行索引,并遍历 (range 100000000), <在我的测试中,code>doseq 的速度大约是 dorun+map 的两倍(即 (def domap25 (comp dorun map))).

It may be that dorun+map is the best way to implement domap for multiple large lazy sequence arguments, but doseq is still king when it comes to single lazy sequences. Performing the same operation as unmask! above, but running the index through (mod idx 1000), and iterating over (range 100000000), doseq is about twice as fast as dorun+map in my tests (i.e. (def domap25 (comp dorun map))).

推荐答案

你不需要宏,我不明白为什么宏在这里会有帮助.

You don't need a macro, and I don't see why a macro would be helpful here.

user> (defn do-map [f & lists] (apply mapv f lists) nil)
#'user/do-map
user> (do-map (comp println +) (range 2 6) (range 8 11) (range 22 40))
32
35
38
nil

注意这里的 do-map 是急切的(感谢 mapv)并且只执行副作用

note do-map here is eager (thanks to mapv) and only executes for side effects

宏可以使用可变参数列表,正如 do-map 的(无用!)宏版本所展示的:

Macros can use varargs lists, as the (useless!) macro version of do-map demonstrates:

user> (defmacro do-map-macro [f & lists] `(do (mapv ~f ~@lists) nil))
#'user/do-map-macro
user> (do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40))
32
35
38
nil
user> (macroexpand-1 '(do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40)))
(do (clojure.core/mapv (comp println +) (range 2 6) (range 8 11) (range 22 40)) nil)

附录:解决效率/垃圾产生问题:
请注意,为了简洁起见,下面我截断了标准基准函数的输出:

Addendum: addressing the efficiency / garbage-creation concerns:
note that below I truncate the output of the criterium bench function, for conciseness reasons:

(defn do-map-loop
  [f & lists]
  (loop [heads lists]
    (when (every? seq heads)
      (apply f (map first heads))
      (recur (map rest heads)))))


user> (crit/bench (with-out-str (do-map-loop (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
            Execution time mean : 11.367804 µs
...

这看起来很有希望,因为它不会创建我们无论如何都不会使用的数据结构(与上面的 mapv 不同).但事实证明它比前一个慢(可能是因为两次 map 调用?).

This looks promising because it doesn't create a data structure that we aren't using anyway (unlike mapv above). But it turns out it is slower than the previous (maybe because of the two map calls?).

user> (crit/bench (with-out-str (do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
             Execution time mean : 7.427182 µs
...
user> (crit/bench (with-out-str (do-map (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
             Execution time mean : 8.355587 µs
...

由于循环仍然不快,让我们尝试一个专门处理 arity 的版本,这样我们就不需要在每次迭代中调用 map 两次:

Since the loop still wasn't faster, let's try a version which specializes on arity, so that we don't need to call map twice on every iteration:

(defn do-map-loop-3
  [f a b c]
  (loop [[a & as] a
         [b & bs] b
         [c & cs] c]
    (when (and a b c)
      (f a b c)
      (recur as bs cs))))

值得注意的是,虽然速度更快,但仍然比刚刚使用 mapv 的版本慢:

Remarkably, though this is faster, it is still slower than the version that just used mapv:

user> (crit/bench (with-out-str (do-map-loop-3 (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
             Execution time mean : 9.450108 µs
...

接下来我想知道输入的大小是否是一个因素.随着更大的输入...

Next I wondered if the size of the input was a factor. With larger inputs...

user> (def test-input (repeatedly 3 #(range (rand-int 100) (rand-int 1000))))
#'user/test-input
user> (map count test-input)
(475 531 511)
user> (crit/bench (with-out-str (apply do-map-loop-3 (comp println +) test-input)))
...
            Execution time mean : 1.005073 ms
...
user> (crit/bench (with-out-str (apply do-map (comp println +) test-input)))
...
             Execution time mean : 756.955238 µs
...

最后,为了完整起见,do-map-loop 的时间(正如预期的那样比 do-map-loop-3 稍慢)

Finally, for completeness, the timing of do-map-loop (which as expected is slightly slower than do-map-loop-3)

user> (crit/bench (with-out-str (apply do-map-loop (comp println +) test-input)))
...
             Execution time mean : 1.553932 ms

正如我们所见,即使输入尺寸更大,mapv 也更快.

As we see, even with larger input sizes, mapv is faster.

(为了完整起见,我应该注意 map 比 mapv 稍微快一点,但不是很大).

(I should note for completeness here that map is slightly faster than mapv, but not by a large degree).

这篇关于Clojure 的 map 函数的高效仅副作用模拟的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆