Why does inserting 1,000,000 values into a transient map in Clojure yield a map with 8 items in it?

Question

If I call assoc! 1,000,000 times on a transient vector, I get a vector of 1,000,000 elements:

(count
  (let [m (transient [])]
    (dotimes [i 1000000]
      (assoc! m i i))
    (persistent! m)))
; => 1000000

On the other hand, if I do the same with a map, it ends up with only 8 items in it:

(count
  (let [m (transient {})]
    (dotimes [i 1000000]
      (assoc! m i i))
    (persistent! m)))
; => 8

Is there a reason why this is happening?

Answer

The transient datatypes' operations don't guarantee that they will return the same reference as the one passed in. Sometimes the implementation might decide to return a new (but still transient) map after an assoc! rather than using the one you passed in.

The ClojureDocs page on assoc! has a nice example that explains this behavior:

;; The key concept to understand here is that transients are 
;; not meant to be `bashed in place`; always use the value 
;; returned by either assoc! or other functions that operate
;; on transients.

(defn merge2
  "An example implementation of `merge` using transients."
  [x y]
  (persistent! (reduce
                (fn [res [k v]] (assoc! res k v))
                (transient x)
                y)))

;; Why always use the return value, and not the original?  Because the return
;; value might be a different object than the original.  The implementation
;; of Clojure transients in some cases changes the internal representation
;; of a transient collection (e.g. when it reaches a certain size).  In such
;; cases, if you continue to try modifying the original object, the results
;; will be incorrect.

;; Think of transients like persistent collections in how you write code to
;; update them, except unlike persistent collections, the original collection
;; you passed in should be treated as having an undefined value.  Only the return
;; value is predictable.

I'd like to repeat that last part because it's very important: the original collection you passed in should be treated as having an undefined value. Only the return value is predictable.

Here's a modified version of your code that works as expected:

(count
  (let [m (transient {})]
    (persistent!
      (reduce (fn [acc i] (assoc! acc i i))
              m (range 1000000)))))
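
For comparison, the same fix can be written with loop/recur, which makes the threading of the returned transient explicit. This is only a sketch: the names fill-map and fill-vector are made up for illustration and are not part of the question or of clojure.core.

```clojure
(defn fill-map
  "Hypothetical helper: builds a map of n entries {0 0, 1 1, ...} via a transient."
  [n]
  (loop [m (transient {})
         i 0]
    (if (< i n)
      ;; Rebind m to whatever assoc! returns; never reuse the old reference.
      (recur (assoc! m i i) (inc i))
      (persistent! m))))

(defn fill-vector
  "Hypothetical helper: the same pattern applied to a vector."
  [n]
  (persistent!
    (reduce (fn [acc i] (conj! acc i))
            (transient [])
            (range n))))

(count (fill-map 1000000))    ; => 1000000
(count (fill-vector 1000000)) ; => 1000000
```

The vector code in the question appears to work only because, in that case, the transient vector happens to be updated in place. That is an implementation detail and not something to rely on, so the vector version should thread the return value as well.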


As a side note, the reason you always get 8 is that Clojure uses a clojure.lang.PersistentArrayMap (a map backed by an array) for maps with 8 or fewer entries. Once you get past 8, it switches to clojure.lang.PersistentHashMap.

user=> (type '{1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a})
clojure.lang.PersistentArrayMap
user=> (type '{1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a 9 a})
clojure.lang.PersistentHashMap

Once you get past 8 entries, your transient map switches the backing data structure from an array of pairs (PersistentArrayMap) to a hashtable (PersistentHashMap), at which point assoc! returns a new reference instead of just updating the old one.
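
The switch can be observed directly by comparing references with identical?. The following is a minimal sketch, assuming the array-map-to-hash-map transition described above; the function name transient-switch-demo is hypothetical, and whether assoc! returns the same object before the switch is an implementation detail that may differ between Clojure versions.

```clojure
(defn transient-switch-demo
  "Hypothetical helper: reports whether assoc! handed back the same object
  before and after the map grows past 8 entries."
  []
  (let [t0 (transient {})
        ;; Thread the return value through the first 8 inserts.
        t8 (reduce (fn [t i] (assoc! t i i)) t0 (range 8))
        ;; The 9th insert is where the backing structure is expected to change.
        t9 (assoc! t8 8 8)]
    {:same-object-after-8? (identical? t0 t8)
     :same-object-after-9? (identical? t8 t9)
     :count-after-9        (count t9)}))

(transient-switch-demo)
```

If the first flag is true and the second is false, that matches the explanation: the code in the question keeps calling assoc! on the old array-backed transient, which is already full at 8 entries, while every new hash-backed transient that assoc! returns is thrown away.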
