Computing folder size
Problem description
I'm trying to compute the size of a folder in parallel. My approach may be naive: I hand the computation for every branch node (directory) to an agent, and the file sizes of all leaf nodes are added to my-size. Well, it doesn't work. :)
'scan' works fine serially, but 'pscan' prints only the files from the first level.
(def agents (atom []))
(def my-size (atom 0))
(def root-dir (clojure.java.io/file "/"))

(defn scan [listing]
  (doseq [f listing]
    (if (.isDirectory f)
      (scan (.listFiles f))
      (swap! my-size #(+ % (.length f))))))

(defn pscan [listing]
  (doseq [f listing]
    (if (.isDirectory f)
      (let [a (agent (.listFiles f))]
        (do (swap! agents #(conj % a))
            (send-off a pscan)
            (println (.getName f))))
      (swap! my-size #(+ % (.length f))))))
Do you have any idea what I have done wrong?
Thanks.
Solution
So counting file sizes in parallel should be easy?
It's not :)
I tried to find a better solution to this problem. I realized that I'm doing blocking I/O operations, so pmap doesn't do the job. I thought it might make sense to give chunks of directories (branches) to agents to process independently. It looks like it does :) I haven't benchmarked it yet, though.
It works, but there might be some problems with symbolic links on UNIX-like systems.
(def user-dir (clojure.java.io/file "/home/janko/projects/"))
(def root-dir (clojure.java.io/file "/"))
(def run? (atom true))
(def ^:dynamic *max-queue-length* 1024)
(def ^:dynamic *max-wait-time* 1000) ;; wait max 1 second then process anything left
(def ^:dynamic *chunk-size* 64)
(def queue (java.util.concurrent.LinkedBlockingQueue. *max-queue-length*))
(def agents (atom []))
(def size-total (atom 0))

;; Walk the tree and put every directory on the queue.
(defn branch-producer [node]
  (when @run?
    (doseq [f node]
      (when (.isDirectory f)
        (.put queue f)
        (branch-producer (.listFiles f))))))

(defn producer [node]
  (future (branch-producer node)))

(defn node-consumer [node]
  (if (.isFile node) (.length node) 0))

(defn chunk-length []
  (min (.size queue) *chunk-size*))

;; Agent action: add up the files directly inside each directory in dirs.
(defn compute-sizes [dirs]
  (doseq [i (map (fn [f] (.listFiles f)) dirs)]
    (swap! size-total + (apply + (map node-consumer i)))))

(defn consumer []
  (future
    (while @run?
      (let [size (chunk-length)] ;appropriate size of work
        (when (pos? size)
          ;; take up to `size` directories off the queue; a plain let replaces
          ;; binding/set!, which throws on a non-dynamic var
          (let [dirs (vec (keep (fn [_] (.poll queue)) (range size)))
                a    (agent dirs)]
            (swap! agents conj a)
            (send-off a compute-sizes)
            (Thread/sleep *max-wait-time*)))))))
You can start it by typing
(producer (list user-dir))
(consumer)
To get the result, type
@size-total
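Note that @size-total is only a partial sum while agent actions are still running. As a minimal sketch (demo-total, demo-agents and add-chunk are hypothetical names, not part of the code above), await can block until the actions sent from the current thread have finished, so the total is read only afterwards:

```clojure
;; Hypothetical stand-ins for the per-chunk agents and the shared total.
(def demo-total (atom 0))
(def demo-agents (mapv agent [[1 2 3] [4 5] [6]]))

(defn add-chunk
  "Agent action: add the numbers in chunk to demo-total, keep chunk as state."
  [chunk]
  (swap! demo-total + (apply + chunk))
  chunk)

;; Dispatch one action per agent from this thread.
(doseq [a demo-agents]
  (send-off a add-chunk))

;; Block until the actions dispatched so far from this thread have run.
(apply await demo-agents)

(println @demo-total) ; prints 21

;; When run as a script rather than in a REPL, call (shutdown-agents)
;; at the very end so the agent thread pool does not keep the JVM alive.
```

In the code above the actions are sent from the consumer's future rather than from the REPL thread, so treat this as an idea to adapt, not a drop-in fix.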
You can stop it with the following (there are running futures; correct me if I'm wrong):
(swap! run? not)
If you find any errors/mistakes, you're welcome to share your ideas!
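On the symbolic link concern: one possible way to handle it, assuming Java 7 or later, is to check each entry with java.nio.file.Files/isSymbolicLink and skip links, so that a link is neither counted nor followed into a loop. The sketch below is serial and only illustrates the check; symlink? and dir-size are hypothetical names, not part of the code above.

```clojure
(import '(java.io File)
        '(java.nio.file Files))

(defn symlink?
  "True when f is a symbolic link."
  [^File f]
  (Files/isSymbolicLink (.toPath f)))

(defn dir-size
  "Size in bytes of everything under f, computed serially.
  Symbolic links are counted as 0 and never followed."
  [^File f]
  (cond
    (symlink? f)     0
    ;; .listFiles returns nil for unreadable directories; map treats nil as empty
    (.isDirectory f) (reduce + 0 (map dir-size (.listFiles f)))
    :else            (.length f)))
```

The same symlink? test could be added next to the .isDirectory check in branch-producer, but I haven't tried that.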