Computing folder size


Problem description



I'm trying to compute folder size in parallel. Maybe it's a naive approach. What I do is hand the computation for every branch node (directory) to an agent. All leaf nodes have their file sizes added to my-size. Well, it doesn't work. :)

'scan' works OK, serially. 'pscan' prints only files from the first level.

(def agents (atom []))
(def my-size (atom 0))
(def root-dir (clojure.java.io/file "/"))

(defn scan [listing]
  (doseq [f listing]
    (if (.isDirectory f)
      (scan (.listFiles f))
      (swap! my-size #(+ % (.length f))))))

(defn pscan [listing]
  (doseq [f listing]
    (if (.isDirectory f)
      (let [a (agent (.listFiles f))]
        (do (swap! agents #(conj % a))
            (send-off a pscan)
            (println (.getName f))))
      (swap! my-size #(+ % (.length f))))))

Do you have any idea what I have done wrong?

Thanks.
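
For reference, the serial `scan` above can also be written without a shared atom, which gives a baseline to compare any parallel variant against. This is only a sketch and not part of the original post: `dir-size` is a hypothetical name, and it relies on `file-seq`, which walks a directory tree lazily and depth-first.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper (not in the original post): total size in bytes
;; of all regular files under `dir`, computed serially.
(defn dir-size [dir]
  (->> (file-seq (io/file dir))              ; lazy depth-first walk of the tree
       (filter #(.isFile ^java.io.File %))   ; keep regular files only
       (map #(.length ^java.io.File %))      ; size of each file in bytes
       (reduce + 0)))
```

It would be called as `(dir-size "/some/path")`; like `scan`, it follows symbolic links.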

Solution

So counting file sizes in parallel should be easy?

It's not :)

I tried to solve this issue better. I realized that I'm doing blocking I/O operations, so pmap doesn't do the job. I was thinking that giving chunks of directories (branches) to agents to process independently might make sense. Looks like it does :) Well, I haven't benchmarked it yet.

It works, but there might be some problems with symbolic links on UNIX-like systems.

(require 'clojure.java.io)

(def user-dir (clojure.java.io/file "/home/janko/projects/"))
(def root-dir (clojure.java.io/file "/"))
(def run? (atom true))
;; Plain constants: no earmuffs, since they are never rebound dynamically.
(def max-queue-length 1024)
(def max-wait-time 1000)    ;; when the queue is empty, wait 1 second before polling again
(def chunk-size 64)
(def queue (java.util.concurrent.LinkedBlockingQueue. max-queue-length))
(def agents (atom []))
(def size-total (atom 0))

;; Walk the tree and put every directory on the queue.
;; .put blocks while the queue is full, which throttles the producer.
(defn branch-producer [node]
  (when @run?
    (doseq [f node]
      (when (.isDirectory f)
        (.put queue f)
        (branch-producer (.listFiles f))))))

(defn producer [node]
  (future
    (branch-producer node)))

;; Size of a single entry: files count, everything else is 0.
(defn node-consumer [node]
  (if (.isFile node)
    (.length node)
    0))

(defn chunk-length []
  (min (.size queue) chunk-size))

;; Agent action: `dirs` is the agent's state, a vector of directories.
;; Adds the sizes of the files directly inside each directory to size-total.
(defn compute-sizes [dirs]
  (doseq [i (map (fn [f] (.listFiles f)) dirs)]
    (swap! size-total + (apply + (map node-consumer i))))
  dirs)

(defn consumer []
  (future
    (while @run?
      (let [size (chunk-length)]   ; appropriate size of work
        (if (pos? size)
          ;; take up to `size` directories off the queue for one new agent
          (let [dirs (vec (take-while some? (repeatedly size #(.poll queue))))
                a    (agent dirs)]
            (swap! agents conj a)
            (send-off a compute-sizes))
          ;; nothing queued: sleep instead of spinning
          (Thread/sleep max-wait-time))))))

You can start it by typing

    (producer (list user-dir))
    (consumer)

For the result, type

    @size-total
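
Note that `@size-total` only holds the final figure once every agent has finished its work. One way to wait, sketched here under assumptions rather than taken from the answer, is `await-for`, which blocks until the actions sent so far to the given agents have completed or a timeout expires. The names `total`, `workers` and `add-sizes` are made up for this self-contained illustration; in the code above the equivalents would be `size-total` and `@agents`.

```clojure
(def total (atom 0))                       ; stands in for size-total

;; Hypothetical agent action: add a batch of sizes to the shared total.
(defn add-sizes [sizes]
  (swap! total + (reduce + 0 sizes))
  sizes)

(def workers (mapv agent [[1 2 3] [10 20] [100]]))

(doseq [w workers]
  (send-off w add-sizes))

;; Block until all actions dispatched so far have run, or 5 s have passed.
;; await-for returns a logical false value if the timeout expired first.
(def finished? (apply await-for 5000 workers))

(println "done:" (boolean finished?) "total:" @total)
```

Because the consumer keeps creating agents while the producer is still walking the tree, such a call only covers the agents that exist at the moment it is made.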

You can stop it with the following (there are running futures; correct me if I'm wrong)

    (swap! run? not)
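
The flag works because the producer and the consumer both re-check `@run?` as they go, so the futures finish on their own once it turns false; nothing interrupts them. Here is a small self-contained sketch of that pattern. `running?`, `ticks` and `worker` are hypothetical names, and `reset!` is used instead of `swap! ... not` so the outcome does not depend on the flag's current value.

```clojure
(def running? (atom true))   ; plays the role of run?
(def ticks (atom 0))

;; A future that loops until the flag is cleared.
(def worker
  (future
    (while @running?
      (swap! ticks inc)
      (Thread/sleep 10))
    :stopped))

(Thread/sleep 100)           ; let it run for a moment
(reset! running? false)      ; ask the loop to finish

;; deref with a timeout: yields :timed-out if the future is still running
(println (deref worker 1000 :timed-out))
```

In a standalone script, the thread pool behind futures and agents can keep the JVM alive for a while after the work is done; calling `(shutdown-agents)` at the very end is the usual way to release it.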

If you find any errors/mistakes, you're welcome to share your ideas!
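
On the symbolic-link caveat: a link that points back to one of its own ancestors can send a recursive walk around in circles, and linked trees get counted twice. A possible guard, assuming Java 7 or later for `java.nio.file.Files`, is to skip links during the walk. `symlink?` and `size-no-links` are hypothetical helpers, not part of the code above.

```clojure
(import '(java.io File)
        '(java.nio.file Files))

;; Hypothetical helper: true if `f` is a symbolic link.
(defn symlink? [^File f]
  (Files/isSymbolicLink (.toPath f)))

;; Hypothetical helper: serial size of a tree in bytes, never following links.
(defn size-no-links [^File f]
  (cond
    (symlink? f)     0
    ;; listFiles may return nil for unreadable directories; map treats nil as empty
    (.isDirectory f) (reduce + 0 (map size-no-links (.listFiles f)))
    (.isFile f)      (.length f)
    :else            0))
```

The same check could be added to `branch-producer`, for example by only queueing a directory when `(not (symlink? f))` holds.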
