Clojure Leiningen REPL OutOfMemoryError Java heap space


Question


I am trying to parse a fairly small (< 100MB) xml file with:

(require '[clojure.data.xml :as xml]
         '[clojure.java.io :as io])

(xml/parse (io/reader "data/small-sample.xml"))

I get an error:

OutOfMemoryError Java heap space
    clojure.lang.Numbers.byte_array (Numbers.java:1216)
    clojure.tools.nrepl.bencode/read-bytes (bencode.clj:101)
    clojure.tools.nrepl.bencode/read-netstring* (bencode.clj:153)
    clojure.tools.nrepl.bencode/read-token (bencode.clj:244)
    clojure.tools.nrepl.bencode/read-bencode (bencode.clj:254)
    clojure.tools.nrepl.bencode/token-seq/fn--3178 (bencode.clj:295)
    clojure.core/repeatedly/fn--4705 (core.clj:4642)
    clojure.lang.LazySeq.sval (LazySeq.java:42)
    clojure.lang.LazySeq.seq (LazySeq.java:60)
    clojure.lang.RT.seq (RT.java:484)
    clojure.core/seq (core.clj:133)
    clojure.core/take-while/fn--4236 (core.clj:2564)




Here is my project.clj:

(defproject dats "0.1.0-SNAPSHOT"
  ...
  :dependencies [[org.clojure/clojure "1.5.1"]
                [org.clojure/data.xml "0.0.7"]
                [criterium "0.4.1"]]
  :jvm-opts ["-Xmx1g"])




I tried setting LEIN_JVM_OPTS and JVM_OPTS in my .bash_profile, without success.


When I tried the following project.clj:

(defproject barber "0.1.0-SNAPSHOT"
  ...
  :dependencies [[org.clojure/clojure "1.5.1"]
                [org.clojure/data.xml "0.0.7"]
                [criterium "0.4.1"]]
  :jvm-opts ["-Xms128m"])




I get the following error:

Error occurred during initialization of VM
Incompatible minimum and maximum heap sizes specified
Exception in thread "Thread-5" clojure.lang.ExceptionInfo: Subprocess failed {:exit-code 1}


Any idea how I could increase the heap size for my leiningen repl?

Thanks.

Answer


Any form evaluated at the top level of the repl is realized in full, as a result of the print step of the Read-Eval-Print-Loop. It is also stored in the heap, so that you can later access it via *1.
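The difference between building a lazy value and realizing it can be seen without any XML at all. The following is a minimal sketch that uses a plain lazy sequence as a stand-in for the parsed document; the name `xs` is only illustrative.

```clojure
;; Building a lazy sequence does no work yet, so this returns immediately.
(def xs (map inc (range 10000000)))

;; Nothing has been computed so far.
(println "realized after def?  " (realized? xs))

;; Asking for one element realizes only the front of the sequence.
(println "first element:       " (first xs))
(println "realized after first?" (realized? xs))

;; Evaluating bare `xs` at the prompt would instead make the REPL print,
;; and therefore realize, every element, and keep the result in *1.
```

Wrapping the expression in `def`, as here, means the REPL only prints the var, not the value it refers to.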

If you store the return value instead, like so:

(def parsed (xml/parse (io/reader "data/small-sample.xml")))


this returns immediately, even for a file hundreds of megabytes in size (I have verified this locally). You can then iterate across the result, which is realized in full as it is parsed from the input stream, by iterating over the clojure.data.xml.Element tree that is returned.


If you do not hold on to the elements (by binding them so they are still accessible), you can iterate over the entire structure without using more RAM than it takes to hold a single node of the XML tree.
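One way to traverse the document without keeping a reference to it is to consume the root's `:content` inside a single expression. The sketch below assumes `org.clojure/data.xml` is on the classpath; the function name `count-children` is hypothetical and not part of the library.

```clojure
(require '[clojure.data.xml :as xml]
         '[clojure.java.io :as io])

(defn count-children
  "Counts the direct children of the root element whose :tag equals tag.
  The parsed tree is never bound to a var or a long-lived local, so
  elements that have already been visited should be free to be
  garbage collected."
  [path tag]
  (with-open [rdr (io/reader path)]
    (reduce (fn [n el]
              ;; Text nodes are plain strings; (:tag "text") is nil.
              (if (= tag (:tag el)) (inc n) n))
            0
            (:content (xml/parse rdr)))))
```

For the file from the question, a call might look like `(count-children "data/small-sample.xml" :book)`. The `reduce` finishes inside `with-open`, so the reader is still open while the lazy content is being consumed.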

user> (time (def n (xml/parse (clojure.java.io/reader "/home/justin/clojure/ok/data.xml"))))
"Elapsed time: 0.739795 msecs"
#'user/n
user> (time (keys n))
"Elapsed time: 0.025683 msecs"
(:tag :attrs :content)
user> (time (-> n :tag))
"Elapsed time: 0.031224 msecs"
:catalog
user> (time (-> n :attrs))
"Elapsed time: 0.136522 msecs"
{}
user> (time (-> n :content first))
"Elapsed time: 0.095145 msecs"
#clojure.data.xml.Element{:tag :book, :attrs {:id "bk101"}, :content (#clojure.data.xml.Element{:tag :author, :attrs {}, :content ("Gambardella, Matthew")} #clojure.data.xml.Element{:tag :title, :attrs {}, :content ("XML Developer's Guide")} #clojure.data.xml.Element{:tag :genre, :attrs {}, :content ("Computer")} #clojure.data.xml.Element{:tag :price, :attrs {}, :content ("44.95")} #clojure.data.xml.Element{:tag :publish_date, :attrs {}, :content ("2000-10-01")} #clojure.data.xml.Element{:tag :description, :attrs {}, :content ("An in-depth look at creating applications \n      with XML.")})}
user> (time (-> n :content count))
"Elapsed time: 48178.512106 msecs"
459000
user> (time (-> n :content count))
"Elapsed time: 86.931114 msecs"
459000
;; redefining n so that we can test the performance without the pre-parsing done when we counted
user> (time (def n (xml/parse (clojure.java.io/reader "/home/justin/clojure/ok/data.xml"))))
"Elapsed time: 0.702885 msecs"
#'user/n
user> (time (doseq [el (take 100 (drop 100 (-> n :content)))] (println (:tag el))))
:book
:book
.... ;; output truncated
"Elapsed time: 26.019374 msecs"
nil
user> 


Notice that it is only when I first ask for the count of the content of n (thus forcing the whole file to be parsed) that the huge time delay occurs. If I doseq across subsections of the structure, this happens very quickly.
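The same effect can be observed on an ordinary lazy sequence by counting how many elements actually get computed. This is a rough sketch, not a measurement of data.xml itself; `realized-count` and `items` are illustrative names.

```clojure
;; Counts how many elements of the lazy sequence have been computed.
(def realized-count (atom 0))

;; A stand-in for the lazily parsed :content of a large document.
(def items
  (map (fn [i] (swap! realized-count inc) i)
       (range 1000000)))

;; Walk a 100-element window, as in the doseq above.
(doseq [el (take 100 (drop 100 items))]
  el)

;; Only a prefix of the million elements has been realized. The exact
;; number depends on chunking, so it may exceed the 200 strictly needed.
(println "elements realized:" @realized-count)
```

Because `items` is bound to a var, the realized prefix stays in memory; that is harmless here but is exactly what to avoid with a large parsed document.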
