clojure.java.jdbc/懒惰地查询大型结果集 [英] clojure.java.jdbc/query large resultset lazily

查看:113
本文介绍了clojure.java.jdbc/懒惰地查询大型结果集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试从数据库中读取数百万行并将其写入文本文件.

I'm trying to read millions of rows from a database and write to a text file.

这是我的问题的继续数据库转储到带有副作用的文本文件中

我的问题现在似乎是直到程序完成才进行日志记录.我没有懒惰处理的另一个指标是,直到程序完成,才完全不编写文本文件.

My problem now seems to be that the logging doesn't happen until the program completes. Another indicator that i'm not processing lazily is that the text file isn't written at all until the program finishes.

基于IRC提示,看来我的问题可能与:result-set-fn有关,并且默认在代码的clojure.java.jdbc/query区域中为doall.

Based on an IRC tip it seems my issue is likely having to do with :result-set-fnand defaulting to doall in the clojure.java.jdbc/query area of the code.

我试图用for函数代替它,但是仍然发现内存消耗很高,因为它将整个结果集拉到内存中.

I have tried to replace this with a for function but still discover that memory consumption is high as it pulls the entire result set into memory.

我怎么有一个:result-set-fn不能像doall那样将所有东西都拉进来?如何在程序运行时逐步写入日志文件,而不是在-main执行完成后转储所有内容?

How can i have a :result-set-fn that doesn't pull everything in like doall? How can I progressively write the log file as the program is running, rather then dump everything once the -main execution is finished?

    (let [ 
          db-spec              local-postgres
          sql                  "select * from public.f_5500_sf "
          log-report-interval  1000
          fetch-size           100
          field-delim          "\t"                                                                  
          row-delim            "\n"                                                                  
          db-connection        (doto ( j/get-connection db-spec) (.setAutoCommit false)) 
          statement            (j/prepare-statement db-connection sql :fetch-size fetch-size ) 
          joiner               (fn [v] (str (join field-delim v ) row-delim ) )                      
          start                (System/currentTimeMillis)                                            
          rate-calc            (fn [r] (float (/ r (/ ( - (System/currentTimeMillis) start) 100))))  
          row-count            (atom 0)                                                              
          result-set-fn        (fn [rs] (lazy-seq rs))
          lazy-results         (rest (j/query db-connection [statement] :as-arrays? true :row-fn joiner :result-set-fn result-set-fn)) 
          ]; }}}
      (.setAutoCommit db-connection false)
      (info "Started dbdump session...")    
      (with-open [^java.io.Writer wrtr (io/writer "output.txt")]
        (info "Running query...")    
        (doseq [row lazy-results] 
          (.write wrtr row)
          ))  
        (info (format "Completed write with %d rows"   @row-count))
      )

推荐答案

我通过将[org.clojure/java.jdbc "0.3.0-beta1"]放入我的project.clj依赖项列表中,获取了clojure.java.jdbc的最新修复程序.此功能增强/纠正了此处中所述的clojure.java.jdbc/query:as-arrays? true功能.

I took the recent fixes for clojure.java.jdbc by putting [org.clojure/java.jdbc "0.3.0-beta1"] in my project.clj dependencies listing. This one enhances/corrects the :as-arrays? true functionality of clojure.java.jdbc/query described here.

我认为这有所帮助,但是我仍然可以将:result-set-fn覆盖为vec.

I think this helped somewhat however I may still have been able to override the :result-set-fn to vec.

核心问题已通过将所有行逻辑塞入:row-fn而得以解决.最初的OutOfMemory问题与遍历j/query结果集有关,而不是与定义特定的:row-fn有关.

The core issue was resolved by tucking all row logic into :row-fn. The initial OutOfMemory problems had to do with iterating through j/query result sets rather than defining the specific :row-fn.

新的(有效的)代码如下:

New (working) code is below:

(defn -main []
  (let [; {{{
        db-spec              local-postgres
        source-sql           "select * from public.f_5500 "
        log-report-interval  1000
        fetch-size           1000
        row-count            (atom 0)
        field-delim          "\u0001"   ; unlikely to be in source feed,
                                        ; although i should still check in
                                        ; replace-newline below (for when "\t"
                                        ; is used especially) 
        row-delim            "\n" ; unless fixed-width, target doesn't
                                  ; support non-printable chars for recDelim like 
        db-connection        (doto ( j/get-connection db-spec) (.setAutoCommit false))
        statement            (j/prepare-statement db-connection source-sql :fetch-size fetch-size :concurrency :read-only)
        start                (System/currentTimeMillis)
        rate-calc            (fn [r] (float (/ r (/ ( - (System/currentTimeMillis) start) 100))))
        replace-newline      (fn [s] (if (string? s) (clojure.string/replace  s #"\n" " ") s))
        row-fn               (fn [v] 
                               (swap! row-count inc)
                               (when (zero? (mod @row-count log-report-interval))
                                 (info (format "wrote %d rows" @row-count))
                                 (info (format "\trows/s %.2f"  (rate-calc @row-count)))
                                 (info (format "\tPercent Mem used %s "  (memory-percent-used))))
                               (str (join field-delim (doall (map #(replace-newline %) v))) row-delim ))
        ]; }}}
    (info "Started database table dump session...")
    (with-open [^java.io.Writer wrtr (io/writer "./sql/output.txt")]
      (j/query db-connection [statement] :as-arrays? true :row-fn 
               #(.write wrtr (row-fn %))))
    (info (format "\t\t\tCompleted with %d rows" @row-count))
    (info (format "\t\t\tCompleted in %s seconds" (float (/ (- (System/currentTimeMillis) start) 1000))))
    (info (format "\t\t\tAverage rows/s %.2f"  (rate-calc @row-count)))
    nil)
  )

我尝试过的其他事情(取得了有限的成功)包括音色记录和关闭标准音.我想知道是否使用REPL可能会缓存结果,然后再显示回我的编辑器(Vim壁炉),而且我不确定这是否占用了大量内存.

Other things i experimented (with limited success) involved the timbre logging and turning off stardard out; i wondered if with using a REPL it might cache the results before displaying back to my editor (vim fireplace) and i wasn't sure if that was utilizing a lot of the memory.

此外,我使用(.freeMemory (java.lang.Runtime/getRuntime))在内存附近添加了日志记录部分.我对VisualVM并没有确切地指出我的问题所在并不熟悉.

Also, I added the logging parts around memory free with (.freeMemory (java.lang.Runtime/getRuntime)). I wasn't as familiar with VisualVM and pinpointing exactly where my issue was.

我对它现在的工作方式感到满意,谢谢大家的帮助.

I am happy with how it works now, thanks everyone for your help.

这篇关于clojure.java.jdbc/懒惰地查询大型结果集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆