clojure.java.jdbc/query large resultset lazily


Question


I'm trying to read millions of rows from a database and write to a text file.

This is a continuation of my earlier question about dumping a database to a text file with side effects.


My problem now seems to be that the logging doesn't happen until the program completes. Another sign that I'm not processing lazily is that nothing is written to the text file until the program finishes.


Based on a tip from IRC, my issue is likely related to :result-set-fn, which defaults to doall in clojure.java.jdbc/query.
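To see why a default of doall matters, here is a self-contained sketch that needs no database. It assumes, as a simplification, that query ends up doing something shaped like `(result-set-fn (map row-fn rows))`; `tracked-rows`, `simulate-query` and `realized-after-query` are hypothetical stand-ins, and the real code path differs between clojure.java.jdbc versions. An atom counts how many rows have been realized by the time the simulated query returns.

```clojure
;; Hypothetical stand-in for a JDBC result set: a lazy seq of n rows
;; that bumps the `realized` atom each time a row is produced.
(defn tracked-rows [n realized]
  (map (fn [i] (swap! realized inc) [i (str "name-" i)])
       (range n)))

;; Hypothetical, simplified shape of query: map row-fn over the rows,
;; then hand the whole seq to result-set-fn (doall by default).
(defn simulate-query [rows {:keys [row-fn result-set-fn]
                            :or   {row-fn identity result-set-fn doall}}]
  (result-set-fn (map row-fn rows)))

(defn realized-after-query
  "How many rows had been realized when simulate-query returned."
  [n opts]
  (let [realized (atom 0)]
    (simulate-query (tracked-rows n realized) opts)
    @realized))
```

With the default, `(realized-after-query 1000 {})` reports all 1000 rows realized before the caller could write any of them, which matches the symptom described above. With `{:result-set-fn identity}` nothing is realized yet, but the lazy seq then outlives the query call that owns the result set, so that is not a fix by itself.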


I tried replacing it with a for, but memory consumption is still high because the entire result set is pulled into memory.


How can I have a :result-set-fn that doesn't pull everything in the way doall does? And how can I write the log file progressively while the program is running, rather than dumping everything once -main has finished?

    (let [ 
          db-spec              local-postgres
          sql                  "select * from public.f_5500_sf "
          log-report-interval  1000
          fetch-size           100
          field-delim          "\t"
          row-delim            "\n"
          db-connection        (doto ( j/get-connection db-spec) (.setAutoCommit false)) 
          statement            (j/prepare-statement db-connection sql :fetch-size fetch-size ) 
          joiner               (fn [v] (str (join field-delim v ) row-delim ) )                      
          start                (System/currentTimeMillis)                                            
          rate-calc            (fn [r] (float (/ r (/ ( - (System/currentTimeMillis) start) 100))))  
          row-count            (atom 0)                                                              
          result-set-fn        (fn [rs] (lazy-seq rs))
          lazy-results         (rest (j/query db-connection [statement] :as-arrays? true :row-fn joiner :result-set-fn result-set-fn)) 
          ]; }}}
      (.setAutoCommit db-connection false)
      (info "Started dbdump session...")    
      (with-open [^java.io.Writer wrtr (io/writer "output.txt")]
        (info "Running query...")    
        (doseq [row lazy-results] 
          (.write wrtr row)
          ))  
        (info (format "Completed write with %d rows"   @row-count))
      )
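One way to get a :result-set-fn that doesn't pull everything in like doall is to let it do the writing itself and return something small (here, a row count) instead of the rows. The sketch below is plain Clojure with no database: `write-rows-fn` is a hypothetical helper, a java.io.StringWriter stands in for the output file, and a vector of vectors stands in for the rows a query would pass in. `reduce` consumes the sequence one element at a time and keeps no reference to rows already written, so, assuming nothing else holds the head of the sequence, written rows can be garbage-collected.

```clojure
(require '[clojure.string :as str])

(defn write-rows-fn
  "Build a result-set-fn that writes each row as one delimited line to
  `wrtr` and returns the number of rows written, not the rows themselves."
  [^java.io.Writer wrtr field-delim row-delim]
  (fn [rows]
    (reduce (fn [n row]
              ;; write this row immediately, then only keep the count
              (.write wrtr (str (str/join field-delim row) row-delim))
              (inc n))
            0
            rows)))
```

In a real call this would be passed as `:result-set-fn (write-rows-fn wrtr field-delim row-delim)` inside the `with-open` that owns the writer, so that every row is written before the query call returns.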

Answer


I picked up the recent fixes to clojure.java.jdbc by adding [org.clojure/java.jdbc "0.3.0-beta1"] to the dependencies in my project.clj. That release enhances/corrects the :as-arrays? true behavior of clojure.java.jdbc/query described here.


I think this helped somewhat; however, I may still have been able to override :result-set-fn with vec.


The core issue was resolved by moving all per-row logic into :row-fn. The initial OutOfMemory problems came from iterating over the result set returned by j/query, instead of handling each row in a dedicated :row-fn.
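The effect of moving the work into :row-fn can be reduced to a database-free sketch. It again assumes a simplified `(doall (map row-fn rows))` shape for what query does with :row-fn and the default :result-set-fn; that shape, `writing-row-fn` and `run-query-sketch` are hypothetical. Because the row-fn writes the line and returns nil (the value of a void Java method call), the sequence that doall retains holds one nil per row rather than a full row string.

```clojure
(require '[clojure.string :as str])

(defn writing-row-fn
  "Build a :row-fn that writes one delimited line per row to `wrtr`
  and returns nil, so nothing row-sized is retained by the caller."
  [^java.io.Writer wrtr field-delim row-delim]
  (fn [row]
    (.write wrtr (str (str/join field-delim row) row-delim))))

;; Simplified stand-in for query with :row-fn and the default
;; :result-set-fn (doall): every row is processed before returning.
(defn run-query-sketch [rows row-fn]
  (doall (map row-fn rows)))
```

The output is written as each row is processed, and what `run-query-sketch` returns is just a sequence of nils.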


New (working) code is below:

    ;; Assumed setup (not shown in the original answer): Timbre for logging,
    ;; plus `local-postgres` (a db-spec) and `memory-percent-used` defined elsewhere.
    (require '[clojure.java.jdbc :as j]
             '[clojure.java.io :as io]
             '[clojure.string :refer [join]]
             '[taoensso.timbre :refer [info]])

    (defn -main []
      (let [; {{{
            db-spec              local-postgres
            source-sql           "select * from public.f_5500 "
            log-report-interval  1000
            fetch-size           1000
            row-count            (atom 0)
            field-delim          "\u0001"  ; unlikely to be in source feed,
                                           ; although i should still check in
                                           ; replace-newline below (for when "\t"
                                           ; is used especially)
            row-delim            "\n"  ; unless fixed-width, target doesn't
                                       ; support non-printable chars for recDelim like
            ;; the PostgreSQL driver only honors fetch-size (streams rows)
            ;; when autocommit is off
            db-connection        (doto (j/get-connection db-spec) (.setAutoCommit false))
            statement            (j/prepare-statement db-connection source-sql :fetch-size fetch-size :concurrency :read-only)
            start                (System/currentTimeMillis)
            ;; rows per second: elapsed ms / 1000 = elapsed seconds
            rate-calc            (fn [r] (float (/ r (/ (- (System/currentTimeMillis) start) 1000))))
            replace-newline      (fn [s] (if (string? s) (clojure.string/replace s #"\n" " ") s))
            ;; all per-row work happens here, while the query is running
            row-fn               (fn [v]
                                   (swap! row-count inc)
                                   (when (zero? (mod @row-count log-report-interval))
                                     (info (format "wrote %d rows" @row-count))
                                     (info (format "\trows/s %.2f" (rate-calc @row-count)))
                                     (info (format "\tPercent Mem used %s " (memory-percent-used))))
                                   (str (join field-delim (map replace-newline v)) row-delim))
            ]; }}}
        (info "Started database table dump session...")
        (with-open [^java.io.Writer wrtr (io/writer "./sql/output.txt")]
          (j/query db-connection [statement] :as-arrays? true :row-fn
                   #(.write wrtr (row-fn %))))
        (info (format "\t\t\tCompleted with %d rows" @row-count))
        (info (format "\t\t\tCompleted in %s seconds" (float (/ (- (System/currentTimeMillis) start) 1000))))
        (info (format "\t\t\tAverage rows/s %.2f" (rate-calc @row-count)))
        nil))
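The string handling and rate arithmetic in the code above can be pulled out into small pure functions and checked without a database. This is a sketch; `replace-newline` mirrors the local of the same name, while `row->line` and `rows-per-second` are hypothetical names. `rows-per-second` converts elapsed milliseconds to seconds (divide by 1000) before dividing, and clamps the elapsed time to at least 1 ms so a very early call cannot divide by zero.

```clojure
(require '[clojure.string :as str])

(defn replace-newline
  "Replace newline characters in string fields with a space; leave
  non-string values (numbers, nil, dates) untouched."
  [v]
  (if (string? v) (str/replace v #"\n" " ") v))

(defn row->line
  "Render one row (a sequence of column values) as a delimited line.
  nil values become empty fields, because str/join uses (str nil) = \"\"."
  [field-delim row-delim row]
  (str (str/join field-delim (map replace-newline row)) row-delim))

(defn rows-per-second
  "Rows per second given a row count and elapsed milliseconds."
  [rows elapsed-ms]
  (double (/ rows (/ (max elapsed-ms 1) 1000))))
```

For example, `(rows-per-second 5000 2000)` is 2500.0: 5000 rows in 2 seconds.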


Other things I experimented with (with limited success) were Timbre logging and turning off standard out. I wondered whether, when running from a REPL, the results might be cached before being displayed back in my editor (vim-fireplace), and I wasn't sure whether that was using a lot of the memory.


I also added logging of free memory using (.freeMemory (java.lang.Runtime/getRuntime)). I wasn't familiar enough with VisualVM to pinpoint exactly where my issue was.
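The answer's code calls a `memory-percent-used` helper that is not shown. Building on the free-memory logging just described, one possible definition is sketched below; the name comes from the answer, but the body is an assumption, not the author's code. It reports heap in use (total minus free) as a percentage of the maximum heap the JVM may grow to, rounded to one decimal place.

```clojure
(defn memory-percent-used
  "Hypothetical helper: percentage of the JVM's maximum heap currently in
  use, rounded to one decimal place."
  []
  (let [rt    (Runtime/getRuntime)
        ;; heap currently allocated minus the part of it that is free
        used  (- (.totalMemory rt) (.freeMemory rt))
        ;; maxMemory returns Long/MAX_VALUE when the heap has no limit,
        ;; in which case the percentage is effectively 0.0
        ratio (/ (double used) (.maxMemory rt))]
    (/ (Math/round (* 1000.0 ratio)) 10.0)))
```

Because it returns a number, it also works with the `%s` placeholder used in the answer's log line.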


I am happy with how it works now, thanks everyone for your help.
