Clojure way of reading large files and transforming data therein

Question

I am processing a SubRip subtitle file that is quite large, and I need to process it one subtitle at a time. In Java, to extract the subtitles from the file, I would write a method with the following signature:

Iterator<Subtitle> fromSubrip(final Iterator<String> lines);

Using Iterator gives me two things:

  1. The file is never held in memory in its entirety, nor is any of its transformed stages.
  2. An abstraction that lets me loop over a collection of Subtitle objects without the memory overhead.

Since iterators are by nature imperative and mutable, they're probably not idiomatic in Clojure. So what is the Clojure way to deal with this sort of situation?

Recommended answer

As Vladimir said, you need to handle the laziness and file closing correctly. Here's how I did it, as shown in "Read a very large text file into a list in clojure":

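(require '[clojure.java.io :as io])

(defn lazy-file-lines
  "Open a (probably large) file and make it available as a lazy seq of lines.
  The reader is closed only once the seq has been consumed to the end."
  [filename]
  (letfn [(helper [^java.io.BufferedReader rdr]
            (lazy-seq
              (if-let [line (.readLine rdr)]
                (cons line (helper rdr))
                ;; End of file: close the reader and terminate the seq.
                (do (.close rdr) nil))))]
    (helper (io/reader filename))))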
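
To connect this back to the question, here is a minimal sketch of how a lazy seq of lines might be turned into a lazy seq of subtitles, playing the role of the Java fromSubrip method. The function name subtitles and the map keys :index, :timing and :text are hypothetical, not part of any library. The sketch assumes well-formed input: blocks separated by blank lines, each with an index line, a timing line, then text lines, and no byte-order mark at the start of the file.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn subtitles
  "Hypothetical helper: turns a (possibly lazy) seq of SubRip lines into a
  lazy seq of maps, one per subtitle block. Assumes blocks are separated by
  blank lines and consist of an index line, a timing line, then text lines."
  [lines]
  (->> lines
       (partition-by str/blank?)          ; lazily group runs of blank / non-blank lines
       (remove #(str/blank? (first %)))   ; drop the blank separator groups
       (map (fn [[index timing & text]]
              {:index  (Long/parseLong (str/trim index))
               :timing timing             ; kept as the raw string in this sketch
               :text   (vec text)}))))

;; Usage sketch: consuming the seq inside with-open closes the reader
;; deterministically, even if processing stops before the end of the file.
(comment
  (with-open [rdr (io/reader "movie.srt")] ; hypothetical path
    (doseq [s (subtitles (line-seq rdr))]
      (println (:index s) (:text s)))))
```

Because partition-by and map are lazy, only the block currently being processed needs to be realized, provided nothing holds on to the head of the seq. With line-seq and with-open, the processing has to finish inside the with-open body; lazy-file-lines above makes the opposite trade-off and closes the reader only when the last line has been read.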
