Clojure way of reading large files and transforming data therein
Question
I am processing a SubRip subtitle file that is quite large, and I need to process it one subtitle at a time. In Java, to extract the subtitles from the file, I would write a method with the following signature:
Iterator<Subtitle> fromSubrip(final Iterator<String> lines);
Using an Iterator gives me:
- The file is never held in memory in its entirety, nor is any of its transformed stages.
- An abstraction through which I can loop over a collection of Subtitle objects without the memory overhead.
Since iterators are by nature imperative and mutable, they are probably not idiomatic in Clojure. So what is the Clojure way to deal with this sort of situation?
Recommended answer
As Vladimir said, you need to handle the laziness and the closing of the file correctly. Here's how I did it, as shown in "Read a very large text file into a list in clojure":
(defn lazy-file-lines
  "Open a (probably large) file and make it available as a lazy seq of lines."
  [filename]
  (letfn [(helper [rdr]
            (lazy-seq
              (if-let [line (.readLine rdr)]
                (cons line (helper rdr))
                ;; End of file: close the reader and end the seq.
                ;; Note: the reader is closed only if the seq is fully consumed.
                (do (.close rdr) nil))))]
    (helper (clojure.java.io/reader filename))))
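As a complement, here is a minimal sketch of how a lazy seq of lines might be grouped into subtitles, roughly the counterpart of the Java fromSubrip signature in the question. It is not part of the original answer. The names lines->subtitles and process-subrip and the map keys :index, :timing and :text are hypothetical. It assumes subtitle blocks are separated by blank lines, that each block is an index line, a timing line, then text lines, and that the file has no byte-order mark. It uses with-open and line-seq instead of the helper above, so the reader is closed even if processing stops early.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn lines->subtitles
  "Lazily group a seq of lines into subtitle maps (hypothetical shape).
  Assumes blocks are separated by blank lines: index, timing, then text."
  [lines]
  (->> lines
       (partition-by str/blank?)            ; alternate content / blank runs
       (remove #(str/blank? (first %)))     ; drop the blank separator runs
       (map (fn [[index timing & text]]
              {:index  (Long/parseLong (str/trim index))
               :timing timing
               :text   (str/join "\n" text)}))))

(defn process-subrip
  "Call f on each subtitle for side effects. All consumption happens
  inside with-open, so the reader is closed when this function returns."
  [filename f]
  (with-open [rdr (io/reader filename)]
    (run! f (lines->subtitles (line-seq rdr)))))
```

A call such as (process-subrip "movie.srt" println), where the file name is only an example, would print one subtitle map at a time. Because run! consumes the seq without retaining its head, only the subtitle being processed needs to stay in memory.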