Read a very large text file into a list in Clojure

Question

What is the best way to read a very large file (for example, a text file with 100,000 names, one per line) into a list in Clojure, lazily, loading it as needed?

Basically, I need to do all sorts of string searches on these items (I currently do this with grep and regular expressions in shell scripts).

I tried adding '( at the beginning and ) at the end, but apparently this method (loading a static/constant list) has a size limitation for some reason.

Recommended answer

You need to use line-seq. An example from clojuredocs:

;; Count lines of a file (loses head):
user=> (with-open [rdr (clojure.java.io/reader "/etc/passwd")]
         (count (line-seq rdr)))

But with a lazy list of strings, you cannot efficiently perform operations that require the whole list to be present, such as sorting. If you can express your operations as filter or map, you can consume the list lazily. Otherwise, it would be better to use an embedded database.
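
As a rough sketch of the filter approach, the function below does a grep-style search over a file with line-seq. The name grep-lines and the file name in the usage comment are hypothetical, chosen only for illustration. It assumes the set of matching lines is small enough to hold in memory.

```clojure
(require '[clojure.java.io :as io])

(defn grep-lines
  "Returns the lines of the file at `path` that match the regex `re`.
  Hypothetical helper for illustration, not a library function."
  [re path]
  (with-open [rdr (io/reader path)]
    ;; filter is lazy, so doall forces it while the reader is still open.
    ;; Only the matching lines are retained in the result.
    (doall (filter #(re-find re %) (line-seq rdr)))))

(comment
  ;; Example call; "names.txt" is an assumed file name.
  (grep-lines #"(?i)^ann" "names.txt"))
```

Without the doall, the lazy sequence would be returned unrealized and consuming it later would fail, because with-open has already closed the reader.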

Also note that you should not hold on to the head of the list; otherwise the whole list will be loaded into memory.
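
One way to avoid holding the head, sketched here under the assumption that the result is a single aggregate value, is to pass the sequence straight to reduce instead of binding it to a local that is used again later. The name count-matching is hypothetical.

```clojure
(require '[clojure.java.io :as io])

(defn count-matching
  "Counts the lines of the file at `path` that match the regex `re`.
  Hypothetical helper for illustration."
  [re path]
  (with-open [rdr (io/reader path)]
    ;; The seq goes directly into reduce and is never bound to a name,
    ;; so lines already processed can be garbage collected.
    (reduce (fn [n line] (if (re-find re line) (inc n) n))
            0
            (line-seq rdr))))

(comment
  ;; By contrast, this shape keeps a reference to the first element while
  ;; count walks the whole seq, so every line stays in memory:
  (with-open [rdr (io/reader "names.txt")]
    (let [lines (line-seq rdr)]
      [(count lines) (first lines)])))
```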

Furthermore, if you need to do more than one operation, you'll need to read the file again and again. Be warned: laziness can sometimes make things difficult.
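
To illustrate the re-reading point, here is a minimal sketch of a helper that opens the file afresh for each operation. The name with-lines and the file name in the usage comment are assumptions made for this example. The function passed in must return a fully realized value, since the reader is closed as soon as it returns.

```clojure
(require '[clojure.java.io :as io])

(defn with-lines
  "Opens the file at `path`, calls `f` on a fresh lazy seq of its lines,
  then closes the file. Hypothetical helper for illustration."
  [path f]
  (with-open [rdr (io/reader path)]
    (f (line-seq rdr))))

(comment
  ;; Each call reads the file from the start again.
  ;; "names.txt" is an assumed file name.
  (with-lines "names.txt" count)
  (with-lines "names.txt"
    (fn [lines] (count (filter #(re-find #"^A" %) lines)))))
```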
