Read a very large text file into a list in Clojure


Question

What is the best way to read a very large file (for example, a text file with 100,000 names, one per line) into a list in Clojure, lazily, so that it is loaded only as needed?

Basically, I need to run all sorts of string searches on these items (I currently do this with grep and regular expressions in shell scripts).

I tried adding '( at the beginning and ) at the end, but apparently this method (loading a static or constant list?) has a size limitation for some reason.

Recommended answer

You need to use line-seq. An example from ClojureDocs:

;; Count lines of a file (loses head):
user=> (with-open [rdr (clojure.java.io/reader "/etc/passwd")]
         (count (line-seq rdr)))

But with a lazy sequence of strings, you cannot efficiently perform operations that need the whole list to be present, such as sorting. If you can express your operations with filter or map, you can consume the list lazily. Otherwise, it would be better to use an embedded database.
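As a minimal sketch of the filter approach, the function below does a grep-like search over a file. The name `matching-lines` and its parameters are made up for illustration and are not part of the original answer. It uses `filterv`, which is eager, so the matching finishes while the reader is still open. A lazy sequence returned from inside `with-open` would fail once the reader has been closed.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: `matching-lines`, `path` and `re` are illustrative names.
(defn matching-lines
  "Returns a vector of the lines in the file at `path` that match the regex `re`."
  [path re]
  (with-open [rdr (io/reader path)]
    ;; line-seq reads lines lazily; filterv consumes them eagerly, one by one,
    ;; and keeps only the matches, so non-matching lines can be garbage collected.
    (filterv #(re-find re %) (line-seq rdr))))
```

Assuming a file named names.txt exists, `(matching-lines "names.txt" #"(?i)^al")` would return the lines that start with "al" in any letter case.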

Also note that you should not hold on to the head of the list; otherwise every line read so far stays reachable, and the whole list ends up in memory.

Furthermore, if you need to do more than one operation, you'll need to read the file again and again. Be warned: laziness can sometimes make things difficult.
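When the operations are known up front, one way to avoid some of that re-reading is to fold them into a single pass with `reduce`. This is only a sketch under that assumption; `name-stats`, `path`, `prefix` and the result keys are hypothetical names chosen for illustration. Because `reduce` never keeps a reference to the head of the sequence, lines already processed can be garbage collected.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical helper: computes two results in one pass over the file.
(defn name-stats
  "Counts all lines in the file at `path` and the lines starting with `prefix`."
  [path prefix]
  (with-open [rdr (io/reader path)]
    (reduce (fn [acc line]
              ;; Always bump :total; bump :matches only when the prefix matches.
              (cond-> (update acc :total inc)
                (str/starts-with? line prefix) (update :matches inc)))
            {:total 0 :matches 0}
            (line-seq rdr))))
```

Operations that are only decided later, such as ad hoc searches, would still need another pass over the file.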
