Processing a file character by character in Clojure

Question

I'm working on writing a function in Clojure that will process a file character by character. I know that Java's BufferedReader class has the read() method that reads one character, but I'm new to Clojure and not sure how to use it. Currently, I'm just trying to do the file line-by-line, and then print each character.

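;; Assumes BufferedReader and FileReader are imported from java.io
;; and split is referred from clojure.string.
(defn process_file [file_path]
  (with-open [reader (BufferedReader. (FileReader. file_path))]
    (let [seq (line-seq reader)]
      (doseq [item seq]
        (let [words (split item #"\s")]
          (println words))))))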

Given a file with this text input:

International donations are gratefully accepted, but we cannot make any statements concerning tax treatment of donations received from outside the United States. U.S. laws alone swamp our small staff.

My output looks like this:

[International donations are gratefully accepted, but we cannot make]
[any statements concerning tax treatment of donations received from]
[outside the United States.  U.S. laws alone swamp our small staff.]

Though I would expect it to look like:

["international" "donations" "are" .... ]

So my question is, how can I convert the function above to read character by character? Or even, how to make it work as I expect it to? Also, any tips for making my Clojure code better would be greatly appreciated.

Recommended answer

(with-open [reader (clojure.java.io/reader "path/to/file")] ...

I prefer this way of getting a reader in Clojure. Also, by "character by character", do you mean at the file-access level, like read, which lets you control how much is read at a time?
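On the "how to make it work as I expect it to" part, here is a minimal sketch of the line-by-line version built on clojure.java.io/reader. One thing worth knowing: println prints strings without surrounding quotes, whereas prn prints them in readable form with quotes, so prn is what produces output shaped like ["International" "donations" ...]. The names line-words and print-words are made up for this example, and splitting on #"\s+" (runs of whitespace) instead of #"\s" is my own choice so that double spaces do not yield empty strings.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn line-words
  "Splits one line into a vector of words on runs of whitespace.
  (Hypothetical helper, not part of the original question.)"
  [line]
  (str/split (str/trim line) #"\s+"))

(defn print-words
  "Prints each line of `source` as a vector of quoted words.
  `source` is anything io/reader accepts, e.g. a path string, a File or a Reader."
  [source]
  (with-open [rdr (io/reader source)]
    (doseq [line (line-seq rdr)]
      ;; prn keeps the quotes around each string; println would drop them
      (prn (line-words line)))))

;; A StringReader stands in for a real file so the example is self-contained.
(print-words (java.io.StringReader. "U.S. laws alone\nswamp our small staff."))
;; prints:
;; ["U.S." "laws" "alone"]
;; ["swamp" "our" "small" "staff."]
```

With a real file you would pass the path instead, e.g. (print-words "path/to/file").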

As @deterb pointed out, let's check the source code of line-seq:

(defn line-seq
  "Returns the lines of text from rdr as a lazy sequence of strings.
   rdr must implement java.io.BufferedReader."
  {:added "1.0"
   :static true}
  [^java.io.BufferedReader rdr]
  (when-let [line (.readLine rdr)]
    (cons line (lazy-seq (line-seq rdr)))))

Modeled on that, I faked a char-seq:

(defn char-seq
  [^java.io.Reader rdr]
  (let [chr (.read rdr)]   ; .read returns an int, or -1 at end of stream
    (when (>= chr 0)
      (cons chr (lazy-seq (char-seq rdr))))))

I know this char-seq reads all chars into memory[1], but I think it shows that you can directly call .read on BufferedReader. So, you can write your code like this:

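(let [chr (.read rdr)]     ; an int, or -1 at end of stream
  (when (>= chr 0)
    ;; do your work here; (char chr) converts the int to a character
    (println (char chr)))) ; placeholder for the real work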
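To turn that single read into something that walks the whole file, one option is a loop/recur around .read. This is only a sketch under my own assumptions: process-chars and its callback argument f are hypothetical names, not anything from the question or from a library.

```clojure
(require '[clojure.java.io :as io])

(defn process-chars
  "Calls (f ch) for every character read from `source`, one at a time.
  Returns the number of characters processed.
  (Hypothetical helper; `source` is anything io/reader accepts.)"
  [source f]
  (with-open [^java.io.Reader rdr (io/reader source)]
    (loop [n 0]
      (let [c (.read rdr)]          ; an int, or -1 at end of stream
        (if (neg? c)
          n                         ; end of input: return the count
          (do (f (char c))          ; hand the character to the callback
              (recur (inc n))))))))

;; A StringReader stands in for a real file so the example is self-contained.
(process-chars (java.io.StringReader. "abc") println)
;; prints a, b and c on separate lines and returns 3
```

Since io/reader returns a BufferedReader, reading one character per call still hits the underlying file in larger chunks.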

What do you think?

[1] According to @dimagog's comment, char-seq does not read all chars into memory, thanks to lazy-seq.
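The laziness can be checked with a small experiment. The sketch below uses a variant I am calling lazy-chars (a hypothetical name) that returns characters instead of ints and wraps the whole body in lazy-seq; it takes two characters and then reads directly from the same reader to see where the stream position is.

```clojure
(defn lazy-chars
  "Lazy sequence of the characters remaining in rdr.
  (Hypothetical variant of char-seq that yields chars rather than ints.)"
  [^java.io.Reader rdr]
  (lazy-seq
    (let [c (.read rdr)]            ; an int, or -1 at end of stream
      (when-not (neg? c)
        (cons (char c) (lazy-chars rdr))))))

;; A StringReader stands in for a real file so the example is self-contained.
(with-open [rdr (java.io.StringReader. "abcdef")]
  [(apply str (take 2 (lazy-chars rdr)))   ; realizes only two elements
   (char (.read rdr))])                    ; the next unread character
;; => ["ab" \c]
```

Note that the sequence has to be consumed inside with-open: once the reader is closed, realizing further elements would fail.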
