Processing a file character by character in Clojure


Question

I'm working on writing a function in Clojure that will process a file character by character. I know that Java's BufferedReader class has the read() method that reads one character, but I'm new to Clojure and not sure how to use it. Currently, I'm just trying to do the file line-by-line, and then print each character.

(defn process_file [file_path]
  (with-open [reader (BufferedReader. (FileReader. file_path))]
    (let [seq (line-seq reader)]
      (doseq [item seq]
        (let [words (split item #"\s")]
          (println words))))))

Given a file with this text input:


International donations are gratefully accepted, but we cannot make any statements concerning tax treatment of donations received from outside the United States. U.S. laws alone swamp our small staff.

My output looks like this:

[International donations are gratefully accepted, but we cannot make]
[any statements concerning tax treatment of donations received from]
[outside the United States.  U.S. laws alone swamp our small staff.]

Though I would expect it to look like:

["international" "donations" "are" .... ]

So my question is, how can I convert the function above to read character by character? Or even, how to make it work as I expect it to? Also, any tips for making my Clojure code better would be greatly appreciated.

Recommended answer

(with-open [reader (clojure.java.io/reader "path/to/file")] ...


I prefer this way of getting a reader in Clojure. Also, by "character by character", do you mean at the file-access level, like read, which lets you control how much is read at a time?
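For the line-by-line version in the question, a minimal sketch using this kind of reader might look like the following. The names line-words and process-file are hypothetical, and lower-casing the words is an assumption based on the expected output shown in the question. The sketch relies on prn printing strings with their quotes, whereas println prints them bare, which would explain why the original output looked like one unquoted line per vector.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn line-words
  "Hypothetical helper: lower-cases a line and splits it on runs of whitespace."
  [line]
  (str/split (str/lower-case line) #"\s+"))

(defn process-file
  "Hypothetical rewrite of the question's function. `source` is anything
   clojure.java.io/reader accepts (a path string, a File, a Reader, ...)."
  [source]
  (with-open [rdr (io/reader source)]
    (doseq [line (line-seq rdr)]
      ;; prn prints each string in quotes; println would print them bare
      (prn (line-words line)))))

;; Usage with an in-memory reader instead of a real file:
(process-file (java.io.StringReader. "International donations are"))
;; prints ["international" "donations" "are"]
```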

As @deterb pointed out, let's check the source code of line-seq:

(defn line-seq
  "Returns the lines of text from rdr as a lazy sequence of strings.
   rdr must implement java.io.BufferedReader."
  {:added "1.0"
   :static true}
  [^java.io.BufferedReader rdr]
  (when-let [line (.readLine rdr)]
    (cons line (lazy-seq (line-seq rdr)))))

I faked a char-seq along the same lines:

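(defn char-seq
  "Returns the characters from rdr as a lazy sequence."
  [^java.io.Reader rdr]
  (let [chr (.read rdr)]          ; .read returns an int, or -1 at end of stream
    (when (>= chr 0)
      ;; convert the int code to a character before consing it on
      (cons (char chr) (lazy-seq (char-seq rdr))))))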

I know this char-seq reads all chars into memory[1], but I think it shows that you can directly call .read on BufferedReader. So, you can write your code like this:

(let [chr (.read rdr)]   ; rdr is an open reader; .read returns -1 at end of stream
  (when (>= chr 0)
    ;; do your work here, e.g. with (char chr)
    ))
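To read every character rather than just one, the same call can be wrapped in a loop. The following is only a sketch: process-chars is a hypothetical helper name, and it assumes you want to call some function f on each character as a side effect.

```clojure
(require '[clojure.java.io :as io])

(defn process-chars
  "Hypothetical helper: calls f on every character read from source.
   `source` is anything clojure.java.io/reader accepts."
  [source f]
  (with-open [rdr (io/reader source)]
    (loop [c (.read rdr)]
      ;; .read returns -1 once the end of the stream is reached
      (when-not (neg? c)
        (f (char c))
        (recur (.read rdr))))))

;; Usage with an in-memory reader; prints a, b and c on separate lines:
(process-chars (java.io.StringReader. "abc") println)
```

Because loop/recur does not build a sequence, nothing from the file is retained beyond the reader's own buffer.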

[1] According to @dimagog's comment, char-seq does not read all chars into memory, thanks to lazy-seq.
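The laziness can be checked with a small experiment. This sketch uses a separately named variant, lazy-chars (a hypothetical name, written here so the example is self-contained), takes three characters, and then reads from the same reader again. One caveat: the sequence has to be realized (here with doall) before with-open closes the reader.

```clojure
(defn lazy-chars
  "Hypothetical variant of char-seq: a lazy sequence of characters from rdr."
  [^java.io.Reader rdr]
  (lazy-seq
    (let [c (.read rdr)]
      ;; -1 signals end of stream and ends the sequence
      (when-not (neg? c)
        (cons (char c) (lazy-chars rdr))))))

(with-open [rdr (java.io.StringReader. "abcdef")]
  (let [firsts (doall (take 3 (lazy-chars rdr)))]
    ;; only three characters have been consumed so far,
    ;; so the next direct read returns the fourth one
    (println firsts)
    (println (char (.read rdr)))))
```

If the whole input had been read eagerly, the final .read would return -1 instead of the code for \d.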
