Newbie transforming CSV files in Clojure

Question

I'm both new and old to programming -- mostly I just write a lot of small Perl scripts at work. Clojure came out just when I wanted to learn Lisp, so I'm trying to learn Clojure without knowing Java either. It's tough, but it's been fun so far.

I've seen several examples of similar problems to mine, but nothing that quite maps to my problem space. Is there a canonical way to extract lists of values for each line of a CSV file in Clojure?

Here's some actual working Perl code; comments included for non-Perlers:

# convert_survey_to_cartography.pl
open INFILE, "< coords.csv";       # Input format "Northing,Easting,Elevation,PointID"
open OUTFILE, "> coords.txt";      # Output format "PointID X Y Z".
while (<INFILE>) {                 # Read line by line; line bound to $_ as a string.
    chomp $_;                      # Strips out each line's <CR><LF> chars.
    @fields = split /,/, $_;       # Extract the line's field values into a list.
    $y = $fields[0];               # y = Northing
    $x = $fields[1];               # x = Easting
    $z = $fields[2];               # z = Elevation
    $p = $fields[3];               # p = PointID
    print OUTFILE "$p $x $y $z\n"  # New file, changed field order, different delimiter.
}

I've puzzled out a little bit in Clojure and tried to cobble it together in an imperative style:

; convert-survey-to-cartography.clj
(use 'clojure.contrib.duck-streams)
(let
   [infile "coords.csv" outfile "coords.txt"]
   (with-open [rdr (reader infile)]
     (def coord (line-seq rdr))
     ( ...then a miracle occurs... )
     (write-lines outfile ":x :y :z :p")))

I don't expect the last line to actually work, but it gets the point across. I'm looking for something along the lines of:

(def values (interleave (:p :y :x :z) (re-split #"," coord)))

Thanks, Bill

Recommended answer

Please don't use nested defs. It doesn't do what you think it does: def is always global! For locals, use let instead. While the library functions are nice to know, here is a version that orchestrates some features of functional programming in general and Clojure in particular.
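To make the point about def concrete, here is a minimal sketch. The names leaky, tidy and total are made up for illustration and are not part of the question's code.

```clojure
;; Hypothetical names (leaky, tidy, total), chosen only for this illustration.

(defn leaky [xs]
  ;; def inside a function still creates (or overwrites) a namespace-level var.
  (def total (reduce + xs))
  total)

(defn tidy [xs]
  ;; let creates a local binding that exists only inside this form.
  (let [total (reduce + xs)]
    total))

(leaky [1 2 3])   ; sets the global var total to 6
(tidy [10 20])    ; returns 30 and leaves the global var alone

;; The global var created by leaky is still visible here, outside any function.
(println "global total after both calls:" total)
```

After both calls the top-level total still holds 6, which is the kind of hidden shared state that the let version avoids.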

(import 'java.io.FileWriter 'java.io.FileReader 'java.io.BufferedReader)

(defn translate-coords

Docstrings can be queried in the REPL via (doc translate-coords). This works, e.g., for all core functions, so supplying one is a good idea.

  "Reads coordinates from infile, translates them with the given
  translator and writes the result to outfile."

translator is a (possibly anonymous) function that extracts the translation from the surrounding boilerplate, so we can reuse this function with different transformation rules. The type hints here avoid reflection for the constructors.

  [translator #^String infile #^String outfile]

Open the files. with-open makes sure that the files are closed when its body is left, be it via a normal "drop off the bottom" or via a thrown exception.

  (with-open [in  (BufferedReader. (FileReader. infile))
              out (FileWriter. outfile)]

We temporarily bind the *out* stream to the output file, so any print inside the binding will print to the file.

    (binding [*out* out]

The map means: take the seq, apply the given function to every element, and return the seq of the results. The #() is shorthand notation for an anonymous function. It takes one argument, which is filled in at the %. The doseq is basically a loop over the input. Since we do that for the side effects (namely printing to a file), doseq is the right construct. Rule of thumb: map is lazy, so use it for results; doseq is eager, so use it for side effects.

      (doseq [coords (map #(.split % ",") (line-seq in))]

println takes care of the \n at the end of the line. interpose takes the seq and adds its first argument (in our case " ") between the elements. (apply str [1 2 3]) is equivalent to (str 1 2 3) and is useful for constructing function calls dynamically. The ->> is a relatively new macro in Clojure that helps a bit with readability. It means "take the first argument and add it as the last item to the function call". The given ->> is equivalent to: (println (apply str (interpose " " (translator coords)))). (Another note: since the separator is \space, we could just as well write (apply println (translator coords)) here, but the interpose version also allows us to parametrize the separator as we did with the translator function, while the short version would hardwire \space.)

        (->> (translator coords)
          (interpose " ")
          (apply str)
          println)))))

(defn survey->cartography-format
  "Translate coords in survey format to cartography format."

Here we use destructuring (note the double [[]]). It means the argument to the function is something that can be turned into a seq, e.g. a vector or a list. Bind the first element to y, the second to x, and so on.

  [[y x z p]]
  [p x y z])

(translate-coords survey->cartography-format "survey_coords.txt" "cartography_coords.txt")
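The individual pieces of the walkthrough above (map, destructuring, interpose, apply str and ->>) can also be tried in isolation at the REPL. The following is only a sketch: the sample lines and the helper names reorder, format-line and format-line-nested are invented for illustration.

```clojure
;; Sample input lines in "Northing,Easting,Elevation,PointID" order (made-up data).
(def sample-lines ["1,2,3,A" "4,5,6,B"])

;; map is lazy and returns a seq of results; each line becomes a vector of fields.
(def fields (map #(vec (.split ^String % ",")) sample-lines))

;; Destructuring: the single argument is taken apart into y, x, z and p.
(defn reorder [[y x z p]]
  [p x y z])

;; The threaded form ...
(defn format-line [coords]
  (->> (reorder coords)
       (interpose " ")
       (apply str)))

;; ... and the nested form it stands for.
(defn format-line-nested [coords]
  (apply str (interpose " " (reorder coords))))

;; doseq is eager, so it is the right tool for the printing side effect.
(doseq [f fields]
  (println (format-line f)))
```

Both format-line and format-line-nested build the same string for a given line; the threaded version just reads top to bottom in the order the steps happen.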

And here it is again, less choppy:

(import 'java.io.FileWriter 'java.io.FileReader 'java.io.BufferedReader)

(defn translate-coords
  "Reads coordinates from infile, translates them with the given
  translator and writes the result to outfile."
  [translator #^String infile #^String outfile]
  (with-open [in  (BufferedReader. (FileReader. infile))
              out (FileWriter. outfile)]
    (binding [*out* out]
      (doseq [coords (map #(.split % ",") (line-seq in))]
        (->> (translator coords)
          (interpose " ")
          (apply str)
          println)))))

(defn survey->cartography-format
  "Translate coords in survey format to cartography format."
  [[y x z p]]
  [p x y z])

(translate-coords survey->cartography-format "survey_coords.txt" "cartography_coords.txt")
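As mentioned in the note about interpose, the separator could be made a parameter as well. Below is one possible sketch of that idea, assuming a Clojure version that ships clojure.java.io and clojure.string (1.2 or later). The function name translate-lines and its argument order are assumptions made for this example, not part of the answer above.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn translate-lines
  "Hypothetical variant of translate-coords that also takes the output
  separator as an argument."
  [translator separator infile outfile]
  (with-open [in  (io/reader infile)
              out (io/writer outfile)]
    ;; Same trick as before: everything printed inside binding goes to the file.
    (binding [*out* out]
      (doseq [line (line-seq in)]
        (println (string/join separator
                              (translator (string/split line #","))))))))

;; Possible usage, with the same translator and file names as in the answer:
;; (translate-lines survey->cartography-format " "
;;                  "survey_coords.txt" "cartography_coords.txt")
```

Passing "\t" or ";" instead of " " would change the output delimiter without touching the translation logic.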

Hope this helps.

Edit: For CSV reading you probably want something like OpenCSV.
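A small sketch of why a dedicated CSV reader can matter: splitting on every comma breaks as soon as a field is quoted and contains a comma itself. The sample lines and the helper name naive-fields are made up for illustration.

```clojure
;; Hypothetical helper: the same naive comma split used in the answer's code.
(defn naive-fields [^String line]
  (vec (.split line ",")))

;; Made-up sample data.
(def plain-line  "4512.3,7820.1,101.5,P1")
(def quoted-line "4512.3,7820.1,101.5,\"P1, north corner\"")

(println (count (naive-fields plain-line)))   ; four fields, as intended
(println (count (naive-fields quoted-line)))  ; five pieces: the quoted field was cut in two
```

For survey files that never quote fields the simple split is fine; a real CSV parser becomes worthwhile once quoting or embedded separators can appear.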