How to write to a csv file in scala?


Question


I am trying to write data to a csv file. I have four columns, which I have created as:

val csvFields = Array("Serial Number", "Record Type", "First File value", "Second file value")

Other than the serial number, the other three fields are lists:

Second_file_value = List ("B", "gjgbn", "fgbhjf", "dfjf")

First_File_Value = List ("A","abhc","agch","mknk")

Record_type = List('1','2','3','4');

 val outputFile = new BufferedWriter(new FileWriter("Resulet.csv"))
 val csvWriter = new CSVWriter(outputFile)
 val listOfRecords = new ListBuffer[Array[String]]()
 listOfRecords :+ csvFields

I am using this loop to write the columns:

for ( i <- 1 until 30){
listOfRecords += Array(i.toString, Record_type , First_File_Value , Second_file_value )}
csvWriter.writeAll(listOfRecords.toList)
output.close()

The problem I am facing is that the csv file is filled with 30 rows of the same values (the first row's values); the values in the lists are not being iterated.

Any references will also be helpful

Solution

Without a complete example (as in a compiling Main file), it can't be said why you are getting the same row over and over. The snippet you posted is correct in isolation.

scala> val lb: ListBuffer[Array[String]] = new ListBuffer[Array[String]]()
lb: scala.collection.mutable.ListBuffer[Array[String]] = ListBuffer()

scala> for (i <- 1 until 30){lb += Array(i.toString)}

scala> lb.toList
res5: List[Array[String]] = List(Array(1), Array(2), Array(3), Array(4), Array(5), Array(6), Array(7), Array(8), Array(9), Array(10), Array(11), Array(12), Array(13), Array(14), Array(15), Array(16), Array(17), Array(18), Array(19), Array(20), Array(21), Array(22), Array(23), Array(24), Array(25), Array(26), Array(27), Array(28), Array(29))

However, there are a number of ways you can do this better in general that might help you avoid this and other bugs.

Adding A Serial Prefix To All Rows

In Scala it is generally considered better to prefer immutable structures over mutable ones as an idiom. Given that, I'd suggest you construct a function to add the serial prefix to your rows using an immutable method. There are a number of ways to do this, but the most fundamental one is a fold operation. If you are not familiar with it, a fold can be thought of as a transformation over a structure, like the functional version of a for loop.
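
If the mechanics of a fold are new to you, here is a minimal sketch (my own illustration, not part of the original answer) showing that foldLeft threads an accumulator through the collection the way a for loop threads a mutable variable:

// Summing 1 to 4 with a mutable loop...
var mutableSum = 0
for (i <- 1 to 4) { mutableSum += i }      // mutableSum == 10

// ...and the same computation as a foldLeft, with no mutation.
val foldedSum = (1 to 4).foldLeft(0)((acc, i) => acc + i)  // foldedSum == 10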

With that in mind, here is how you might take some rows, which are a List[List[String]] and add a numeric prefix to all of them.

def addPrefix(lls: List[List[String]]): List[List[String]] =
  lls.foldLeft((1, List.empty[List[String]])){
    // You don't need to annotate the types here, I just did that for clarity.
    case ((serial: Int, acc: List[List[String]]), value: List[String]) =>
      (serial + 1, (serial.toString +: value) +: acc)
  }._2.reverse

A foldLeft builds up the list in the reverse of what we want, which is why I call .reverse at the end. The reason for this is an artifact of how the stacks work when traversing structures and is beyond the scope of this question, but there are many good articles on why to use foldLeft or foldRight.
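
As a quick illustration of the reversal (again my own sketch, not from the original answer), prepending to the accumulator while folding from the left leaves the result backwards, which the final .reverse undoes:

scala> List("a", "b", "c").foldLeft(List.empty[String])((acc, x) => x +: acc)
res0: List[String] = List(c, b, a)

scala> List("a", "b", "c").foldLeft(List.empty[String])((acc, x) => x +: acc).reverse
res1: List[String] = List(a, b, c)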

From what I read above, this is what your rows look like in the example.

val columnOne: List[String] =
  List('1','2','3','4').map(_.toString)
val columnTwo: List[String] =
  List("A","abhc","agch","mknk")
val columnThree: List[String] =
  List("B", "gjgbn", "fgbhjf", "dfjf")

val rows: List[List[String]] =
  columnOne.zip(columnTwo.zip(columnThree)).foldLeft(List.empty[List[String]]){
    case (acc, (a, (b, c))) => List(a, b, c) +: acc
  }.reverse

Which yields this.

scala> rows.foreach(println)
List(1, A, B)
List(2, abhc, gjgbn)
List(3, agch, fgbhjf)
List(4, mknk, dfjf)

Let's try calling our function with that as the input.

scala> addPrefix(rows).foreach(println)
List(1, 1, A, B)
List(2, 2, abhc, gjgbn)
List(3, 3, agch, fgbhjf)
List(4, 4, mknk, dfjf)

Okay, that looks good.

Writing The CSV File

Now to write the CSV file. Because CSVWriter works in terms of Java collection types, we need to convert our Scala types to Java collections. In Scala you should do this at the last possible moment. The reason for this is that Scala's types are designed to work well with Scala and we don't want to lose that ability early. They are also safer than the parallel Java types in terms of immutability (if you are using the immutable variants, which this example does).

Let's define a function writeCsvFile that takes a filename, a header row, and a list of rows and writes it out. Again there are many ways to do this correctly, but here is a simple example.

def writeCsvFile(
  fileName: String,
  header: List[String],
  rows: List[List[String]]
): Try[Unit] =
  Try(new CSVWriter(new BufferedWriter(new FileWriter(fileName)))).flatMap((csvWriter: CSVWriter) =>
    Try{
      csvWriter.writeAll(
        (header +: rows).map(_.toArray).asJava
      )
      csvWriter.close()
    } match {
      case f @ Failure(_) =>
        // Always return the original failure.  In production code we might
        // define a new exception which wraps both exceptions in the case
        // they both fail, but that is omitted here.
        Try(csvWriter.close()).recoverWith{
          case _ => f
        }
      case success =>
        success
    }
  )

Let's break that down for a moment. I am using the Try data type from the scala.util package. It is similar to the language level try/catch/finally blocks, but rather than using a special construct to catch exceptions, it uses a normal value. This is another common idiom in Scala, prefer plain language values over special language control flow constructs.
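
If you have not used Try before, here is a minimal sketch (my own, not part of the original answer) of how it captures a possibly-throwing computation as an ordinary value:

import scala.util.{Try, Success, Failure}

val ok: Try[Int] = Try("42".toInt)      // Success(42)
val bad: Try[Int] = Try("oops".toInt)   // Failure(java.lang.NumberFormatException)

bad match {
  case Success(n) => println(s"parsed $n")
  case Failure(e) => println(s"could not parse: ${e.getMessage}")
}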

Let's take a closer look at this expression (header +: rows).map(_.toArray).asJava. This small expression is doing quite a few operations. First, we add our header row into the front of our list of rows (header +: rows). Then, since the CSVWriter wants an Iterable<Array<String>> we first convert the inner type to Array then the outer type to Iterable. The .asJava call is what does the outer type conversion and you get it by importing scala.collection.JavaConverters._ which has implicit conversions between Scala and Java types.
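
As a rough sketch of the intermediate types involved (my own illustration; the names scalaRows and javaRows are made up for the example):

import scala.collection.JavaConverters._

// List[List[String]]  ->  List[Array[String]]  ->  java.util.List[Array[String]]
val scalaRows: List[List[String]] = List(List("1", "A"), List("2", "B"))
val javaRows: java.util.List[Array[String]] = scalaRows.map(_.toArray).asJava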

The rest of the function is pretty straightforward. We write the rows out, then check if there was a failure. If there was, we ensure that we still attempt to close the CSVWriter.
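
As a side note, not part of the original answer: if you are on Scala 2.13 or later, scala.util.Using can do the close-on-failure bookkeeping for you, assuming your opencsv version's CSVWriter implements AutoCloseable (recent versions do). A rough sketch under those assumptions:

import com.opencsv.CSVWriter
import java.io.{BufferedWriter, FileWriter}
import scala.jdk.CollectionConverters._   // supersedes JavaConverters on 2.13+
import scala.util.{Try, Using}

def writeCsvFileUsing(
  fileName: String,
  header: List[String],
  rows: List[List[String]]
): Try[Unit] =
  // Using closes the writer whether the body succeeds or throws,
  // and returns the outcome as a Try.
  Using(new CSVWriter(new BufferedWriter(new FileWriter(fileName)))) { csvWriter =>
    csvWriter.writeAll((header +: rows).map(_.toArray).asJava)
  }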

Full Compiling Example

I've included a full compiling example here.

import com.opencsv._
import java.io._
import scala.collection.JavaConverters._
import scala.util._

object Main {

  val header: List[String] =
    List("Serial Number", "Record Type", "First File value", "Second file value")

  val columnOne: List[String] =
    List('1','2','3','4').map(_.toString)
  val columnTwo: List[String] =
    List("A","abhc","agch","mknk")
  val columnThree: List[String] =
    List("B", "gjgbn", "fgbhjf", "dfjf")

  val rows: List[List[String]] =
    columnOne.zip(columnTwo.zip(columnThree)).foldLeft(List.empty[List[String]]){
      case (acc, (a, (b, c))) => List(a, b, c) +: acc
    }.reverse

  def addPrefix(lls: List[List[String]]): List[List[String]] =
    lls.foldLeft((1, List.empty[List[String]])){
      case ((serial: Int, acc: List[List[String]]), value: List[String]) =>
        (serial + 1, (serial.toString +: value) +: acc)
    }._2.reverse

  def writeCsvFile(
    fileName: String,
    header: List[String],
    rows: List[List[String]]
  ): Try[Unit] =
    Try(new CSVWriter(new BufferedWriter(new FileWriter(fileName)))).flatMap((csvWriter: CSVWriter) =>
      Try{
        csvWriter.writeAll(
          (header +: rows).map(_.toArray).asJava
        )
        csvWriter.close()
      } match {
        case f @ Failure(_) =>
          // Always return the original failure.  In production code we might
          // define a new exception which wraps both exceptions in the case
          // they both fail, but that is omitted here.
          Try(csvWriter.close()).recoverWith{
            case _ => f
          }
        case success =>
          success
      }
    )

  def main(args: Array[String]): Unit = {
    println(writeCsvFile("/tmp/test.csv", header, addPrefix(rows)))
  }
}

Here is the contents of the file after running that program.

"Serial Number","Record Type","First File value","Second file value"
"1","1","A","B"
"2","2","abhc","gjgbn"
"3","3","agch","fgbhjf"
"4","4","mknk","dfjf"

Final Notes

Outdated Library

I noticed in the comments on the original post that you were using "au.com.bytecode" % "opencsv" % "2.4". I'm not familiar with the opencsv library in general, but according to Maven Central that appears to be a very old fork of the primary repo. I'd suggest you use the primary repo. https://search.maven.org/search?q=opencsv
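
For example, the sbt coordinates for the primary artifact look roughly like the line below (the version number here is my guess; check Maven Central for the current one):

// build.sbt -- hypothetical version, verify against Maven Central
libraryDependencies += "com.opencsv" % "opencsv" % "4.6"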

Performance

People often get concerned that when using immutable data structures and techniques we are required to make a performance trade-off. This can be the case, but usually the asymptotic complexity is unchanged. The above solution is O(n), where n is the number of rows. It has a higher constant factor than a mutable solution, but generally that is not significant. If it were, there are techniques that could be employed, such as more explicit recursion in addPrefix, that would mitigate this (see the sketch below). However, you should never optimize like that unless you really need to, as it makes the code more error prone and less idiomatic (and thus less readable).
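
Here is one way that more explicit recursion could look (my own sketch, not part of the original answer): a tail-recursive variant of addPrefix that threads the serial counter as a plain parameter instead of a tuple.

import scala.annotation.tailrec

def addPrefixRec(lls: List[List[String]]): List[List[String]] = {
  @tailrec
  def loop(serial: Int, remaining: List[List[String]], acc: List[List[String]]): List[List[String]] =
    remaining match {
      case Nil         => acc.reverse
      case row :: rest => loop(serial + 1, rest, (serial.toString +: row) +: acc)
    }
  loop(1, lls, List.empty)
}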
