Handling extra newlines in CSV files parsed with Scala?

Views: 408 | Published: 2020/10/12 21:00:10 | Tags: scala, csv


Problem description

I'm totally new to Scala, and I'm trying to parse a CSV file in which some cells (inside their double quotes) contain carriage returns, newlines, and other special characters such as commas. For example:

"A","B","C\n,FF\n","D"\n
"Q","W","E","R\n\n"\n
"1","2\n","2","2,2\n"\n

I want to load this into a list of lists in Scala, like the following:

List(List("A","B","C,FF","D"),List("Q","W","E","R"),List("1","2","2","2,2"))

Any suggestions on how this can be done?

I have found solutions to the same problem in other languages. For example, there is a great one for Python, which I understand well: "Handling extra newlines (carriage returns) in csv files parsed with Python?"

My attempt:

import scala.io.Source
val src2 = Source.fromFile("sourceFileName.csv")
val it = src2.getLines()
// Strips every quote, then splits each physical line on every comma
val data = for (i <- it) yield i.replace("\"", "").split(",")

But it looks like every carriage return or newline inside a cell is treated as the start of a new line.

Recommended answer

It seems to me that if the actual cells contain newlines, you'll need to keep some state while traversing getLines. You can do this with foldLeft or a similar operator. If the file is small enough, you can also use mkString to read the whole file into memory as a single string and then operate on that. The following simplified version assumes that every cell is surrounded by quotes. For example:

val converted = Source.fromFile(sourceFileName).mkString.replaceAll("[\r\n]", "").replaceAll("\"\"", "\"\n\"")

First, we remove all newlines. The true record boundaries then show up as two quotes in a row (anywhere else, a comma separates the quotes), so we add a newline back between those two quotes. Note that this relies on no cell being empty and no cell containing an escaped quote, since either would also produce two quotes in a row. We should then have a normalized version of the file, and we can proceed with simple operations:

// Split on the "," between cells rather than on bare commas, so commas inside a cell survive
converted.split("\n").map(_.split("\",\"").map(_.replaceAll("\"", "")))
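
The other option mentioned in the answer, keeping some state while traversing the input with foldLeft, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's code: `ParseState`, `parseCsv`, `sample`, and `parsed` are hypothetical names, the fold runs over characters rather than over getLines, line breaks inside quoted cells are simply discarded (as in the expected output in the question), and escaped quotes ("" inside a cell) are not handled. It is written script-style, like the snippets above.

```scala
// Hypothetical state carried through the fold (not part of the original answer):
// finished rows, cells of the current row, text of the current cell,
// and whether we are currently inside a quoted cell.
final case class ParseState(
    rows: Vector[List[String]] = Vector.empty,
    cells: Vector[String] = Vector.empty,
    cell: String = "",
    inQuotes: Boolean = false
)

// Hypothetical helper: a quote-aware parser built on foldLeft.
def parseCsv(text: String): List[List[String]] = {
  val end = text.foldLeft(ParseState()) { (s, ch) =>
    ch match {
      // A quote toggles quoted mode and is not kept in the cell text.
      case '"' => s.copy(inQuotes = !s.inQuotes)
      // A line break inside a quoted cell is dropped.
      case '\n' | '\r' if s.inQuotes => s
      // A comma outside quotes ends the current cell.
      case ',' if !s.inQuotes =>
        s.copy(cells = s.cells :+ s.cell, cell = "")
      // A line break outside quotes ends the current record.
      case '\n' | '\r' =>
        if (s.cells.isEmpty && s.cell.isEmpty) s // blank line, or the LF of a CRLF pair
        else s.copy(rows = s.rows :+ (s.cells :+ s.cell).toList,
                    cells = Vector.empty, cell = "")
      // Anything else, including a comma inside quotes, is cell text.
      case other => s.copy(cell = s.cell + other)
    }
  }
  // Flush a last record that is not followed by a line break.
  val rows =
    if (end.cells.isEmpty && end.cell.isEmpty) end.rows
    else end.rows :+ (end.cells :+ end.cell).toList
  rows.toList
}

// The sample from the question, with real newline characters inside the cells.
val sample =
  "\"A\",\"B\",\"C\n,FF\n\",\"D\"\n" +
  "\"Q\",\"W\",\"E\",\"R\n\n\"\n" +
  "\"1\",\"2\n\",\"2\",\"2,2\n\"\n"

val parsed = parseCsv(sample)
```

To apply it to a file, the text could be read with Source.fromFile(sourceFileName).mkString, as in the answer. Appending to a String one character at a time is quadratic in the worst case, so for large files a StringBuilder or an established CSV library would likely be a better choice.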
