How to convert a CSV file to a space-delimited file? (Scala Spark)

Problem description

I have a CSV file. This is my input:

,"",3,"a_b","cde 
f\gh","i j","k,""l"

Now I want to convert the CSV file to a space-delimited file. How should I do it?

Here are the specifications:

  1. Comma-delimited fields come in two kinds: string 0 (not enclosed in double quotes) and string 1 (enclosed in double quotes).
  2. An empty string 0 is converted to 0, and an empty string 1 is converted to "_". (The -z option changes the 0 used for string 0; the -n option changes the _ used for string 1.)
  3. Escaped double quotes ("") inside string 1 are converted to a single ". Double quotes cannot be used in string 0.
  4. Half-width spaces inside any string are converted to "_". (The -s option changes the _.)
  5. The -e option prefixes "_" (or the character specified by -s) and "\" with "\".
  6. The -q option removes the preceding "\" from "\"" and "\\".
  7. \r\n at the end of a line is automatically converted to \n.
  8. Any \n inside string 1 is converted to the two characters "\n".
  9. The final line does not need a trailing linefeed (\n).

The desired output is shown below. Please help me.

0 _ 3 a\_b cde\nf\\gh i_j k,"l
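
A minimal Scala sketch of the per-field rules (2, 4, 5 and 8, plus the "no double quotes in string 0" part of rule 3), assuming each field has already been tokenized into its unescaped value and a flag saying whether it was quoted. The `Options` case class and `FieldRules.convert` are hypothetical names chosen to mirror the -z, -n, -s and -e flags; -q is not modeled because its exact behavior is not fully clear from the description.

```scala
// Hypothetical option set mirroring the -z, -n, -s and -e flags described above.
final case class Options(
    zero: String = "0",        // -z: output for an empty unquoted field (string 0)
    emptyQuoted: String = "_", // -n: output for an empty quoted field (string 1)
    space: Char = '_',         // -s: replacement for a half-width space
    escape: Boolean = false    // -e: prefix '\' and the -s character with '\'
)

object FieldRules {
  // Converts one tokenized field; `value` must already have "" unescaped to ".
  // The -q option is deliberately not handled in this sketch.
  def convert(value: String, quoted: Boolean, opts: Options): String = {
    require(quoted || value.indexOf('"') < 0, "double quotes are not allowed in string 0")
    if (value.isEmpty) {
      if (quoted) opts.emptyQuoted else opts.zero // rule 2
    } else {
      val out = new StringBuilder
      // Single pass, so the '_' produced for a space and the '\' in "\n"
      // are never escaped a second time.
      value.foreach { c =>
        if (opts.escape && (c == '\\' || c == opts.space)) out.append('\\').append(c) // rule 5
        else if (c == ' ') out.append(opts.space) // rule 4
        else if (c == '\n') out.append("\\n")     // rule 8
        else out.append(c)
      }
      out.toString
    }
  }
}
```

Applied to the seven fields of the sample line with `Options(escape = true)` and joined with single spaces, this gives `0 _ 3 a\_b cde\nf\\gh i_j k,"l`, which matches the desired output and suggests that output was produced with -e in effect.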

Recommended answer

You could use itto-csv (https://github.com/gekomad/itto-csv) to tokenize the CSV:

import com.github.gekomad.ittocsv.parser.IttoCSVFormat
import com.github.gekomad.ittocsv.util.StringUtils._

// use itto-csv's default CSV format
implicit val csvFormat: IttoCSVFormat = IttoCSVFormat.default

val csvString = "1,foo"
val stringList = tokenizeCsvLine(csvString) // Some(List("1", "foo"))

and then apply your specifications to each field in stringList, joining the results with spaces:

// yourSpec: String => String is your own function implementing the rules above
val line = stringList.getOrElse(Nil).map(yourSpec).mkString(" ")
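
The tokenizer above returns plain strings (`Some(List("1", "foo"))`), so it presumably cannot report whether a field was quoted, yet rules 1 and 2 treat an empty unquoted field and `""` differently. As an alternative, here is a minimal hand-written tokenizer sketch (not part of itto-csv; `Field`, `QuoteAwareCsv`, `tokenize` and `toSpaceDelimited` are hypothetical names) that keeps a quoted flag per field, unescapes `""`, keeps commas and newlines inside quotes, normalizes `\r\n` to `\n` (rule 7) and accepts a last line without a trailing newline (rule 9). It assumes otherwise well-formed input and does not handle the -q option.

```scala
import scala.collection.mutable.ListBuffer

// One CSV field plus whether it was enclosed in double quotes (string 1) or not (string 0).
final case class Field(value: String, quoted: Boolean)

object QuoteAwareCsv {
  // Splits CSV text into records, keeping a per-field "quoted" flag.
  def tokenize(input: String): List[List[Field]] = {
    val text = input.replace("\r\n", "\n") // rule 7: CRLF -> LF
    val records = ListBuffer.empty[List[Field]]
    val fields = ListBuffer.empty[Field]
    val cur = new StringBuilder
    var quoted = false   // the current field opened with a double quote
    var inQuotes = false // we are between an opening and a closing quote
    var i = 0

    def endField(): Unit = {
      fields += Field(cur.toString, quoted)
      cur.clear()
      quoted = false
    }
    def endRecord(): Unit = {
      endField()
      records += fields.toList
      fields.clear()
    }

    while (i < text.length) {
      val c = text.charAt(i)
      if (inQuotes) {
        if (c != '"') cur.append(c) // embedded commas and newlines are kept
        else if (i + 1 < text.length && text.charAt(i + 1) == '"') { cur.append('"'); i += 1 } // "" -> "
        else inQuotes = false
      } else c match {
        case '"' if cur.isEmpty && !quoted => quoted = true; inQuotes = true
        case ','  => endField()
        case '\n' => endRecord()
        case _    => cur.append(c)
      }
      i += 1
    }
    require(!inQuotes, "unterminated quoted field")
    // Rule 9: the last record may lack a trailing newline.
    if (text.nonEmpty && !text.endsWith("\n")) endRecord()
    records.toList
  }

  // Applies a per-field conversion, joining fields with " " and records with "\n".
  def toSpaceDelimited(records: List[List[Field]], convert: Field => String): String =
    records.map(_.map(convert).mkString(" ")).mkString("\n")
}
```

Your own per-field conversion, adapted to take a `Field` instead of a `String`, plugs in as `convert`, in the same role as `yourSpec` above.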
