How to convert a CSV file to a space-delimited file? (Scala, Spark)

Question

I have a CSV file. This is my input:

,"",3,"a_b","cde 
f\gh","i j","k,""l"

Now I want to convert the CSV file to a space-delimited file. How can I do that?

These are the specifications:

  1. Comma-delimited fields come in two kinds: string 0 (not enclosed in double quotes) and string 1 (enclosed in double quotes).
  2. An empty string 0 is converted to 0, and an empty string 1 is converted to "_". (The -z option changes the 0 used for string 0; the -n option changes the _ used for string 1.)
  3. Escaped double quotes inside string 1 are converted to a single ". Double quotes cannot be used in string 0.
  4. Half-width spaces inside any string are converted to "_" (the -s option changes the _).
  5. The -e option puts a "\" before "_" (or the character specified by the -s option) and before "\".
  6. The -q option eliminates the preceding "\" from "\"" and "\\".
  7. \r\n at the end of a line is automatically converted to \n.
  8. Any \n inside string 1 is converted to "\n".
  9. The final line does not require a linefeed (\n).

I want the output shown below. Please help me.

0 _ 3 a\_b cde\nf\\gh i_j k,"l

Recommended answer

You could use itto-csv (https://github.com/gekomad/itto-csv) to tokenize the CSV:

implicit val csvFormat: com.github.gekomad.ittocsv.parser.IttoCSVFormat = com.github.gekomad.ittocsv.parser.IttoCSVFormat.default
import com.github.gekomad.ittocsv.util.StringUtils._

val csvString = "1,foo"
val stringList = tokenizeCsvLine(csvString) // Some(List("1", "foo"))

and then apply your specifications to stringList:

stringList.getOrElse(???).map(field => yourSpec(field))
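
As a rough illustration of what applying the specifications could look like, here is a minimal, library-free sketch in plain Scala. It does its own small parsing pass instead of using itto-csv, because rule 2 needs to know whether an empty field was quoted, and the token list shown above holds only plain strings. The names `CsvToSpace`, `Opts`, `Field`, `parse`, `convertField` and `convert` are all hypothetical, and the `Opts` fields merely stand in for the -z, -n, -s and -e options. The -q option (rule 6) is left out because its exact meaning is not clear from the question. The sketch also assumes that -e escaping applies only to characters already present in the field, not to underscores produced from spaces, which is what the expected output (`a\_b` but `i_j`) suggests.

```scala
import scala.collection.mutable.ListBuffer

object CsvToSpace {
  // Hypothetical stand-ins for the -z, -n, -s and -e options from the question.
  final case class Opts(zero: String = "0", empty: String = "_",
                        space: String = "_", escape: Boolean = false)

  // A parsed field: its text and whether it was enclosed in double quotes ("string 1").
  final case class Field(text: String, quoted: Boolean)

  /** Splits CSV text into records, honoring quotes, "" escapes and newlines inside quotes. */
  def parse(csv: String): List[List[Field]] = {
    val input = csv.replace("\r\n", "\n") // rule 7: normalize line endings
    val records = ListBuffer.empty[List[Field]]
    val fields = ListBuffer.empty[Field]
    val sb = new StringBuilder
    var inQuotes = false // currently inside a quoted field
    var quoted = false   // the current field had an opening quote
    var pending = false  // characters seen since the last record ended

    def endField(): Unit = {
      fields += Field(sb.toString, quoted)
      sb.clear()
      quoted = false
    }
    def endRecord(): Unit = {
      endField()
      records += fields.toList
      fields.clear()
      pending = false
    }

    var i = 0
    while (i < input.length) {
      val c = input.charAt(i)
      pending = true
      if (inQuotes) {
        if (c == '"') {
          // rule 3: a doubled quote inside a quoted field is one literal quote
          if (i + 1 < input.length && input.charAt(i + 1) == '"') { sb.append('"'); i += 1 }
          else inQuotes = false
        } else sb.append(c)
      } else if (c == '"') { inQuotes = true; quoted = true }
      else if (c == ',') endField()
      else if (c == '\n') endRecord()
      else sb.append(c)
      i += 1
    }
    if (pending) endRecord() // rule 9: the last line may lack a trailing newline
    records.toList
  }

  /** Applies rules 2, 4, 5 and 8 to a single field. */
  def convertField(f: Field, o: Opts): String =
    if (f.text.isEmpty) { if (f.quoted) o.empty else o.zero }
    else {
      // With -e: escape backslashes first, then characters equal to the space replacement.
      val escaped =
        if (o.escape) f.text.replace("\\", "\\\\").replace(o.space, "\\" + o.space)
        else f.text
      escaped.replace(" ", o.space).replace("\n", "\\n")
    }

  /** Converts whole CSV text to space-delimited lines. */
  def convert(csv: String, o: Opts = Opts()): String =
    parse(csv).map(_.map(convertField(_, o)).mkString(" ")).mkString("\n")
}
```

Calling `CsvToSpace.convert(input, CsvToSpace.Opts(escape = true))` on the sample input from the question should reproduce the desired output line, provided the trailing space after `cde` in the posted input is only a copy artifact. If that space is really part of the data, rule 4 would turn it into an extra `_`.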
