Escape quotes is not working in Spark 2.2.0 while reading CSV


Problem Description

I am trying to read a tab-delimited file, but I am not able to read all of the records.

Here are my input records:

head1   head2   head3
a   b   c
a2  a3  a4
a1  "b1 "c1

My code:

var inputDf = sparkSession.read
                  .option("delimiter","\t")
                  .option("header", "true")
//                  .option("inferSchema", "true")
                  .option("nullValue", "")
                  .option("escape","\"")
                  .option("multiLine", true)
                  .option("nullValue", null)
                  .option("nullValue", "NULL")
                  .schema(finalSchema)
                  .csv("file:///C:/Users/prhasija/Desktop/retriedAddresses_4.txt")
//                  .csv(inputPath)
                  .na.fill("")
//                  .repartition(4)

println(inputDf.count)

Output:

2 records

Why is it not returning 3 as the count?

Recommended Answer

I think you need to add the following options to your read: .option("escape", "\\") and .option("quote", "\\"). With the default settings, the stray unmatched " characters are interpreted as quote/escape characters, so the parser merges what should be separate fields and rows (and with multiLine enabled it can keep consuming past line breaks while looking for a closing quote). Pointing quote and escape at a character that never occurs in the data effectively disables that handling, and the " characters are read literally.

val test = spark.read
    .option("header", true)
    .option("quote", "\\")
    .option("escape", "\\")
    .option("delimiter", ",")
    .csv(".../test.csv")

Here is the test csv I used it on:

a,b,c
1,b,a
5,d,e
5,"a,"f

Full output:

scala> val test = spark.read.option("header", true).option("quote", "\\").option("escape", "\\").option("delimiter", ",").csv("./test.csv")
test: org.apache.spark.sql.DataFrame = [a: string, b: string ... 1 more field]

scala> test.show
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  b|  a|
|  5|  d|  e|
|  5| "a| "f|
+---+---+---+


scala> test.count
res11: Long = 3
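
Applying the same idea back to the tab-separated file from the question should give the expected count of 3. A minimal sketch, reusing the path and schema names from the original post (assumed, not re-tested here):

// Minimal sketch, assuming the tab-separated input from the question.
// Setting "quote" and "escape" to a character that never occurs in the
// data (the backslash here, as in the answer) disables the default
// double-quote handling, so stray " characters are kept literally.
val inputDf = sparkSession.read
  .option("header", "true")
  .option("delimiter", "\t")
  .option("quote", "\\")
  .option("escape", "\\")
  .schema(finalSchema)
  .csv("file:///C:/Users/prhasija/Desktop/retriedAddresses_4.txt")

println(inputDf.count)  // expected: 3

With quoting disabled, multiLine should no longer be needed, since no field can span a line break.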
