Escape quotes is not working in Spark 2.2.0 while reading CSV
Problem description
I am trying to read my delimited file, which is tab separated, but I am not able to read all of the records.
Here are my input records:
head1 head2 head3
a b c
a2 a3 a4
a1 "b1 "c1
My code:
var inputDf = sparkSession.read
  .option("delimiter", "\t")
  .option("header", "true")
  // .option("inferSchema", "true")
  .option("nullValue", "")
  .option("escape", "\"")
  .option("multiLine", true)
  .option("nullValue", null)
  .option("nullValue", "NULL")
  .schema(finalSchema)
  .csv("file:///C:/Users/prhasija/Desktop/retriedAddresses_4.txt")
  // .csv(inputPath)
  .na.fill("")
  // .repartition(4)

println(inputDf.count)
Output:
2 records
Why is it not returning 3 as the count?
Recommended answer
I think you need to add the following options to your read: .option("escape", "\\") and .option("quote", "\\"). With the default quote character ", the unmatched quotes in your last row most likely open a quoted field that swallows the rest of the input; pointing quote and escape at a character that never occurs in the data makes the parser treat the stray quotes as ordinary text.
val test = spark.read
  .option("header", true)
  .option("quote", "\\")
  .option("escape", "\\")
  .option("delimiter", ",")
  .csv(".../test.csv")
Here is the test csv I used it on:
a,b,c
1,b,a
5,d,e
5,"a,"f
Full output:
scala> val test = spark.read.option("header", true).option("quote", "\\").option("escape", "\\").option("delimiter", ",").csv("./test.csv")
test: org.apache.spark.sql.DataFrame = [a: string, b: string ... 1 more field]
scala> test.show
+---+---+---+
| a| b| c|
+---+---+---+
| 1| b| a|
| 5| d| e|
| 5| "a| "f|
+---+---+---+
scala> test.count
res11: Long = 3
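Applied back to the tab-delimited file from the question, the same idea might look roughly like this. This is a minimal, untested sketch that reuses the asker's sparkSession, finalSchema, and file path; the key change is pointing quote and escape at a backslash, which does not appear in the data, so the stray " characters are read as plain text:

var inputDf = sparkSession.read
  .option("header", "true")
  .option("delimiter", "\t")
  // Use a character that never occurs in the data as both quote and escape,
  // so an unmatched " no longer starts a quoted field.
  .option("quote", "\\")
  .option("escape", "\\")
  .option("nullValue", "")
  .schema(finalSchema)
  .csv("file:///C:/Users/prhasija/Desktop/retriedAddresses_4.txt")
  .na.fill("")

println(inputDf.count)  // expected to report 3 data rows once quotes are treated as plain text

Dropping multiLine is also worth considering here, since none of the rows span multiple lines; with the default quote character and multiLine enabled, an unmatched quote can pull several physical lines into a single record.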