Saved data has undesired quotation marks
Problem description
I am using the following code to export my data frame to CSV:
data.write.format('com.databricks.spark.csv').options(delimiter="\t", codec="org.apache.hadoop.io.compress.GzipCodec").save('s3a://myBucket/myPath')
Note that I use delimiter="\t", as I don't want additional quotation marks around each field. However, when I checked the output CSV file, some fields were still enclosed in quotation marks, e.g.
abcdABCDAAbbcd ....
1234_3456ABCD ...
"-12345678AbCd" ...
It seems that the quotation marks appear when the leading character of a field is "-". Why is this happening, and is there a way to avoid it? Thanks!
Recommended answer
You aren't using all the options provided by the CSV writer. It has a quoteMode parameter, which takes one of four values (descriptions from the org.apache.commons.csv documentation):
- ALL - quotes all fields
- MINIMAL (default) - quotes fields which contain special characters such as a delimiter, the quote character, or any of the characters in the line separator
- NON_NUMERIC - quotes all non-numeric fields
- NONE - never quotes fields
If you want to avoid quoting altogether, the last option looks like a good choice, doesn't it?
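As a minimal sketch, the call from the question could be extended with quoteMode="NONE" like this. The helper name save_tsv_without_quotes is hypothetical and exists only for illustration. The explicit escape option is an assumption: depending on the spark-csv version, the underlying writer may refuse to disable quoting unless an escape character is set, so one is passed here as a precaution.

```python
def save_tsv_without_quotes(data, path):
    """Write a Spark DataFrame as gzip-compressed, tab-separated text
    with field quoting disabled (hypothetical helper, not part of spark-csv)."""
    (
        data.write
        .format('com.databricks.spark.csv')
        .options(
            delimiter="\t",
            codec="org.apache.hadoop.io.compress.GzipCodec",
            quoteMode="NONE",  # never wrap fields in quotation marks
            escape="\\",       # assumption: an escape character may be required when quoting is off
        )
        .save(path)
    )
```

It would be called as save_tsv_without_quotes(data, 's3a://myBucket/myPath'). Keep in mind that with quoting disabled, a field that itself contains a tab or a newline can only be protected by the escape character, so downstream readers need to handle that.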