Saved data has undesired quotation marks

Problem description

I am using the following code to export my data frame to csv:

data.write.format('com.databricks.spark.csv').options(delimiter="\t", codec="org.apache.hadoop.io.compress.GzipCodec").save('s3a://myBucket/myPath')

Note that I use delimiter="\t" because I don't want extra quotation marks around each field. However, when I check the output CSV file, some fields are still enclosed in quotation marks, e.g.:

abcdABCDAAbbcd ....
1234_3456ABCD  ...
"-12345678AbCd"  ...

It seems that the quotation marks appear when the first character of a field is "-". Why is this happening, and is there a way to avoid it? Thanks!

Recommended answer

You aren't using all the options the CSV writer provides. It has a quoteMode parameter that takes one of four values (descriptions from the org.apache.commons.csv documentation):

  • ALL - quotes all fields
  • MINIMAL (default) - quotes fields which contain special characters such as a delimiter, quotes character or any of the characters in line separator
  • NON_NUMERIC - quotes all non-numeric fields
  • NONE - never quotes fields
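The four modes are easiest to compare side by side. Spark is not needed for that: Python's standard csv module has closely analogous constants (csv.QUOTE_ALL, csv.QUOTE_MINIMAL, csv.QUOTE_NONNUMERIC, csv.QUOTE_NONE), so the sketch below uses them as a stand-in. It is not the commons-csv or spark-csv code path, and Python's MINIMAL rule is not guaranteed to match commons-csv exactly (for instance, Python does not quote a value merely because it starts with "-"). The sample rows and the write_tsv helper are made up for illustration.

```python
import csv
import io

# Hypothetical sample rows: the first field mimics the question's "-12345678AbCd",
# and "x\ty" contains a real tab character (the delimiter).
ROWS = [["-12345678AbCd", "abc", 42], ["1234_3456ABCD", "x\ty", 7]]


def write_tsv(rows, quoting):
    """Write rows as tab-delimited text using one of the csv.QUOTE_* constants."""
    buf = io.StringIO()
    # An escapechar is required for QUOTE_NONE: without it, writing a field
    # that contains the delimiter raises csv.Error instead of escaping it.
    writer = csv.writer(buf, delimiter="\t", quoting=quoting,
                        escapechar="\\", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


for name in ("QUOTE_ALL", "QUOTE_MINIMAL", "QUOTE_NONNUMERIC", "QUOTE_NONE"):
    print(name)
    print(write_tsv(ROWS, getattr(csv, name)), end="")
```

ALL quotes every field, MINIMAL quotes only "x\ty" (it contains the delimiter), NONNUMERIC quotes the strings but not 42 and 7, and NONE quotes nothing, escaping the embedded tab with a backslash instead.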

If you want to avoid quoting altogether, the last option (NONE) looks like a good choice, doesn't it?
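Applied to the question's write, that means adding quoteMode="NONE" to the options. A minimal sketch, assuming the spark-csv package (com.databricks.spark.csv) is on the classpath as in the question; TSV_OPTIONS and save_tsv are illustrative names, not part of any library, and the snippet has not been checked against a specific spark-csv version.

```python
# Options for the spark-csv writer. quoteMode is passed through to
# org.apache.commons.csv; "NONE" means fields are never quoted.
TSV_OPTIONS = {
    "delimiter": "\t",
    "quoteMode": "NONE",
    "codec": "org.apache.hadoop.io.compress.GzipCodec",
}


def save_tsv(df, path):
    """Write a DataFrame as gzipped, unquoted tab-separated text.

    `df` is assumed to be a pyspark.sql.DataFrame; only its .write
    builder is used, so this module itself does not import pyspark.
    """
    (df.write
       .format("com.databricks.spark.csv")
       .options(**TSV_OPTIONS)
       .save(path))
```

Usage: save_tsv(data, 's3a://myBucket/myPath'). Keep in mind that with NONE a value that itself contains a tab or line break is no longer protected by quotes, so it has to be escaped instead; it is worth checking the output for such fields.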