Redshift-分隔值缺少结尾引号 [英] Redshift - Delimited value missing end quote

查看:76
本文介绍了Redshift-分隔值缺少结尾引号的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试加载CSV文件以进行红移.

Im trying to load a CSV file to redshift.

定界符'|'

CSV的第一列:

1 |Bhuvi|"This is ok"|xyz@domain.com

我使用此命令进行加载.

I used this command to load.

copy tbl from 's3://datawarehouse/source.csv'   
iam_role  'arn:aws:iam:::role/xxx'cas-pulse-redshift' 
delimiter '|' 
removequotes 
ACCEPTINVCHARS ; 

错误:

raw_field_value | This is ok" |xyz@domain.com
err_code        | 1214
err_reason      | Delimited value missing end quote

然后我也尝试了这个.

copy tbl from 's3://datawarehouse/source.csv'   
iam_role  'arn:aws:iam:::role/xxx' 
CSV QUOTE '\"' 
DELIMITER '|'   
ACCEPTINVCHARS ; 

推荐答案

免责声明-即使该帖子未回答此处提出的问题,我仍会发布此分析,以防它对某些人有所帮助.

Disclaimer - Even though this post does not answer the question asked here, I am posting this analysis in case it helps some one.

如果带引号的文本列缺少结束引号,或者如果文本列值的值本身包含换行,则可以报告错误分隔值缺少结束引号" .就我而言,文本列值中有换行符.

The error "Delimited value missing end quote" can be reported in cases where a quoted text column is missing the end quote, or if the text column value has a new line in the value itself. In my case, there was a newline in the text column value.

按照 RFC 4180 的规范,

包含换行符(CRLF),双引号和逗号的字段应该用双引号引起来.
例如:

Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.
For example:

    "aaa","b CRLF
    bb","ccc" CRLF
    zzz,yyy,xxx

因此,有效的CSV可以包含多行行,并且在Redshift中导入它的正确方法是指定

So a valid CSV can have multi-line rows, and the correct way to import it in Redshift is to specify the CSV format option. This also assumes that all columns having the quote character in the value will have the quote character escaped by another preceding quote character. This is also as per the CSV RFC specification.

如果使用双引号将字段括起来,则使用双引号出现在字段内的内容必须通过在前面加上来进行转义另一个双引号.
例如:

If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.
For example:

   "aaa","b""bb","ccc"

如果我们要导入的文件不是有效的CSV,并且视情况而定只是命名为.CSV文件,那么我们有以下选择.

If the file that we are trying to import is not a valid CSV, and is just named as a .CSV file as the case may just be, then we have the following options.

  1. 尝试在不指定CSV选项的情况下复制文件,并使用相应的复制选项微调定界符以及转义和引用行为.
  2. 如果一组选项不能一致地复制数据,请对文件进行预处理以使其一致.

通常,如果我们尝试以一致的格式导出和导入数据,则有助于使行为具有确定性.

In general, it helps to make the behaviour deterministic if we try to export and import data in formats that are consistent.

这篇关于Redshift-分隔值缺少结尾引号的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆