S3 -> Redshift cannot handle UTF8
We have a file in S3 that is loaded into Redshift via the COPY command. The import is failing because a VARCHAR(20) value contains an Ä, which is being translated into a two-byte sequence during the COPY command and is now too long for the 20 characters.
I have verified that the data is correct in S3, but the COPY command does not understand the UTF-8 characters during import. Has anyone found a solution for this?
tl;dr
The byte length for your varchar column just needs to be larger.
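A minimal sketch of the fix, assuming a hypothetical table my_table with the failing column name; Redshift allows increasing (though not decreasing) a VARCHAR length in place:

    -- Doubling to 40 bytes leaves room for 2-byte characters such as Ä;
    -- see the 4-byte worst case discussed below for a safer bound.
    ALTER TABLE my_table ALTER COLUMN name TYPE VARCHAR(40);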
Detail
Multi-byte characters (UTF-8) are supported in the varchar data type; however, the length that is provided is in bytes, NOT characters.
AWS documentation for Multibyte Character Load Errors states the following:
VARCHAR columns accept multibyte UTF-8 characters, to a maximum of four bytes.
Therefore, if you want the character Ä to be allowed, you need to allow 2 bytes for this character instead of 1.
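You can check this distinction directly in Redshift: LEN counts characters while OCTET_LENGTH counts bytes, so for Ä the two disagree:

    -- LEN() is multibyte-aware and counts characters;
    -- OCTET_LENGTH() counts the underlying UTF-8 bytes.
    SELECT LEN('Ä') AS char_count,          -- 1
           OCTET_LENGTH('Ä') AS byte_count; -- 2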
AWS documentation for VARCHAR or CHARACTER VARYING states the following:
... so a VARCHAR(120) column consists of a maximum of 120 single-byte characters, 60 two-byte characters, 40 three-byte characters, or 30 four-byte characters.
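Applying that rule to the original VARCHAR(20): to guarantee room for 20 characters of any width, size the column for the four-byte worst case (the table and column names remain illustrative):

    -- 20 characters x 4 bytes/character = 80 bytes in the worst case
    CREATE TABLE my_table (
        name VARCHAR(80)  -- holds 20 characters even if every one is 4-byte UTF-8
    );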
For a list of UTF-8 characters and their byte lengths, this is a good reference: Complete Character List for UTF-8
Detailed information for the Unicode Character 'LATIN CAPITAL LETTER A WITH DIAERESIS' (U+00C4) can be found here.