Special characters in AWS Athena show up as question marks

Question

I've added a table in AWS Athena from a CSV file that contains the special characters "æøå". These show up as � in the output. The CSV file is encoded as Unicode; I've also tried changing the encoding to UTF-8, with no luck. I uploaded the CSV to S3 and then added the table to Athena using the following DDL:

CREATE EXTERNAL TABLE `regions_dk`(
  `postnummer` string COMMENT 'from deserializer', 
  `kommuner` string COMMENT 'from deserializer', 
  `regioner` string COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ( 
  'separatorChar'='\;') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://bucket/path'
TBLPROPERTIES (
  'classification'='csv')

I have another table which also includes the characters "æøå", which I added using an ETL script, and here there's no issue.

What am I missing?

Recommended answer

I had uploaded an ANSI-encoded file to S3, and several values came out unreadable. I converted the file on my PC to UTF-8, repeated the upload and table creation, and everything was fine.
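To see why the characters turn into �, here is a small Python illustration. It assumes the "ANSI" file used Windows-1252 (cp1252, the usual ANSI code page on Western European Windows systems) and that Athena decodes the data as UTF-8; neither detail is stated explicitly in the thread.

```python
# Illustration under the assumptions above: "ANSI" = cp1252, reader = UTF-8.
text = "æøå"

# In cp1252 each of these characters is a single byte: e6 f8 e5.
ansi_bytes = text.encode("cp1252")
print(ansi_bytes.hex(" "))

# None of those bytes forms a valid UTF-8 sequence here, so a UTF-8 decoder
# substitutes the replacement character U+FFFD (shown as �) for each one.
decoded = ansi_bytes.decode("utf-8", errors="replace")
print(decoded)

# The same text encoded as UTF-8 uses two bytes per character
# (c3 a6 c3 b8 c3 a5) and decodes back correctly.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex(" "))
print(utf8_bytes.decode("utf-8"))
```

The bytes are not lost; they are simply interpreted with the wrong encoding, which is why re-saving the file as UTF-8 fixes the output.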

I did the conversion with Sublime Text.
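If you would rather script the conversion than do it in an editor, the following is a minimal Python sketch. The function name `convert_to_utf8` and its default `source_encoding="cp1252"` are assumptions for illustration; if the file was saved with Notepad's "Unicode" option (which is UTF-16), pass `"utf-16"` instead.

```python
def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode the text file at src as UTF-8 and write it to dst.

    Hypothetical helper: source_encoding is a guess the caller must supply.
    """
    # Strict decoding (the default) raises UnicodeDecodeError on bytes that
    # are invalid in source_encoding. Note that cp1252 accepts almost every
    # byte, so this cannot catch every wrong guess.
    # newline="" keeps the original line endings (e.g. \r\n) untouched.
    with open(src, encoding=source_encoding, newline="") as fin:
        text = fin.read()
    # Plain "utf-8" writes no byte-order mark at the start of the file.
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)
```

Usage, with hypothetical file names: `convert_to_utf8("regions_dk.csv", "regions_dk_utf8.csv")`, then upload the converted file to the table's S3 location in place of the original.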