Athena displays special characters as ?
Question
I have an external table with the following DDL:
CREATE EXTERNAL TABLE `table_1`(
`name` string COMMENT 'from deserializer',
`desc1` string COMMENT 'from deserializer',
`desc2` string COMMENT 'from deserializer'
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'quoteChar'='\"',
'separatorChar'='|',
'skip.header.line.count'='1')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://temp_loc/temp_csv/'
TBLPROPERTIES (
'classification'='csv',
'compressionType'='none',
'typeOfData'='file')
The CSV files that this table reads are UTF-16 LE encoded. When I render the output in Athena, the special characters are displayed as question marks. Is there any way to set the encoding in Athena, or otherwise fix this?
Recommended answer
The solution, as Philipp Johannis mentions in a comment, is to set the serialization.encoding table property to "UTF-16LE". As far as I can see, LazySimpleSerDe uses java.nio.charset.Charset.forName, so any encoding/charset name that Java accepts should work.
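Applied to table_1 from the question, the fix could look roughly like the sketch below. It is not verified against Athena: it assumes the property is added with Athena's ALTER TABLE ... SET TBLPROPERTIES statement. Note also that the table in the question uses OpenCSVSerde, while the reasoning above refers to LazySimpleSerDe, so it is worth checking that the encoding is actually honored for this SerDe.

```sql
-- Sketch: declare that the files under s3://temp_loc/temp_csv/ are UTF-16 little-endian.
-- Per the answer above, the value is resolved via Java's Charset.forName,
-- so it must be a charset name that Java accepts (e.g. 'UTF-16LE').
-- Assumption: OpenCSVSerde (used by table_1) honors this property like LazySimpleSerDe.
ALTER TABLE table_1 SET TBLPROPERTIES ('serialization.encoding'='UTF-16LE');
```

Alternatively, when recreating the table, the same 'serialization.encoding'='UTF-16LE' entry can be added to the TBLPROPERTIES list of the CREATE EXTERNAL TABLE statement.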