Athena displays special characters as ?


Problem description

I have an external table with the following DDL:

CREATE EXTERNAL TABLE `table_1`(
  `name` string COMMENT 'from deserializer', 
  `desc1` string COMMENT 'from deserializer', 
  `desc2` string COMMENT 'from deserializer'
  )
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ( 
  'quoteChar'='\"', 
  'separatorChar'='|', 
  'skip.header.line.count'='1') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://temp_loc/temp_csv/'
TBLPROPERTIES (
  'classification'='csv', 
  'compressionType'='none', 
  'typeOfData'='file')

The CSV files that this table reads are UTF-16 LE encoded. When I query the table in Athena, special characters are displayed as question marks in the output. Is there any way to set the encoding in Athena, or to fix this some other way?

Recommended answer

As Philipp Johannis mentions in a comment, the solution is to set the serialization.encoding table property to "UTF-16LE". As far as I can see, LazySimpleSerde uses java.nio.charset.Charset.forName, so any encoding/charset name accepted by Java should work.
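As a minimal sketch of what setting that property could look like for the table in the question, the statement below adds serialization.encoding to the existing table's properties. It assumes the table is named table_1 as in the DDL above. Note that the reasoning in this answer is about LazySimpleSerde, while the question's table uses OpenCSVSerde; whether the property is honored there has not been verified here.

```sql
-- Sketch only: add the encoding as a table property on the existing table.
-- Assumption: the table is named table_1, as in the question's DDL.
-- 'UTF-16LE' is a charset name accepted by java.nio.charset.Charset.forName.
ALTER TABLE table_1 SET TBLPROPERTIES (
  'serialization.encoding' = 'UTF-16LE'
);
```

The same key/value pair could instead be added to the TBLPROPERTIES clause of the CREATE EXTERNAL TABLE statement when the table is first created. After the change, re-running a SELECT on a row that contains special characters should show whether they are now decoded correctly.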
