Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive
Question
I run into an issue when executing show create table and then executing the resulting create table statement, if the table is ORC.
Using show create table, you get this:
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
But if you create the table with those clauses, you get a casting error when selecting from it. The error looks like:
Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.BinaryComparable
To fix this, just change the create table statement to STORED AS ORC.
But even after reading the answer to the similar question, What is the difference between 'InputFormat, OutputFormat' & 'Stored as' in Hive?, I can't figure out the reason.
Recommended answer
STORED AS implies 3 things:
- SERDE
- INPUTFORMAT
- OUTPUTFORMAT
You have defined only the last 2, leaving the SERDE to be defined by hive.default.serde. The Hive configuration documentation describes this property as follows:
hive.default.serde
Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Added in: Hive 0.14 with HIVE-5976
The default SerDe Hive will use for storage formats that do not specify a SerDe.
Storage formats that currently do not specify a SerDe include 'TextFile, RcFile'.
Demo
hive.default.serde
set hive.default.serde;
hive.default.serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
STORED AS ORC
create table mytable (i int)
stored as orc;
show create table mytable;
Note that the SERDE is 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
CREATE TABLE `mytable`(
`i` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'file:/home/cloudera/local_db/mytable'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{"BASIC_STATS":"true"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1496982059')
STORED AS INPUTFORMAT ... OUTPUTFORMAT ...
create table mytable2 (i int)
STORED AS
INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
;
show create table mytable2
;
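Note that the SERDE is 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'. This SerDe expects text rows, while OrcInputFormat produces OrcStruct records, which is why selecting from such a table fails with the ClassCastException.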
CREATE TABLE `mytable2`(
`i` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'file:/home/cloudera/local_db/mytable2'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{"BASIC_STATS":"true"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1496982426')
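If you want to keep the explicit INPUTFORMAT / OUTPUTFORMAT clauses (for example when replaying the output of show create table), one option is to name the SerDe explicitly as well, so that hive.default.serde is never consulted. The following is a minimal sketch along those lines, not a verified fix for every Hive version: the table name mytable3 is hypothetical, and the ALTER TABLE statement is a suggestion for repairing an existing table such as mytable2 that I have only considered for an empty table.

```sql
-- Hypothetical table: same storage classes as mytable2,
-- but with the ORC SerDe stated explicitly.
create table mytable3 (i int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS
INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
;

-- Assumption: an already created table can be repaired in place
-- by switching its SerDe instead of recreating it.
alter table mytable2 set serde 'org.apache.hadoop.hive.ql.io.orc.OrcSerde';

-- Check which SerDe the table ended up with.
show create table mytable2;
```

With the SerDe spelled out, the definition should be equivalent to what STORED AS ORC produces in the first demo.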