Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive

Question

I run into a problem when I execute SHOW CREATE TABLE on an ORC table and then execute the CREATE TABLE statement it returns.

SHOW CREATE TABLE gives you this:

STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

But if you create the table with only those clauses, you get a casting error when selecting from it. The error looks like this:

Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.BinaryComparable


To fix this, just change the CREATE TABLE statement to use STORED AS ORC.

But going by the answer to the similar question, "What is the difference between 'InputFormat, OutputFormat' & 'Stored as' in Hive?", I can't figure out why the two forms behave differently.

Recommended answer

STORED AS implies 3 things:

  1. SERDE
  2. INPUTFORMAT
  3. OUTPUTFORMAT

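You have defined only the last two, which leaves the SERDE to be determined by hive.default.serde. The ORC input format hands rows to the SerDe as OrcStruct objects, but the default LazySimpleSerDe expects text or binary values, hence the ClassCastException.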

hive.default.serde
Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Added in: Hive 0.14 with HIVE-5976
The default SerDe Hive will use for storage formats that do not specify a SerDe.
Storage formats that currently do not specify a SerDe include 'TextFile, RcFile'.
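
To make the three implied pieces visible, here is a minimal sketch of a CREATE TABLE statement that spells out all of them explicitly. The table name mytable3 is hypothetical and used only for illustration; the class names are the ones that SHOW CREATE TABLE reports for an ORC table. Under these assumptions the statement should behave like STORED AS ORC, because the SerDe no longer falls back to hive.default.serde.

```sql
-- Hypothetical table name (mytable3), for illustration only.
-- All three pieces are given explicitly, so hive.default.serde is not consulted.
CREATE TABLE mytable3 (i int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS
INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
```

When copying the output of SHOW CREATE TABLE, keeping the ROW FORMAT SERDE clause together with the STORED AS INPUTFORMAT ... OUTPUTFORMAT clauses gives the same effect.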

Demo

hive.default.serde

set hive.default.serde;


hive.default.serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

STORED AS ORC

create table mytable (i int) 
stored as orc;

show create table mytable;


Note that the SERDE is 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'

CREATE TABLE `mytable`(
  `i` int)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'file:/home/cloudera/local_db/mytable'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{"BASIC_STATS":"true"}', 
  'numFiles'='0', 
  'numRows'='0', 
  'rawDataSize'='0', 
  'totalSize'='0', 
  'transient_lastDdlTime'='1496982059')


STORED AS INPUTFORMAT ... OUTPUTFORMAT ...

create table mytable2 (i int) 
STORED AS 
INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
;

show create table mytable2
;


Note that the SERDE is 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

CREATE TABLE `mytable2`(
  `i` int)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'file:/home/cloudera/local_db/mytable2'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{"BASIC_STATS":"true"}', 
  'numFiles'='0', 
  'numRows'='0', 
  'rawDataSize'='0', 
  'totalSize'='0', 
  'transient_lastDdlTime'='1496982426')
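
If a table such as mytable2 already exists with the wrong SerDe, one possible repair is to point its metadata at OrcSerde instead of recreating it. This is only a sketch, and it assumes the files under the table location really are ORC files. For a partitioned table, existing partitions may carry their own SerDe setting and might need the same change applied per partition.

```sql
-- Assumption: the data files of mytable2 are genuine ORC files.
ALTER TABLE mytable2 SET SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde';

-- Check the result: the "SerDe Library" row should now show OrcSerde.
DESCRIBE FORMATTED mytable2;
```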
