Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive


Problem description

An issue occurs when you run show create table on an ORC table and then execute the resulting create table statement.

Using show create table, you get this:

STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

But if you create the table with those clauses, you get a casting error when selecting from it. The error looks like this:

Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.BinaryComparable


To fix this, just change the create table statement to use STORED AS ORC.

However, even after reading the answer to a similar question, "What is the difference between 'InputFormat, OutputFormat' & 'Stored as' in Hive?", I can't figure out the reason.

Solution

STORED AS implies 3 things:

  1. SERDE
  2. INPUTFORMAT
  3. OUTPUTFORMAT

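You have defined only the last 2, leaving the SERDE to be defined by hive.default.serde. The ORC input format hands OrcStruct rows to the SerDe, but LazySimpleSerDe expects a BinaryComparable such as Text, which is why the select fails with the ClassCastException.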

hive.default.serde
Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Added in: Hive 0.14 with HIVE-5976
The default SerDe Hive will use for storage formats that do not specify a SerDe.
Storage formats that currently do not specify a SerDe include 'TextFile, RcFile'.

Demo

hive.default.serde

set hive.default.serde;


hive.default.serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

STORED AS ORC

create table mytable (i int) 
stored as orc;

show create table mytable;


Note that the SERDE is 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'

CREATE TABLE `mytable`(
  `i` int)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'file:/home/cloudera/local_db/mytable'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
  'numFiles'='0', 
  'numRows'='0', 
  'rawDataSize'='0', 
  'totalSize'='0', 
  'transient_lastDdlTime'='1496982059')


STORED AS INPUTFORMAT ... OUTPUTFORMAT ...

create table mytable2 (i int) 
STORED AS 
INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
;

show create table mytable2
;


Note that the SERDE is 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

CREATE TABLE `mytable2`(
  `i` int)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'file:/home/cloudera/local_db/mytable2'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
  'numFiles'='0', 
  'numRows'='0', 
  'rawDataSize'='0', 
  'totalSize'='0', 
  'transient_lastDdlTime'='1496982426')
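
Based on the two outputs above, here is a minimal sketch of how to keep the explicit INPUTFORMAT/OUTPUTFORMAT clauses without falling back to the default SerDe. The table name mytable3 is hypothetical and used only for illustration. The ALTER TABLE statement assumes mytable2 is still empty, as in the demo, so only its metadata needs changing.

```sql
-- Declare all three pieces explicitly, mirroring what
-- SHOW CREATE TABLE printed for mytable.
-- mytable3 is a hypothetical name, not part of the original demo.
CREATE TABLE mytable3 (i int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

-- Assumption: mytable2 holds no data yet, so switching the SerDe
-- recorded in the metastore is enough to repair it.
ALTER TABLE mytable2 SET SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde';

-- Check which SerDe the table now records.
SHOW CREATE TABLE mytable2;
```

When you copy DDL from show create table, keep the ROW FORMAT SERDE clause together with the STORED AS INPUTFORMAT ... OUTPUTFORMAT ... clauses. Dropping it is what leaves the SerDe to the default.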
