What is the difference between 'InputFormat, OutputFormat' & 'Stored as' in Hive?
I'm new to big data and am currently learning Hive. I understand the concept of InputFormat and OutputFormat in Hive as part of a SerDe. I also understand that 'Stored as' is used to store a file in a particular format, just like InputFormat. But I don't understand the significant difference between using 'InputFormat, OutputFormat' and 'Stored as'.
Any help is appreciated.
Hive has many options for how to store data. You can either use external storage, where Hive just wraps data that lives somewhere else, or create a standalone table from scratch in the Hive warehouse. Input and output formats let you specify the original data structure for both types of table, that is, how the data is physically stored. On the client side you keep working with the table through SQL, but at the low level it may be a text file, a sequence file, an HBase table, or some other data structure.
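To make the two kinds of table concrete, here is a minimal HiveQL sketch. The table names, columns, and the HDFS path '/data/raw_logs' are hypothetical and only serve as illustration.

```sql
-- External table: Hive only wraps files that already live at LOCATION.
-- Dropping the table removes the metadata, not the files.
-- The path and the tab-delimited layout are assumptions for this example.
CREATE EXTERNAL TABLE raw_logs (
  ts      STRING,
  message STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/raw_logs';

-- Managed (standalone) table: Hive creates and owns the data
-- under its warehouse directory, here stored as sequence files.
CREATE TABLE logs_managed (
  ts      STRING,
  message STRING
)
STORED AS SEQUENCEFILE;
```

Both tables are queried with the same SQL; only the physical layout underneath differs.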
InputFormat and OutputFormat - let you describe the original data structure so that Hive can properly map it to the table view
SerDe - the class that performs the actual translation of data between the table view and the low-level input/output format structures, in both directions
Generally, the process looks like this: HDFS files --> InputFileFormat --> Deserializer --> Row object --> Serializer --> OutputFileFormat --> HDFS files
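As a rough illustration of that flow, the sketch below reads from a text table whose SerDe and input/output formats are spelled out explicitly and writes into an ORC table. The table names, columns, and the comma delimiter are assumptions made for the example.

```sql
-- Source table: each piece of the read path is named explicitly.
CREATE TABLE events_text (
  id   INT,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = ',')
STORED AS
  INPUTFORMAT  'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

-- Target table: a different physical format.
CREATE TABLE events_orc (
  id   INT,
  name STRING
)
STORED AS ORC;

-- Read side:  TextInputFormat splits the files into records,
--             LazySimpleSerDe deserializes each record into a row.
-- Write side: the ORC SerDe serializes each row,
--             the ORC output format writes the files to HDFS.
INSERT INTO TABLE events_orc
SELECT id, name FROM events_text;
```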
Stored as - specifies a storage format, which already includes the input and output formats, for your new tables in Hive
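In other words, 'Stored as' with a format name is a shorthand for naming the classes yourself. The sketch below shows two table definitions that should end up with the same storage settings for ORC; the table and column names are hypothetical.

```sql
-- Shorthand: the format keyword selects the SerDe, InputFormat
-- and OutputFormat for you.
CREATE TABLE page_views_orc (
  user_id BIGINT,
  url     STRING
)
STORED AS ORC;

-- Explicit form: the same three classes written out by hand.
CREATE TABLE page_views_orc_explicit (
  user_id BIGINT,
  url     STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS
  INPUTFORMAT  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

-- Compare the SerDe library, InputFormat and OutputFormat
-- entries reported for each table.
DESCRIBE FORMATTED page_views_orc;
DESCRIBE FORMATTED page_views_orc_explicit;
```

The explicit form is mainly useful when you need a custom or third-party InputFormat, OutputFormat, or SerDe that has no 'Stored as' keyword of its own.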
These attributes can significantly affect performance, overall size, and support for schema evolution, or enable features such as ACID. You can follow the steps described in this article to see how things work at the low level and to get some general information about the most commonly used formats - https://oyermolenko.blog/2017/02/16/structuring-hadoop-data-through-hive-and-sql