HIVE: Why does Hive generate a MapReduce job for select column from tablename, but not for select * from tablename?
Problem description
Why does Hive generate a MapReduce job for select column from tablename, but not for select * from tablename?
When a simple statement like select * from tablename is executed, Hive has nothing to compute: it just fetches the data from the files stored in HDFS and prints the rows as columns. In effect, this is equivalent to running something like
hadoop fs -cat hdfs://schemaname/tablename.txt
hadoop fs -cat hdfs://schemaname/tablename.rc
hadoop fs -cat hdfs://schemaname/tablename.orc
The same applies to whichever format the table's files are stored in.
If you select specific columns, add a where clause, or use an aggregate on the table, every row has to be processed (projected, filtered, or grouped) rather than simply streamed back, so Hive compiles the query into a MapReduce job.
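One way to check which path a given query takes is to look at its plan with EXPLAIN. The sketch below is only illustrative: tablename comes from the question, col1 is a hypothetical column, and it assumes a Hive version that supports the hive.fetch.task.conversion setting (values none, minimal, more). That setting controls which simple queries are served by a plain fetch instead of a job, so the behaviour on a given cluster depends on its value.

```sql
-- tablename is from the question; col1 is a hypothetical column name.

-- none: fetch conversion is disabled, so even SELECT * is planned as a job.
SET hive.fetch.task.conversion=none;
EXPLAIN SELECT * FROM tablename;

-- minimal: SELECT *, filters on partition columns, and LIMIT can be served
-- by a fetch alone; selecting a single column is still planned as a job.
SET hive.fetch.task.conversion=minimal;
EXPLAIN SELECT * FROM tablename;
EXPLAIN SELECT col1 FROM tablename;

-- more: plain column selections, filters, and LIMIT can also skip the job.
SET hive.fetch.task.conversion=more;
EXPLAIN SELECT col1 FROM tablename WHERE col1 IS NOT NULL;

-- Grouping and aggregation need a real job regardless of the setting.
EXPLAIN SELECT col1, count(*) FROM tablename GROUP BY col1;
```

A fetch-only plan should list just a Fetch Operator stage, while a query that needs a job should show a Map Reduce stage (or the equivalent for another execution engine) ahead of the fetch.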