How does Spark's HiveContext work internally?

Question

I am new to Spark. I found that with HiveContext we can connect to Hive and run HiveQL queries. I tried it and it worked.

My question is whether Spark runs these queries as Spark jobs, that is, whether it uses HiveContext only to access the corresponding Hive table files on HDFS, or whether it internally calls Hive to execute the query.

Recommended answer

No, Spark doesn't call Hive to execute the query. Spark only reads the metadata from Hive and executes the query within the Spark engine. Spark has its own SQL execution engine, which includes components such as Catalyst and Tungsten, to optimize queries and give faster results. In short, it uses Hive's metadata and Spark's execution engine to run the queries.
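One way to see this for yourself is to ask Spark for the plan it builds for a HiveQL query. The following is a minimal PySpark sketch, not a definitive recipe. It assumes Spark 2.0 or later, where `SparkSession.builder.enableHiveSupport()` takes over the role of HiveContext, and the helper names (`hive_enabled_session`, `count_by_column_sql`, `show_plan`) are made up for illustration.

```python
def hive_enabled_session(app_name="hive-metastore-demo"):
    """Create a SparkSession that can talk to the Hive metastore."""
    # pyspark is imported lazily so this file can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # gives access to Hive tables, as HiveContext did
        .getOrCreate()
    )


def count_by_column_sql(table, column):
    """Build a simple aggregate query; table and column are caller-supplied names."""
    return f"SELECT {column}, COUNT(*) AS n FROM {table} GROUP BY {column}"


def show_plan(spark, query):
    """Print the plans Spark builds for a query and return the DataFrame."""
    df = spark.sql(query)
    df.explain(True)  # parsed, analyzed, optimized and physical plans
    return df
```

For example, with a hypothetical table `default.sales` that has a `region` column, `show_plan(hive_enabled_session(), count_by_column_sql("default.sales", "region"))` prints plans made up of Spark's own operators. Running the resulting DataFrame launches Spark jobs rather than handing the query to Hive.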

One of the greatest advantages of Hive is its metastore. It acts as a single metadata store for many components in the Hadoop ecosystem.

Coming to your question: when you use HiveContext, it gets access to the metastore database and all your Hive metadata, which describes what type of data you have, where the data is stored, the serialization and deserialization formats, compression codecs, columns, data types, and essentially every other detail about the table and its data. That is enough for Spark to understand the data.
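To get a feel for what the metastore provides, you can ask Spark to describe a table. This is a rough sketch under a few assumptions: `spark` is an existing session with Hive support, `default.sales` is a hypothetical table, and the row labels in `wanted` are the ones `DESCRIBE FORMATTED` typically prints, which may differ between Spark versions. The helper names are invented for this example.

```python
def describe_formatted_sql(table):
    """Build the statement that asks for a table's full metadata."""
    return f"DESCRIBE FORMATTED {table}"


def pick_details(rows, wanted=("Location", "Serde Library", "InputFormat", "OutputFormat")):
    """Extract selected storage details from DESCRIBE FORMATTED output.

    rows is an iterable of (col_name, data_type, comment) tuples, which is
    the shape of the rows returned by spark.sql(...).collect().
    """
    details = {}
    for col_name, data_type, _comment in rows:
        key = (col_name or "").strip()
        if key in wanted:
            # For these rows the value sits in the data_type column.
            details[key] = data_type
    return details
```

With a live session, `pick_details(spark.sql(describe_formatted_sql("default.sales")).collect())` would return the table's storage location and SerDe information, which is the kind of detail Spark reads from the metastore before it scans the files itself.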

Overall, Spark only needs the metastore, which gives complete details of the underlying data. Once it has the metadata, it executes the queries you asked for on its own execution engine. Hive running on MapReduce is generally slower than Spark, so there is no point in going back to Hive and asking it to run the query.

Let me know if this answers your question.
