Difference between SparkContext, JavaSparkContext, SQLContext, and SparkSession?


Problem Description


  1. What is the difference between SparkContext, JavaSparkContext, SQLContext and SparkSession?
  2. Is there any method to convert or create a Context using a SparkSession?
  3. Can I completely replace all the Contexts using one single entry SparkSession?
  4. Are all the functions in SQLContext, SparkContext, and JavaSparkContext also in SparkSession?
  5. Some functions like parallelize have different behaviors in SparkContext and JavaSparkContext. How do they behave in SparkSession?
  6. How can I create the following using a SparkSession?

  • RDD
  • JavaRDD
  • JavaPairRDD
  • Dataset


Is there a method to transform a JavaPairRDD into a Dataset or a Dataset into a JavaPairRDD?

Recommended Answer

SparkContext is the entry point of the Scala implementation, and JavaSparkContext is a Java wrapper around SparkContext.
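For context, a minimal pre-2.x sketch in Java (the app name and master are placeholder values) showing that the Java wrapper is built around the Scala context:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;

public class Pre2xEntryPoint {
    public static void main(String[] args) {
        // Classic pre-2.x setup: JavaSparkContext wraps a Scala SparkContext.
        SparkConf conf = new SparkConf().setAppName("pre-2.x-demo").setMaster("local[*]");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // The underlying Scala SparkContext is reachable through the wrapper.
        SparkContext sc = jsc.sc();

        jsc.stop();
    }
}
```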

SQLContext is the entry point of Spark SQL, and it can be obtained from a SparkContext. Prior to 2.x.x, RDD, DataFrame, and Dataset were three different data abstractions. Since Spark 2.x.x, all three data abstractions are unified, and SparkSession is the unified entry point of Spark.
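A minimal sketch of the unified entry point in Java (the app name and master here are placeholders):

```java
import org.apache.spark.sql.SparkSession;

public class UnifiedEntryPoint {
    public static void main(String[] args) {
        // Since 2.x.x, one builder call replaces the separate context constructors.
        SparkSession spark = SparkSession.builder()
                .appName("unified-entry-demo")  // placeholder app name
                .master("local[*]")             // placeholder master
                .getOrCreate();

        spark.stop();
    }
}
```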

An additional note: RDDs are meant for unstructured, strongly typed data, while DataFrames hold structured and loosely typed data.

Is there any method to convert or create a Context using a SparkSession?

Yes. Use sparkSession.sparkContext(), and for SQL, sparkSession.sqlContext().
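For example, in Java (a sketch assuming a local session; SparkSession has no direct JavaSparkContext accessor, so the Java wrapper is rebuilt around the session's SparkContext):

```java
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

public class ContextsFromSession {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("contexts-demo").master("local[*]").getOrCreate();

        // The underlying Scala SparkContext and the legacy SQLContext:
        SparkContext sc = spark.sparkContext();
        SQLContext sqlContext = spark.sqlContext();

        // Wrap the SparkContext to get a JavaSparkContext.
        JavaSparkContext jsc = new JavaSparkContext(sc);

        spark.stop();
    }
}
```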

Can I completely replace all the Contexts using one single entry SparkSession?

Yes. You can get the respective contexts from the sparkSession.

Are all the functions in SQLContext, SparkContext, JavaSparkContext, etc. also in SparkSession?

Not directly. You have to get the respective context and make use of it, something like backward compatibility.

How can I use such functions in SparkSession?

Get the respective context and make use of it, as sketched below.
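As an illustration (a hedged sketch; the temp view name nums is made up), code written against the legacy SQLContext API can be fed from the session:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

public class LegacyContextUsage {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("legacy-usage-demo").master("local[*]").getOrCreate();

        // Register a view through the session, then query it via the legacy SQLContext.
        spark.range(5).createOrReplaceTempView("nums");
        SQLContext sqlContext = spark.sqlContext();
        Dataset<Row> doubled = sqlContext.sql("SELECT id * 2 AS doubled FROM nums");
        doubled.show();

        spark.stop();
    }
}
```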

How can I create the following using a SparkSession?

  1. RDD: can be created with sparkSession.sparkContext.parallelize(???)
  2. JavaRDD: the same applies, but through the Java implementation
  3. JavaPairRDD: sparkSession.sparkContext.parallelize(???).map(//making your data as key-value pair here is one way)
  4. Dataset: what sparkSession returns is a Dataset if it is structured data.
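Putting the list above together, a hedged Java sketch (the sample data and key prefix are made up) that builds each abstraction from a single SparkSession, plus one way to move between JavaPairRDD and Dataset, which the question also asks about:

```java
import java.util.Arrays;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

public class CreateFromSession {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("create-demo").master("local[*]").getOrCreate();

        // JavaRDD: wrap the session's SparkContext, then parallelize.
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
        JavaRDD<Integer> javaRdd = jsc.parallelize(Arrays.asList(1, 2, 3));

        // RDD: every JavaRDD wraps a Scala RDD.
        RDD<Integer> rdd = javaRdd.rdd();

        // JavaPairRDD: map each element to a key-value Tuple2.
        JavaPairRDD<String, Integer> pairRdd =
                javaRdd.mapToPair(i -> new Tuple2<>("key" + i, i));

        // Dataset: created directly from the session with an explicit encoder.
        Dataset<Integer> ds = spark.createDataset(Arrays.asList(1, 2, 3), Encoders.INT());

        // JavaPairRDD -> Dataset: encode the tuples with a tuple encoder.
        Dataset<Tuple2<String, Integer>> pairDs = spark.createDataset(
                pairRdd.rdd(), Encoders.tuple(Encoders.STRING(), Encoders.INT()));

        // Dataset -> JavaRDD of tuples (mapToPair can rebuild a JavaPairRDD).
        JavaRDD<Tuple2<String, Integer>> backToJavaRdd = pairDs.javaRDD();

        spark.stop();
    }
}
```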
