What's the difference between SparkSession.sql and Dataset.sqlContext.sql?
Question
I have the following snippets of code and I wonder what the difference is between these two and which one I should use. I am using Spark 2.2.
Dataset<Row> df = sparkSession.readStream()
.format("kafka")
.load();
df.createOrReplaceTempView("table");
df.printSchema();
Dataset<Row> resultSet = df.sqlContext().sql("select value from table"); //sparkSession.sql(this.query);
StreamingQuery streamingQuery = resultSet
.writeStream()
.trigger(Trigger.ProcessingTime(1000))
.format("console")
.start();
vs
Dataset<Row> df = sparkSession.readStream()
.format("kafka")
.load();
df.createOrReplaceTempView("table");
Dataset<Row> resultSet = sparkSession.sql("select value from table"); //sparkSession.sql(this.query);
StreamingQuery streamingQuery = resultSet
.writeStream()
.trigger(Trigger.ProcessingTime(1000))
.format("console")
.start();
Answer
There is a very subtle difference between sparkSession.sql("sql query") and df.sqlContext().sql("sql query").
Please note that you can have zero, two or more SparkSessions in a single Spark application (but it's assumed you'll have at least, and often only, one SparkSession in a Spark SQL application).
Please also note that a Dataset is bound to the SparkSession it was created in, and that SparkSession will never change.
You may be wondering why anyone would want this, but it gives you a boundary between queries: you can use the same table names for different datasets, which is actually a very powerful feature of Spark SQL.
The following example shows the difference and hopefully gives you some idea of why it's powerful.
scala> spark.version
res0: String = 2.3.0-SNAPSHOT
scala> :type spark
org.apache.spark.sql.SparkSession
scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+
scala> val df = spark.range(5)
df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
scala> df.sqlContext.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+
scala> val anotherSession = spark.newSession
anotherSession: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@195c5803
scala> anotherSession.range(10).createOrReplaceTempView("new_table")
scala> anotherSession.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| |new_table| true|
+--------+---------+-----------+
scala> df.sqlContext.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+
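As a follow-up sketch (continuing the same spark-shell session above; the name df2 is illustrative), a Dataset created in anotherSession resolves temp views through the session it was created in, so its sqlContext can see new_table while spark still cannot:

```scala
scala> val df2 = anotherSession.range(3)

scala> df2.sqlContext.sql("show tables").show
// lists new_table, because df2 is bound to anotherSession

scala> spark.sql("show tables").show
// still empty: new_table lives only in anotherSession's catalog
```

This is the same session-scoping shown above, viewed from the Dataset side: df.sqlContext().sql(...) runs the query against the catalog of the session that created df, whereas sparkSession.sql(...) runs it against whichever session you call it on. With a single SparkSession the two are equivalent.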