How does createOrReplaceTempView work in Spark?
Question
I am new to Spark and Spark SQL.
How does createOrReplaceTempView work in Spark?
If we register an RDD of objects as a table, will Spark keep all the data in memory?
Recommended answer
createOrReplaceTempView creates (or replaces, if that view name already exists) a lazily evaluated "view" that you can then use like a Hive table in Spark SQL. The data is not persisted in memory unless you cache the dataset that underpins the view.
scala> val s = Seq(1,2,3).toDF("num")
s: org.apache.spark.sql.DataFrame = [num: int]
scala> s.createOrReplaceTempView("nums")
scala> spark.table("nums")
res22: org.apache.spark.sql.DataFrame = [num: int]
scala> spark.table("nums").cache
res23: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [num: int]
scala> spark.table("nums").count
res24: Long = 3
The data is fully cached only after the .count call, because cache is itself lazy and takes effect on the first action.
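The cache state can also be inspected from code through the catalog API. What follows is a minimal sketch and not part of the original answer: the object name TempViewCacheSketch, the helper cacheStates, and the local[*] master are assumptions made for illustration. It uses spark.catalog.cacheTable, which is lazy in the same way as Dataset.cache, and spark.catalog.isCached.

```scala
import org.apache.spark.sql.SparkSession

object TempViewCacheSketch {
  // Hypothetical helper: returns (cached before marking, cached after marking, row count).
  def cacheStates(spark: SparkSession): (Boolean, Boolean, Long) = {
    import spark.implicits._

    // Registering the view only names a query plan; nothing is cached yet.
    Seq(1, 2, 3).toDF("num").createOrReplaceTempView("nums")
    val before = spark.catalog.isCached("nums")

    // Mark the view's plan for caching. This is lazy: no job runs here.
    spark.catalog.cacheTable("nums")
    val afterMark = spark.catalog.isCached("nums")

    // The first action computes the data and fills the cache.
    val n = spark.table("nums").count()

    // Release the cache so the sketch leaves the session clean.
    spark.catalog.uncacheTable("nums")
    (before, afterMark, n)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .appName("temp-view-cache-sketch")
      .master("local[*]")
      .getOrCreate()

    val (before, afterMark, n) = cacheStates(spark)
    println(s"before=$before afterMark=$afterMark count=$n")
    spark.stop()
  }
}
```

Note that isCached reports whether the plan has been registered with the cache manager, not whether every partition is already in memory. To see what has actually been materialized, the Storage tab of the Spark UI is the more direct check.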
Related: Spark createOrReplaceTempView vs createGlobalTempView
Relevant quote (comparing to persistent table): "Unlike the createOrReplaceTempView command, saveAsTable will materialize the contents of the DataFrame and create a pointer to the data in the Hive metastore." from https://spark.apache.org/docs/latest/sql-programming-guide.html#saving-to-persistent-tables
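The difference described in that quote can be sketched as follows. This is an illustrative example under stated assumptions, not code from the original answer: the names ViewVsTableSketch, visibilityInNewSession, nums_view, and nums_table are made up for the example, and the warehouse directory is pointed at a temporary folder so that nothing is written into the working directory.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object ViewVsTableSketch {
  // Hypothetical helper: returns (view visible, table visible) as seen from a second session.
  def visibilityInNewSession(spark: SparkSession): (Boolean, Boolean) = {
    import spark.implicits._
    val df = Seq(1, 2, 3).toDF("num")

    // Only registers a name for the query plan in this session; nothing is written.
    df.createOrReplaceTempView("nums_view")

    // Runs the plan, writes files under the warehouse directory,
    // and records the table in the catalog.
    df.write.mode("overwrite").saveAsTable("nums_table")

    // A new session shares the catalog but not this session's temp views.
    val other = spark.newSession()
    (other.catalog.tableExists("nums_view"), other.catalog.tableExists("nums_table"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("view-vs-table-sketch")
      .master("local[*]")
      // Assumption: a throwaway warehouse location for this sketch only.
      .config("spark.sql.warehouse.dir", Files.createTempDirectory("warehouse").toString)
      .getOrCreate()

    val (viewVisible, tableVisible) = visibilityInNewSession(spark)
    println(s"temp view visible: $viewVisible, saved table visible: $tableVisible")
    spark.stop()
  }
}
```

The temp view stays tied to the session that created it, while the saved table is materialized on disk and registered in the catalog, so other sessions of the same application can see it.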
Note: createOrReplaceTempView was formerly called registerTempTable, which has been deprecated since Spark 2.0.