Lazy Evaluation in SparkSQL

Question

In this code from the Spark Programming Guide,

# The result of loading a parquet file is also a DataFrame.
parquetFile = sqlContext.read.parquet("people.parquet")

# Parquet files can also be registered as tables and then used in SQL statements.
parquetFile.registerTempTable("parquetFile")
teenagers = sqlContext.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
teenagers.collect()

What exactly happens in the Java heap (how is the Spark memory managed) when each line is executed?

Specifically, I have these questions:

  1. Is sqlContext.read.parquet lazy? Does it cause the whole parquet file to be loaded in memory?
  2. When the collect action is executed, for the SQL query to be applied,

a. is the entire parquet first stored as an RDD and then processed or

b. is the parquet file processed first to select only the name column, then stored as an RDD and then filtered based on the age condition by Spark?

Answer

Is sqlContext.read.parquet lazy?

Yes. By default, all transformations in Spark are lazy.
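
A quick way to see this for yourself (a minimal sketch, assuming the same sqlContext and people.parquet file as in the question; df and filtered are illustrative names):

# Returns immediately; only Parquet metadata (the schema in the footer) is read.
df = sqlContext.read.parquet("people.parquet")

# Still no Spark job: transformations only build up a query plan.
filtered = df.filter(df.age >= 13).select("name")

# count() is an action, so this line is the first time a job actually runs
# and row data is read.
filtered.count()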

When the collect action is executed, for the SQL query to be applied

a. is the entire parquet first stored as an RDD and then processed or

b. is the parquet file processed first to select only the name column, then stored as an RDD and then filtered based on the age condition by Spark?

Spark generates a new RDD for each action. More to the point, Parquet is a columnar format, so only the columns the query needs (here, name and age) are read at all, and Parquet readers use push-down filters to further reduce disk I/O. Push-down filters allow early data-selection decisions to be made before the data is even read into Spark. So only part of the file is loaded into memory; the behavior is scenario (b), not (a).
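
You can verify this by printing the query plan before calling collect. The following is a minimal sketch, assuming the same sqlContext and people.parquet file from the question; the exact plan text varies across Spark versions, but it should show a scan of only the name and age columns, with the age predicates pushed down to the Parquet reader.

# Build the same query as in the question; nothing is computed yet.
parquetFile = sqlContext.read.parquet("people.parquet")
parquetFile.registerTempTable("parquetFile")
teenagers = sqlContext.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")

# Print the logical and physical plans. Look for the pruned column list
# and the pushed-down age filters on the Parquet scan.
teenagers.explain(True)

# Only now are the pruned, filtered rows materialized on the driver.
teenagers.collect()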
