How to know which count query is the fastest?


Question

I've been exploring query optimizations in the recent releases of Spark SQL 2.3.0-SNAPSHOT and noticed different physical plans for semantically-identical queries.

Let's assume I've got to count the number of rows in the following dataset:

val q = spark.range(1)

I can count the number of rows as follows:

  1. q.count
  2. q.collect.size
  3. q.rdd.count
  4. q.queryExecution.toRdd.count

My initial thought was that it's almost a constant operation (surely due to a local dataset) that would somehow have been optimized by Spark SQL and would give a result immediately, esp. the 1st one where Spark SQL is in full control of the query execution.

Having had a look at the physical plans of the queries led me to believe that the most effective query would be the last:

q.queryExecution.toRdd.count

The reasons are:

  1. It avoids deserializing rows from their InternalRow binary format
  2. The query is codegened
  3. There's only one job with a single stage

The physical plan is as simple as it gets.
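
The plans behind each variant can be inspected directly in a spark-shell session. A sketch (it assumes a running SparkSession bound to `spark`, as in the shell; output is elided and the exact operators vary across Spark versions):

```scala
// spark-shell sketch; `spark` is the session the shell provides.
val q = spark.range(1)

// The aggregation plan Spark runs for Dataset.count (equivalent query):
q.groupBy().count().explain()

// The plan of the Dataset itself; q.rdd adds a deserialization step on top of it:
q.explain()

// The low-level RDD[InternalRow] that backs the Dataset, with no deserialization:
val internal = q.queryExecution.toRdd
```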

Is my reasoning correct? If so, would the answer be different if I read the dataset from an external data source (e.g. files, JDBC, Kafka)?

The main question is what are the factors to take into consideration to say whether a query is more efficient than others (per this example)?

The other execution plans for completeness.

Answer

I did some tests on val q = spark.range(100000000):

  1. q.count: ~50 ms
  2. q.collect.size: I stopped the query after a minute or so...
  3. q.rdd.count: ~1100 ms
  4. q.queryExecution.toRdd.count: ~600 ms
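
Timings like these can be reproduced with a small helper (recent Spark versions also ship spark.time for the same purpose). A minimal pure-Scala sketch; the numbers will of course vary with hardware and JIT warm-up:

```scala
// Minimal timing helper: runs a by-name block once and reports elapsed milliseconds.
object Timing {
  def time[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body // forces evaluation of the by-name argument
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }
}
```

For example, Timing.time(q.count) returns the count together with the elapsed milliseconds; run each query a few times first so the JIT has warmed up before trusting the numbers.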

Some explanations:

Option 1 is by far the fastest because it uses both partial aggregation and whole stage code generation. The whole stage code generation allows the JVM to get really clever and do some drastic optimizations (see: https://databricks.com/blog/2017/02/16/processing-trillion-rows-per-second-single-machine-can-nested-loop-joins-fast.html).
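
Partial aggregation means each partition first reduces its rows to a single local count, and only those tiny per-partition counts are combined at the end. A pure-Scala sketch of the idea (illustrative only, not Spark's actual implementation):

```scala
// Each "partition" counts locally; only the small partial results are merged.
object PartialCount {
  def count[T](partitions: Seq[Seq[T]]): Long =
    partitions
      .map(_.size.toLong) // map side: one local count per partition
      .sum                // reduce side: combine the tiny partial counts
}
```

No row ever needs to leave its partition; only one Long per partition does, which is why the full aggregation is so cheap.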

Option 2 is just slow: it materializes everything on the driver, which is generally a bad idea.

Option 3 is like option 4, but it first converts each internal row to a regular row, which is quite expensive.
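
The cost difference can be illustrated outside Spark: counting fixed-width binary records needs no decoding, while producing regular objects pays for deserializing every record. A hypothetical analogy (the names and the one-Long-per-record layout are illustrative, not Spark's actual UnsafeRow format):

```scala
import java.nio.ByteBuffer

object RecordCount {
  val RecordSize = 8 // one Long per record, like a single-column range

  // Pack the values into a flat binary buffer, one 8-byte record each.
  def encode(values: Seq[Long]): Array[Byte] = {
    val buf = ByteBuffer.allocate(values.length * RecordSize)
    values.foreach(v => buf.putLong(v))
    buf.array()
  }

  // Roughly what toRdd.count can do: count records without decoding them.
  def countRaw(bytes: Array[Byte]): Long =
    bytes.length / RecordSize

  // Roughly what q.rdd.count pays for: decode every record, then count.
  def countDecoded(bytes: Array[Byte]): Long = {
    val buf = ByteBuffer.wrap(bytes)
    var n = 0L
    while (buf.hasRemaining) { buf.getLong(); n += 1 }
    n
  }
}
```

Both return the same answer, but the decoded version touches every byte and allocates work per record, mirroring the ~1100 ms vs ~600 ms gap above.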

Option 4 is about as fast as you will get without whole-stage code generation.

