Performance Between SQL and withColumn

Views: 237 Published: 2020/10/17 0:45:52 Tags: dataframe apache-spark pyspark

Question

Suppose I create the following dataframe:

import numpy as np
import pandas as pd
dt = pd.DataFrame(np.array([[1,5],[2,12],[4,17]]),columns=['a','b'])
df = spark.createDataFrame(dt)  # assumes an active SparkSession named `spark`

I want to create a third column, c, that is the sum of these two columns. I know of two ways to do so.

The withColumn() method in Spark:

df1 = df.withColumn('c', df.a + df.b)

Or using SQL:

df.createOrReplaceTempView('mydf')
df2 = spark.sql('select *, a + b as c from mydf')

While both yield the same results, which method is computationally faster?

Also, how does SQL compare to a Spark user-defined function (UDF)?

Recommended Answer

While both yield the same results, which method is computationally faster?

Look at the execution plans:

df1.explain()
#== Physical Plan ==
#*(1) Project [a#0L, b#1L, (a#0L + b#1L) AS c#4L]
#+- Scan ExistingRDD[a#0L,b#1L]

df2.explain()
#== Physical Plan ==
#*(1) Project [a#0L, b#1L, (a#0L + b#1L) AS c#8L]
#+- Scan ExistingRDD[a#0L,b#1L]

Since the physical plans are the same (apart from the auto-generated ID of the new column, c#4L versus c#8L), the two methods are computationally identical.

Generally speaking, there is no computational advantage to using either withColumn or spark-sql over the other. If the code is written properly, the underlying computations will be identical.

There may be some cases where it's easier to express something using spark-sql, for example if you want to use a column value as a parameter to a Spark function.

Also, how does SQL compare to a Spark user-defined function (UDF)?

Take a look at this post: Spark functions vs UDF performance?
