Performance of UDAF versus Aggregator in Spark

Views: 34 | Published: 2021/11/14 22:44:01 | Tags: performance, apache-spark, spark-dataframe, aggregate-functions, apache-spark-2.0

Question

I am trying to write some performance-mindful code in Spark and am wondering whether I should write an Aggregator or a User-Defined Aggregate Function (UDAF) for my rollup operations on a DataFrame.

I have not been able to find any data on how fast each of these approaches is, or on which one should be used with Spark 2.0+.

Recommended answer

You should write an Aggregator rather than a UserDefinedAggregateFunction, because UserDefinedAggregateFunction performs inefficient serialization and deserialization for every row. Rewriting a UserDefinedAggregateFunction as an Aggregator can improve performance by anywhere from 25%-30% up to 100x, as stated in the pull request that replaced UserDefinedAggregateFunction with Aggregator.

Because of these performance issues, the UserDefinedAggregateFunction class was deprecated in Spark 3.0.
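To make the recommendation concrete, here is a minimal sketch of an Aggregator that computes a mean. It is not taken from the answer or from the pull request it mentions: the names AvgBuffer, MeanAgg, and MeanAggExample, the sample data, and the column names are all hypothetical. It also assumes Spark 3.0 or later, because it registers the Aggregator for DataFrame use through functions.udaf.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.udaf

// Hypothetical buffer type holding the running sum and row count.
case class AvgBuffer(sum: Double, count: Long)

// Typed Aggregator: input Double, intermediate AvgBuffer, output Double.
object MeanAgg extends Aggregator[Double, AvgBuffer, Double] {
  // Empty buffer used as the starting point for every group.
  def zero: AvgBuffer = AvgBuffer(0.0, 0L)

  // Fold one input value into the buffer.
  def reduce(b: AvgBuffer, x: Double): AvgBuffer =
    AvgBuffer(b.sum + x, b.count + 1)

  // Combine two partial buffers (for example, from different partitions).
  def merge(a: AvgBuffer, b: AvgBuffer): AvgBuffer =
    AvgBuffer(a.sum + b.sum, a.count + b.count)

  // Produce the final result; an empty group yields NaN in this sketch.
  def finish(b: AvgBuffer): Double =
    if (b.count == 0) Double.NaN else b.sum / b.count

  def bufferEncoder: Encoder[AvgBuffer] = Encoders.product[AvgBuffer]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object MeanAggExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("mean-agg-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq(("a", 1.0), ("a", 3.0), ("b", 10.0)).toDF("key", "value")

    // Wrap the Aggregator so it can be used in untyped DataFrame aggregations
    // (functions.udaf is available from Spark 3.0).
    val mean = udaf(MeanAgg)
    df.groupBy("key").agg(mean($"value").as("mean_value")).show()

    spark.stop()
  }
}
```

On Spark 2.x, where functions.udaf is not available, an Aggregator is instead applied to a typed Dataset through its toColumn method, and its input type has to match the Dataset's element type rather than a single column.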
