Apache Spark performance tuning
Question
I am working on a project in which I have to tune Spark's performance. I have found the four parameters that seem most important for tuning Spark's performance. They are as follows:
- spark.memory.fraction
- spark.memory.offHeap.size
- spark.storage.memoryFraction
- spark.shuffle.memoryFraction
I want to know whether I am going in the right direction. Please also let me know if I have missed any other parameters.
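For context, settings like these are usually supplied through `spark-defaults.conf` or `--conf` flags on `spark-submit`. A minimal sketch of the four parameters from the question; the values are illustrative assumptions, not tuning recommendations:

```properties
# spark-defaults.conf -- illustrative values only
spark.memory.fraction          0.6
# Off-heap storage must be explicitly enabled for the size to take effect:
spark.memory.offHeap.enabled   true
spark.memory.offHeap.size      2g
spark.storage.memoryFraction   0.5
spark.shuffle.memoryFraction   0.2
```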
Thanks.
Answer
Honestly, this is quite broad to answer. The right path to optimizing performance is described mainly in the official documentation, in the section on Tuning Spark.
Generally speaking, there are many factors involved in optimizing Spark jobs:
- Data serialization
- Memory tuning
- Level of parallelism
- Memory usage of reduce tasks
- Broadcasting large variables
- Data locality
It is mainly centered on data serialization, memory tuning, and the trade-off between precision and approximation techniques to get the job done fast.
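As a concrete example of the first two factors, a common first step is to switch from Java serialization to Kryo and to raise the default parallelism. A hedged sketch; the buffer size and parallelism values below are assumptions that depend on the workload and cluster:

```properties
# Illustrative spark-defaults.conf entries for serialization and parallelism:
spark.serializer                  org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max   128m
spark.default.parallelism         200
```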
Courtesy of @zero323:
I'd point out that all but one of the options mentioned in the question are deprecated and are used only in legacy mode.
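To make the deprecation concrete: since Spark 1.6 the unified memory manager governs memory with `spark.memory.fraction` (and `spark.memory.storageFraction`), while the two `*.memoryFraction` settings from the question only apply when legacy mode is switched on. A sketch contrasting the two, with the default values as stated in the Spark configuration documentation:

```properties
# Current model (unified memory manager, Spark 1.6+):
spark.memory.fraction          0.6   # execution + storage share of usable heap
spark.memory.storageFraction   0.5   # storage portion protected from eviction

# Legacy settings -- ignored unless legacy mode is enabled:
spark.memory.useLegacyMode     true
spark.storage.memoryFraction   0.6
spark.shuffle.memoryFraction   0.2
```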