Apache Spark's performance tuning


Problem description

I am working on a project where I have to tune Spark's performance. I have found the four most important parameters that will help in tuning Spark's performance. They are as follows:

  1. spark.memory.fraction
  2. spark.memory.offHeap.size
  3. spark.storage.memoryFraction
  4. spark.shuffle.memoryFraction
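For reference, all four can be set at submit time. This is only a sketch; the values, master URL, and `my_job.py` are placeholders, not recommendations, and `spark.memory.offHeap.size` is ignored unless `spark.memory.offHeap.enabled` is also set:

```shell
# Illustrative submit command; adjust values for your own job.
spark-submit \
  --conf spark.memory.fraction=0.6 \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=2g \
  --conf spark.storage.memoryFraction=0.5 \
  --conf spark.shuffle.memoryFraction=0.2 \
  my_job.py
```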

I wanted to know whether I am going in the right direction or not. Please also let me know if I have missed any other parameters.

Thanks.

Recommended answer

Honestly, this is quite broad to answer. The right path to optimizing performance is mainly described in the official documentation, in the section on Tuning Spark.

Generally speaking, there are many factors to consider when optimizing Spark jobs:

  • Data serialization
  • Memory tuning
  • Level of parallelism
  • Memory usage of reduce tasks
  • Broadcasting large variables
  • Data locality
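Several of the factors above (serialization, parallelism, locality) are controlled through configuration properties, typically placed in `conf/spark-defaults.conf`. A sketch with illustrative values only; tune them for your workload:

```
# conf/spark-defaults.conf (illustrative values)
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max  64m
spark.default.parallelism        200
spark.locality.wait              3s
```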

It mainly centers on data serialization, memory tuning, and the trade-off between precision and approximation techniques to get the job done fast.

Courtesy of @zero323:

I'd point out that all but one of the options mentioned in the question are deprecated and used only in legacy mode.
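Concretely: since the unified memory manager arrived in Spark 1.6, `spark.storage.memoryFraction` and `spark.shuffle.memoryFraction` are legacy options that are only read when legacy mode is explicitly enabled, while `spark.memory.fraction` governs the unified execution/storage region. A config sketch (the values shown are the documented defaults):

```
# Legacy options: only consulted when legacy mode is enabled
spark.memory.useLegacyMode     true   # removed in Spark 3.0
spark.storage.memoryFraction   0.6
spark.shuffle.memoryFraction   0.2

# Unified memory management (default since Spark 1.6)
spark.memory.fraction          0.6    # share of (heap - 300 MB) for execution + storage
spark.memory.storageFraction   0.5    # portion of the above protected from eviction
```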

