Performance bottleneck of Spark

Published: 2020/9/20 19:52:33 Tags: performance apache-spark bigdata distributed-computing


Question

The paper "Making Sense of Performance in Data Analytics Frameworks", published at NSDI 2015, concludes that CPU (not disk I/O or the network) is the performance bottleneck of Spark. In the paper, Kay ran several experiments on Spark, including BDbench, TPC-DS, and a production workload (was only Spark SQL used?). I wonder whether this conclusion also holds for frameworks built on top of Spark, such as Spark Streaming, where a continuous data stream is received over the network and both network I/O and disk come under heavy pressure.
