Spark working faster in Standalone than in YARN


Problem Description

Wanted some insights on Spark execution on standalone and YARN. We have a 4-node Cloudera cluster, and currently the performance of our application while running in YARN mode is less than half of what we get while executing in standalone mode. Does anyone have an idea of the factors that might be contributing to this?

Recommended Answer

Basically, your data and cluster are too small.

Big Data technologies are really meant to handle data that cannot fit on a single system. Given that your cluster has 4 nodes, it might be fine for POC work, but you should not consider it acceptable for benchmarking your application.

To give you a frame of reference, the Hortonworks article BENCHMARK: SUB-SECOND ANALYTICS WITH APACHE HIVE AND DRUID uses a cluster of:

  • 10 nodes
  • 2x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz with 16 CPU threads each
  • 256 GB RAM per node
  • 6x WDC WD4000FYYZ-0 1K02 4TB SCSI disks per node

This works out to 320 CPU cores, 2560 GB RAM, and 240 TB of disk.

Another benchmark, from Cloudera's article New SQL Benchmarks: Apache Impala (incubating) Uniquely Delivers Analytic Database Performance, uses a 21-node cluster with each node at:

  • CPU: 2 sockets, 12 total cores, Intel Xeon CPU E5-2630L 0 at 2.00GHz
  • 12 disk drives at 932GB each (one for the OS, the rest for HDFS)
  • 384GB memory

This works out to 504 CPU cores, 8064 GB RAM, and 231 TB of disk.
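The totals quoted for both reference clusters are straight per-node products, and can be sanity-checked with a quick sketch (per-node figures taken from the articles above; note the raw disk product for the Cloudera cluster comes out slightly above the 231 TB the article quotes, presumably due to rounding or unit conventions):

```python
# Back-of-envelope totals for the two reference clusters.

def cluster_totals(nodes, cores_per_node, ram_gb_per_node, disks_per_node, disk_tb):
    """Return (total CPU cores, total RAM in GB, total disk in TB)."""
    return (nodes * cores_per_node,
            nodes * ram_gb_per_node,
            nodes * disks_per_node * disk_tb)

# Hortonworks cluster: 10 nodes, 2 CPUs x 16 threads, 256 GB RAM, 6 x 4 TB disks
hw = cluster_totals(10, 2 * 16, 256, 6, 4)
print(hw)  # (320, 2560, 240) -- matches the 320 cores / 2560 GB / 240 TB above

# Cloudera cluster: 21 nodes, 24 cores, 384 GB RAM, 12 x 932 GB (~0.932 TB) disks
cl = cluster_totals(21, 24, 384, 12, 0.932)
print(cl[0], cl[1])  # 504 8064 -- 504 cores and 8064 GB RAM, as quoted
```

Running the same product for a 4-node cluster makes the gap obvious: even generous commodity nodes put it one to two orders of magnitude below these reference setups.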

This should give you an idea of the scale at which a system can be considered reliable for benchmarking purposes.
