Optimizing GC on EMR cluster


Problem Description

I am running a Spark job written in Scala on EMR, and the stdout of each executor is filled with GC (Allocation Failure) log lines:

2016-12-07T23:42:20.614+0000: [GC (Allocation Failure) 2016-12-07T23:42:20.614+0000: [ParNew: 909549K->432K(1022400K), 0.0089234 secs] 2279433K->1370373K(3294336K), 0.0090530 secs] [Times: user=0.11 sys=0.00, real=0.00 secs] 
2016-12-07T23:42:21.572+0000: [GC (Allocation Failure) 2016-12-07T23:42:21.572+0000: [ParNew: 909296K->435K(1022400K), 0.0089298 secs] 2279237K->1370376K(3294336K), 0.0091147 secs] [Times: user=0.11 sys=0.01, real=0.00 secs] 
2016-12-07T23:42:22.525+0000: [GC (Allocation Failure) 2016-12-07T23:42:22.525+0000: [ParNew: 909299K->485K(1022400K), 0.0080858 secs] 2279240K->1370427K(3294336K), 0.0082357 secs] [Times: user=0.12 sys=0.00, real=0.01 secs] 
2016-12-07T23:42:23.474+0000: [GC (Allocation Failure) 2016-12-07T23:42:23.474+0000: [ParNew: 909349K->547K(1022400K), 0.0090641 secs] 2279291K->1370489K(3294336K), 0.0091965 secs] [Times: user=0.12 sys=0.00, real=0.00 secs] 

I am reading a few TB of data (mostly strings), so I am worried that the constant GC will slow down processing.
I would appreciate any pointers on how to understand these messages and how to tune GC so that it consumes as little CPU time as possible.
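To make the fields of each line concrete, here is a minimal Scala sketch that pulls the numbers out of one ParNew line. It assumes the JDK 8 HotSpot format produced by `-XX:+PrintGCDetails -XX:+PrintGCDateStamps`: the bracketed `ParNew:` part is the young generation (occupancy before -> after, capacity in parentheses), and the part after it is the whole heap. The regex is written against the exact lines above and may not match other JVM versions or collectors. `ParNewEvent`, `ParNewLog` and the field names are hypothetical.

```scala
import scala.util.matching.Regex

// One ParNew minor collection. Sizes are in KB, pauses in seconds.
// Assumption: JDK 8 -XX:+PrintGCDetails output, as in the log above.
final case class ParNewEvent(
    youngBeforeK: Long,     // young generation occupancy before GC
    youngAfterK: Long,      // young generation occupancy after GC
    youngCapacityK: Long,   // young generation capacity
    youngPauseSecs: Double, // time spent collecting the young generation
    heapBeforeK: Long,      // whole-heap occupancy before GC
    heapAfterK: Long,       // whole-heap occupancy after GC
    heapCapacityK: Long,    // whole-heap capacity
    totalPauseSecs: Double  // total pause for this GC event
) {
  // Old generation = whole heap minus young generation.
  def oldBeforeK: Long = heapBeforeK - youngBeforeK
  def oldAfterK: Long = heapAfterK - youngAfterK
  // Approximate KB that survived the minor GC and were promoted to the old generation.
  def promotedK: Long = oldAfterK - oldBeforeK
}

object ParNewLog {
  // Hypothetical pattern for: [ParNew: A->B(C), X secs] D->E(F), Y secs]
  private val Line: Regex =
    """.*\[ParNew: (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\] (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\].*""".r

  // Returns None for lines that are not ParNew events.
  def parse(line: String): Option[ParNewEvent] = line.trim match {
    case Line(yb, ya, yc, yp, hb, ha, hc, tp) =>
      Some(ParNewEvent(yb.toLong, ya.toLong, yc.toLong, yp.toDouble,
                       hb.toLong, ha.toLong, hc.toLong, tp.toDouble))
    case _ => None
  }
}
```

Applied to the first log line, this gives a young generation that drops from 909549K to 432K in about 9 ms. The old generation grows by only 57K, so almost everything allocated since the previous collection was already garbage.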

Solution

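"Allocation Failure" is the normal and most common reason for starting a GC cycle. The young generation has no room left for a new object, so the JVM runs a minor collection (here with the ParNew collector).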

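The logs show that a minor collection happens about once a second and each pause takes roughly 9 ms, which is about 1% of wall-clock time. Each collection empties almost the entire young generation (e.g. 909549K->432K), and the old generation stays flat at about 1.3 GB between collections, so very little is being promoted. IMO, there is nothing to optimize here.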
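The "about 1%" estimate can be checked mechanically. The Scala sketch below divides the mean pause by the mean interval between GC start times. It assumes the date-stamp format shown in the log (`yyyy-MM-dd'T'HH:mm:ss.SSSZ`) and takes the last `..., N secs]` before `[Times:` as the total pause. `GcOverhead`, `Sample` and `overhead` are hypothetical names, not part of any Spark or JVM API.

```scala
import java.time.{Duration, OffsetDateTime}
import java.time.format.DateTimeFormatter

object GcOverhead {
  // Date stamp as printed by -XX:+PrintGCDateStamps, e.g. 2016-12-07T23:42:20.614+0000
  private val Stamp = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ")

  // Assumed line shape: "<stamp>: [GC ... , <total pause> secs] [Times: ..."
  private val Event = """(\S+): \[GC.*, ([\d.]+) secs\] \[Times:.*""".r

  final case class Sample(at: OffsetDateTime, pauseSecs: Double)

  def parse(line: String): Option[Sample] = line.trim match {
    case Event(stamp, pause) =>
      Some(Sample(OffsetDateTime.parse(stamp, Stamp), pause.toDouble))
    case _ => None
  }

  // Fraction of wall-clock time spent paused: mean pause / mean interval between GC starts.
  def overhead(lines: Seq[String]): Double = {
    val samples = lines.flatMap(line => parse(line).toList)
    require(samples.size >= 2, "need at least two GC events")
    val elapsedSecs = Duration.between(samples.head.at, samples.last.at).toMillis / 1000.0
    val meanInterval = elapsedSecs / (samples.size - 1)
    val meanPause = samples.map(_.pauseSecs).sum / samples.size
    meanPause / meanInterval
  }
}
```

For the four lines in the question, the mean interval is about 0.95 s and the mean pause is about 8.9 ms. That gives an overhead of roughly 0.0093, just under 1%.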
