Spark Streaming: Application health


Problem Description

I have a Kafka-based Spark Streaming application that runs every 5 minutes. Looking at the statistics after 5 days of running, there are a few observations:

  1. The processing time gradually increases from 30 seconds to 50 seconds. The snapshot below highlights the processing-time chart:

  2. A good number of garbage-collection logs appear, as shown below:

Questions:

  1. Is there a good explanation for why the processing time has increased substantially, even though the number of events is more or less the same (during the last trough)?
  2. I am getting almost 70 GC logs at the end of each processing cycle. Is this normal?
  3. Is there a better strategy to ensure the processing time remains within acceptable delays?

Solution

It really depends on the application. The way I'd approach debugging this issue is the following:

  1. Under the Storage tab, check whether the stored sizes are growing. Growth there can indicate some kind of cached-resource leak. Check the value of spark.cleaner.ttl, but it is better to make sure you uncache all resources once they are no longer needed.
  2. Inspect the DAG visualization of running jobs and check whether the lineage is growing. If it is, make sure to perform checkpointing to cut the lineage.
  3. Reduce the number of retained batches in the UI (the spark.streaming.ui.retainedBatches parameter).
  4. Even if the number of events is the same, check whether the amount of data processed by tasks grows with time (Stages tab -> Input column). This could point to an application-level issue.
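The steps above can be sketched in PySpark. This is a minimal, hypothetical setup, not the asker's actual application: the app name, checkpoint directory, and `process_batch` helper are illustrative, and it assumes a Spark cluster and a Kafka DStream wired up separately.

```python
from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.streaming import StreamingContext

conf = (
    SparkConf()
    .setAppName("kafka-streaming-health")  # illustrative name
    # Step 3: retain fewer completed batches in the Streaming UI
    # (the default is 1000).
    .set("spark.streaming.ui.retainedBatches", "100")
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()
ssc = StreamingContext(spark.sparkContext, batchDuration=300)  # 5-minute batches

# Step 2: checkpointing periodically truncates the lineage of stateful
# DStream operations; the directory below is illustrative.
ssc.checkpoint("hdfs:///tmp/app-checkpoints")

def process_batch(rdd):
    """Hypothetical per-batch handler, passed to stream.foreachRDD(...)."""
    if rdd.isEmpty():
        return
    df = spark.createDataFrame(rdd)
    df.cache()
    # ... per-batch processing ...
    # Step 1: release cached data as soon as the batch is done, rather
    # than relying on spark.cleaner.ttl (only present in older Spark versions).
    df.unpersist()
```

The Kafka DStream itself would be created with the spark-streaming-kafka integration and attached via `stream.foreachRDD(process_batch)`; that wiring is omitted here.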

I've had relatively complex Spark Streaming applications (Spark v1.6, v2.1.1, v2.2.0) running for days without any degradation in performance, so this must be a solvable issue.
