Is checkpointing necessary in Spark Streaming?


Question

I have noticed that the Spark Streaming examples also include code for checkpointing. My question is: how important is checkpointing? If it is there for fault tolerance, how often do faults happen in such streaming applications?

Recommended answer

It all depends on your use case. Suppose you are running a streaming job that just reads data from Kafka and counts the number of records. What would you do if your application crashed after a year or so?

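  • If you have no backup/checkpoint, you have to recompute the entire past year's worth of data before you can continue counting.
  • If you have a backup/checkpoint, you simply read the checkpoint data and resume right away.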
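The difference between the two cases can be sketched with a small simulation. This is a minimal, plain-Python illustration and not Spark's API: the in-memory list stands in for a Kafka topic, and `load_checkpoint`, `save_checkpoint`, `run_counter`, and the JSON file layout are hypothetical names chosen for this example. In Spark Streaming itself, the corresponding step is to set a checkpoint directory with `StreamingContext.checkpoint` and recreate the context from it on restart with `StreamingContext.getOrCreate`.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return (count, next_offset) from the checkpoint file, or (0, 0) if none exists."""
    if not os.path.exists(path):
        return 0, 0
    with open(path) as f:
        state = json.load(f)
    return state["count"], state["next_offset"]


def save_checkpoint(path, count, next_offset):
    """Write the state to a temp file, then rename it, so a crash never leaves a partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"count": count, "next_offset": next_offset}, f)
    os.replace(tmp_path, path)


def run_counter(records, path, batch_size=100, crash_after_batches=None):
    """Count records in micro-batches, checkpointing after every batch.

    Returns (running_count, records_processed_in_this_run).
    """
    count, offset = load_checkpoint(path)  # resume from saved state if present
    processed = 0
    batches = 0
    while offset < len(records):
        if crash_after_batches is not None and batches == crash_after_batches:
            raise RuntimeError("simulated crash")
        batch = records[offset:offset + batch_size]
        count += len(batch)
        offset += len(batch)
        processed += len(batch)
        batches += 1
        save_checkpoint(path, count, offset)
    return count, processed


records = list(range(1000))  # hypothetical stand-in for a Kafka topic
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run_counter(records, checkpoint_path, crash_after_batches=7)
except RuntimeError:
    pass  # the first run dies after 7 batches (700 records)

# The restarted run reads the checkpoint and only processes what is left.
total, reprocessed = run_counter(records, checkpoint_path)
print(total, reprocessed)
```

After the restart, the counter picks up the saved count and offset, so only the records that arrived after the last checkpoint are processed again. Without the checkpoint file, the second run would start from offset 0 and recount everything.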

On the other hand, if your streaming application only does Reads-Messages-From-Kafka >>> Transform >>> Insert-to-a-Database, you need not worry about it crashing. Even if it crashes, you can simply restart the application and resume without losing data, since there is no accumulated state to rebuild.

Note: Checkpointing is the process of storing the current state of a Spark application.

As for how often faults occur, you can almost never predict an outage. In a company setting, for example:

  • There may be a power outage
  • Routine maintenance or upgrades of the cluster

Hope this helps.
