Avoiding data loss when slow consumers force backpressure in stream processing (spark, aws)

Views: 112 Published: 2020/8/23 3:12:20 Tags: amazon-web-services spark-streaming amazon-sqs amazon-kinesis backpressure

Problem description

I'm new to distributed stream processing (Spark). I've read some tutorials/examples which cover how backpressure results in the producer(s) slowing down in response to overloaded consumers. The classic example given is ingesting and analyzing tweets. When there is an unexpected spike in traffic such that the consumers are unable to handle the load, they apply backpressure and the producer responds by adjusting its rate lower.

What I don't really see covered is this: which approaches are used in practice to deal with the massive amount of incoming real-time data that cannot be processed immediately because the pipeline as a whole has lower capacity?

I imagine the answer to this is business domain dependent. For some problems it might be fine to just drop that data, but in this question I would like to focus on a case where we don't want to lose any data.

Since I will be working in an AWS environment, my first thought would be to "buffer" the excess data in an SQS queue or a Kinesis stream. Is it as simple as this in practice, or is there a more standard streaming solution to this problem (perhaps as part of Spark itself)?
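To make the "buffer" idea concrete, here is a minimal, self-contained sketch of what I have in mind. It is a toy simulation only, not Spark, SQS, or Kinesis code: an in-memory `deque` stands in for the durable queue or stream, and the `simulate` function, the traffic numbers, and the `consumer_rate` value are all hypothetical. It ignores things a real service imposes, such as retention limits and delivery semantics.

```python
from collections import deque


def simulate(arrivals, consumer_rate):
    """Simulate a pull-based consumer reading from a buffer.

    arrivals: number of records produced in each time step.
    consumer_rate: maximum records the consumer can process per time step.
    Returns (processed_records, backlog_after_each_step).
    """
    if consumer_rate <= 0:
        raise ValueError("consumer_rate must be positive")

    buffer = deque()  # hypothetical stand-in for an SQS queue or Kinesis stream
    processed = []
    backlog = []
    next_id = 0
    step = 0

    # Keep stepping until all arrivals are in and the buffer has drained.
    while step < len(arrivals) or buffer:
        # Producer side: appends at its own pace, never slowed by the consumer.
        if step < len(arrivals):
            for _ in range(arrivals[step]):
                buffer.append(next_id)
                next_id += 1

        # Consumer side: pulls at most consumer_rate records per step.
        for _ in range(min(consumer_rate, len(buffer))):
            processed.append(buffer.popleft())

        backlog.append(len(buffer))
        step += 1

    return processed, backlog


arrivals = [2, 2, 10, 10, 2, 0, 0]  # hypothetical traffic with a spike
processed, backlog = simulate(arrivals, consumer_rate=4)

print(len(processed) == sum(arrivals))  # True: nothing was dropped
print(backlog)  # [0, 0, 6, 12, 10, 6, 2, 0]
```

In this toy model the spike shows up as a growing backlog rather than as lost records, and the backlog drains once traffic falls below the consumer's rate. That only works if the buffer can hold the peak backlog and average traffic stays below the consumer's capacity, which is exactly the part I am unsure about in practice.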
