How to use Apache Streaming with DynamoDB Stream


Question

We have a requirement wherein we log an event in a DynamoDB table whenever an ad is served to the end user. There are more than 250 writes per second into this table.

We would like to aggregate this data and move it to Redshift for analytics.

I assume the DynamoDB stream will receive a record for every insert made into the table. How can I feed the DynamoDB stream into some kind of batches and then process those batches? Are there any best practices around this kind of use case?
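As a rough illustration of the batching idea, the sketch below groups stream records into fixed-size batches and counts events per ad within each batch. The record shape mimics DynamoDB Streams' `NewImage` format, but the `ad_id` attribute and the sample data are hypothetical:

```python
from collections import Counter
from itertools import islice

def batch(records, size):
    """Group an iterable of stream records into fixed-size batches."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def aggregate(batch_records):
    """Count ad-serve events per ad_id within one batch.

    NewImage holds the post-insert item in a DynamoDB stream record;
    the 'ad_id' attribute name is an assumption for this example.
    """
    counts = Counter()
    for rec in batch_records:
        counts[rec["dynamodb"]["NewImage"]["ad_id"]["S"]] += 1
    return counts

# Hypothetical records mimicking the DynamoDB Streams record shape
events = [{"dynamodb": {"NewImage": {"ad_id": {"S": f"ad-{i % 2}"}}}}
          for i in range(5)]
batches = [aggregate(b) for b in batch(events, 3)]
```

In practice the batch boundary would more likely be a time window or a Kinesis `get_records` page rather than a fixed count, but the aggregation step looks the same.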

I have been reading about Apache Spark, and it seems that this kind of aggregation is possible with it. However, Apache Spark Streaming does not read DynamoDB streams directly.

Any help or pointers would be appreciated.

Thanks

Answer

DynamoDB Streams has two interfaces: a low-level API and the Kinesis Adapter. Apache Spark has a Kinesis integration, so you can use the two together. If you are wondering which DynamoDB Streams interface to use, AWS recommends the Kinesis Adapter.
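For orientation, here is a minimal sketch of what polling a stream through the low-level API looks like with boto3 (the Kinesis Adapter remains the recommended route for production). It assumes boto3 is installed, AWS credentials are configured, and, for brevity, that the stream has a single shard:

```python
def poll_stream_records(stream_arn, region="us-east-1"):
    """Read one page of records from the first shard of a DynamoDB stream
    via the low-level dynamodbstreams API.

    The stream ARN, region, and single-shard assumption are illustrative.
    """
    import boto3  # imported here so the sketch can be defined without AWS access

    client = boto3.client("dynamodbstreams", region_name=region)

    # A real consumer must track all shards and follow parent/child lineage;
    # we take only the first shard for brevity.
    shard = client.describe_stream(StreamArn=stream_arn)["StreamDescription"]["Shards"][0]

    iterator = client.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard["ShardId"],
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest available record
    )["ShardIterator"]

    return client.get_records(ShardIterator=iterator, Limit=100)["Records"]
```

The Kinesis Adapter (via the Kinesis Client Library) handles the shard tracking, checkpointing, and lease management that this sketch omits, which is largely why AWS recommends it.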

Here is how to use the Kinesis Adapter for DynamoDB.

A few other things to consider:

  • Instead of using Apache Spark, it is worth looking at Apache Flink. It is a stream-first solution (Spark implements streaming using micro-batching), with lower latency, higher throughput, and more powerful streaming operators, and it supports cyclic (iterative) processing. It also has a Kinesis adapter.

It may also be the case that you do not need DynamoDB streams at all to get the data into Redshift. You can load it using Redshift's own commands, such as COPY.
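Redshift's COPY command can read directly from a DynamoDB table. The helper below just assembles such a statement; the table names and IAM role ARN are placeholders:

```python
def build_copy_statement(redshift_table, dynamodb_table, iam_role, read_ratio=25):
    """Build a Redshift COPY statement that loads directly from DynamoDB.

    READRATIO caps the share of the DynamoDB table's provisioned read
    capacity the COPY may consume. All names/ARNs here are illustrative.
    """
    return (
        f"COPY {redshift_table} "
        f"FROM 'dynamodb://{dynamodb_table}' "
        f"IAM_ROLE '{iam_role}' "
        f"READRATIO {read_ratio};"
    )

# Hypothetical table names and role ARN
stmt = build_copy_statement(
    "ad_events", "AdEventLog", "arn:aws:iam::123456789012:role/RedshiftCopyRole"
)
```

Note that COPY is a bulk load, not a continuous feed: it is a good fit for periodic exports, whereas the streams-based approaches above suit near-real-time aggregation.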

