What is the streaming log data latency between AWS & Google cloud services?

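Question

Has anyone had experience with either of the following, and can shed light on any latency issues?

  1. Sending streaming/micro-batched log data from Amazon to BigQuery for processing
  2. Sending (micro-batched) logs from Google DataFlow to Amazon (Kinesis/S3/DynamoDB)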

Can someone provide info on latency?

Thanks

Recommended answer

In question 1, I believe you're interested in BigQuery ingestion latency. Per the "Streaming Data into BigQuery" documentation, streamed data is available for real-time analysis within a few seconds of the first streaming insertion into a table. This latency is low, but it will probably still dominate whatever latency you get from raw network communication between an Amazon cluster and the BigQuery API.
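If you want numbers for your own setup rather than a general estimate, one option is to time the cross-cloud request yourself. The following is only a minimal sketch: `measure_latency`, `summarize`, and `fake_send` are hypothetical names, and `fake_send` is a stand-in that sleeps instead of calling any real API. Note that it measures the request round trip only, not the additional time until streamed rows become available for queries.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(send: Callable[[], None], samples: int = 20) -> List[float]:
    """Call `send` repeatedly and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        send()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Return the median and the maximum of the measured durations."""
    return {"median_s": statistics.median(durations), "max_s": max(durations)}


def fake_send() -> None:
    # Hypothetical placeholder for a real cross-cloud request
    # (for example, one streaming insert issued from an EC2 host).
    time.sleep(0.01)


if __name__ == "__main__":
    print(summarize(measure_latency(fake_send, samples=5)))
```

To use it against a real service, replace `fake_send` with a function that issues one request using whichever client library you already have, and run it from the same Amazon hosts that will send the logs.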

In question 2, you're probably interested in the latency of Dataflow itself: assuming data arrives in a Dataflow streaming pipeline in real time (e.g. via PubSub), and you process it and ultimately write to Amazon, you want to know how quickly the results come back.

This depends highly on the windowing structure of your pipeline (e.g., if you window data into 5-minute windows, data will be buffered accordingly). If you don't do any windowing at all, latency introduced by Dataflow itself should be low (sub-second). For details of how that is achieved, you can consult the MillWheel paper on which Dataflow's streaming engine is based.
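To make the windowing point concrete, here is a rough sketch of how long an element waits before its fixed window closes. It assumes windows aligned to timestamp 0, a single result emitted when the window ends (no early firings), and no additional watermark lag; the function names are hypothetical and this is plain arithmetic, not Dataflow's API.

```python
def window_end(event_ts: float, window_size_s: float) -> float:
    """End of the fixed window [start, start + size) that contains event_ts.

    Assumes windows are aligned to timestamp 0.
    """
    start = (event_ts // window_size_s) * window_size_s
    return start + window_size_s


def buffering_delay(event_ts: float, window_size_s: float) -> float:
    """Seconds an element waits until its window closes and a result can be emitted."""
    return window_end(event_ts, window_size_s) - event_ts


if __name__ == "__main__":
    five_minutes = 300
    for ts in (10, 150, 299):
        print(ts, buffering_delay(ts, five_minutes))
```

With 5-minute windows, an element arriving 10 seconds into a window waits 290 seconds, while one arriving at second 299 waits 1 second, so elements arriving at a steady rate wait about half the window size on average.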
