Dataflow performance issues

Views: 108 | Published: 2020/11/18 1:50:26 | Tag: google-cloud-dataflow


Problem description

I'm aware that an update was made to the Cloud Dataflow (CDF) service a few weeks ago (the default worker type and attached PD were changed), and that it was made clear this would make batch jobs slower. However, the performance of our jobs has degraded to the point where they no longer meet our business needs.

For example, take one of our jobs in particular: it reads ~2.7 million rows from a table in BigQuery, has 6 side inputs (BQ tables), does some simple String transformations, and finally writes multiple outputs (3) to BigQuery. This used to take 5-6 minutes; now it takes anywhere between 15 and 20 minutes, no matter how many VMs we throw at it.

Is there anything we can do to get the speeds back up to what we used to see?

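Here are some stats:

Reads from a BQ table with 2,744,897 rows (294 MB)
6 BQ side inputs
3 multi-outputs to BQ, 2 of which are 2,744,897 rows and the other 1,500 rows
Running in the asia-east1-b zone
The times below include worker pool spin-up and tear-down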

10 VMs (n1-standard-2) 16 min 5 sec 2015-04-22_19_42_20-4740106543213058308

10 VMs (n1-standard-4) 17 min 11 sec 2015-04-22_20_04_58-948224342106865432

10 VMs (n1-standard-1) 18 min 44 sec 2015-04-22_19_42_20-4740106543213058308

20 VMs (n1-standard-2) 22 min 53 sec 2015-04-22_21_26_53-18171886778433479315

50 VMs (n1-standard-2) 17 min 26 sec 2015-04-22_21_51_37-16026777746175810525

100 VMs (n1-standard-2) 19 min 33 sec 2015-04-22_22_32_13-9727928405932256127
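To make the lack of scaling easier to see, the timings above can be converted to wall-clock seconds and a rough end-to-end rows-per-second figure. This is only a sketch: the names `RUNS`, `INPUT_ROWS`, and `summarize` are made up for illustration, the numbers are copied from the list above, and because the times include worker pool spin-up and tear-down, the rate is an overall figure rather than pure processing throughput.

```python
# Timings copied from the list above: (VM count, machine type, minutes, seconds).
RUNS = [
    (10, "n1-standard-2", 16, 5),
    (10, "n1-standard-4", 17, 11),
    (10, "n1-standard-1", 18, 44),
    (20, "n1-standard-2", 22, 53),
    (50, "n1-standard-2", 17, 26),
    (100, "n1-standard-2", 19, 33),
]
INPUT_ROWS = 2_744_897  # rows read from the main BigQuery table


def summarize(runs, rows=INPUT_ROWS):
    """Return (vms, machine_type, wall_seconds, rows_per_second) for each run."""
    summary = []
    for vms, machine_type, minutes, seconds in runs:
        wall = minutes * 60 + seconds  # total wall-clock time in seconds
        summary.append((vms, machine_type, wall, rows / wall))
    return summary


if __name__ == "__main__":
    for vms, machine_type, wall, rate in summarize(RUNS):
        print(f"{vms:>3} x {machine_type}: {wall:>4} s, {rate:,.0f} rows/s")
```

By this measure the fastest of the listed runs is the smallest n1-standard-2 pool (10 VMs), and adding workers up to 100 does not bring the wall-clock time down.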
