数据流Apache Beam Python作业被逐级卡住 [英] Dataflow Apache beam Python job stuck at Group by step

查看：100 发布时间：2020/9/3 5:30:09 google-bigquery apache-beam dataflow

本文介绍了数据流Apache Beam Python作业被逐级卡住的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在运行一个数据流作业，该作业从BigQuery中读取并在8 GB of data and result in more than 50,000,000 records.周围进行扫描.现在，我要一步一步地基于键进行分组，并且需要将一列连接起来.但是，当连接列的连接大小超过100 MB之后，为什么我必须在数据流作业中执行该分组依据，因为该分组依据无法在Bigquery level due to row size limit of 100 MB.

I am running a dataflow job, which readed from BigQuery and scans around 8 GB of data and result in more than 50,000,000 records. Now at group by step I want to group based on a key and one column need to be concatenated . But After concatenated size of concatenated column becomes more than 100 MB that why I have to do that group by in dataflow job because that group by can not be done in Bigquery level due to row size limit of 100 MB.

现在，从BigQuery读取数据流作业时，它的伸缩性很好，但被卡在Group by Step上，我有2个版本的数据流代码，但是两者都在卡入中. When I checked the stack driver logs, it says, processing stuck at lull for more than 1010 sec time(similar kind of message) and Refusing to split GroupedShuffleReader <dataflow_worker.shuffle.GroupedShuffleReader object at 0x7f618b406358> kind of message

Now the dataflow job scales well when reading from BigQuery but stuck at Group by Step , I have 2 version of dataflow code, but both are stucking at group by step. When I checked the stack driver logs, it says, processing stuck at lull for more than 1010 sec time(similar kind of message) and Refusing to split GroupedShuffleReader <dataflow_worker.shuffle.GroupedShuffleReader object at 0x7f618b406358> kind of message

我希望按状态分组可以在20分钟内完成，但是会停留超过1个小时且永远不会完成

I expect the group by state to be completed within 20 mins but is stuck for more than 1 hours and never gets finished

数据流Apache Beam Python作业被逐级卡住 [英] Dataflow Apache beam Python job stuck at Group by step

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

数据流Apache Beam Python作业被逐级卡住 [英] Dataflow Apache beam Python job stuck at Group by step

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭