Dataflow: No Worker Activity

Problem description

I'm having a few problems running a relatively vanilla Dataflow job from an AI Platform Notebook (the job is meant to take data from BigQuery > cleanse and prep > write to a CSV in GCS):

import apache_beam as beam

options = {'staging_location': '/staging/location/',
           'temp_location': '/temp/location/',
           'job_name': 'dataflow_pipeline_job',
           'project': PROJECT,
           'teardown_policy': 'TEARDOWN_ALWAYS',
           'max_num_workers': 3,
           'region': REGION,
           'subnetwork': 'regions/<REGION>/subnetworks/<SUBNETWORK>',
           'no_save_main_session': True}
opts = beam.pipeline.PipelineOptions(flags=[], **options)  
p = beam.Pipeline('DataflowRunner', options=opts)
(p 
 | 'read' >> beam.io.Read(beam.io.BigQuerySource(query=selquery, use_standard_sql=True))
 | 'csv' >> beam.FlatMap(to_csv)
 | 'out' >> beam.io.Write(beam.io.WriteToText('OUTPUT_DIR/out.csv')))
p.run()

Error returned from Stackdriver:

Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been seen in the last 1 hour. You can get help with Cloud Dataflow at https://cloud.google.com/dataflow/support.

With the following warning:

S01:eval_out/WriteToText/Write/WriteImpl/DoOnce/Read+out/WriteToText/Write/WriteImpl/InitializeWrite failed.

Unfortunately not much else other than that. Other things to note:

  • The job ran locally without any error
  • The network is running in custom mode but is the default network
  • Python Version == 3.5.6
  • Python Apache Beam version == 2.16.0
  • The AI Platform Notebook is in fact a GCE instance with a Deep Learning VM image deployed on top (with a container-optimised OS); we then use port forwarding to access the Jupyter environment
  • The service account requesting the job (Compute Engine default service account) has the necessary permissions required to complete this
  • The notebook instance, Dataflow job, and GCS bucket are all in europe-west1
  • I've also tried running this on a standard AI Platform Notebook and still the same problem.

Any help would be much appreciated! Please let me know if there is any other info I can provide which will help.

I've realised that my error is the same as the following:

Why do Dataflow steps not start?

The reason my job has gotten stuck is that the write-to-GCS step runs first, even though it is meant to run last. Any ideas on how to fix this?

Recommended answer

Upon code inspection, I noticed that the syntax of the WriteToText transform used does not match the one suggested in the Apache Beam docs.

Please follow the guidance here.
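For reference, a minimal sketch (not the asker's original code) of the write step expressed the way the Beam docs suggest: WriteToText is itself a PTransform, so it is applied directly rather than wrapped in beam.io.Write, and it takes a path prefix plus an optional file-name suffix. The gs://<BUCKET> path is a placeholder; p, selquery and to_csv come from the question.

import apache_beam as beam

# Same pipeline as in the question, with the write step expressed as in the
# Beam documentation. The output path is a placeholder.
(p
 | 'read' >> beam.io.Read(beam.io.BigQuerySource(query=selquery, use_standard_sql=True))
 | 'csv' >> beam.FlatMap(to_csv)
 | 'out' >> beam.io.WriteToText('gs://<BUCKET>/output/out',
                                file_name_suffix='.csv',
                                num_shards=1))  # single CSV shard; omit to let Beam decide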

The suggested workaround is to consider using the BigQuery-to-CSV file export option available in batch mode.

There are even more export options available. The full list can be found in the "data formats and compression types" documentation here.
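To illustrate that workaround, here is a minimal sketch using the google-cloud-bigquery client library to export a table to a CSV file in GCS. The table name and destination bucket are placeholders, and PROJECT is assumed to be the same project id used in the question; if the data comes from a query rather than an existing table, the query result would first need to be written to a destination table before exporting.

from google.cloud import bigquery

# Minimal sketch of the BigQuery-to-CSV export workaround.
# Table name and destination bucket are placeholders.
client = bigquery.Client(project=PROJECT)

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.CSV)

extract_job = client.extract_table(
    'PROJECT.DATASET.TABLE',            # table holding the prepared data (placeholder)
    'gs://<BUCKET>/exports/out-*.csv',  # wildcard lets BigQuery shard large exports
    job_config=job_config,
    location='europe-west1',            # same region as the bucket and job in the question
)
extract_job.result()  # block until the export job finishes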
