Google Cloud Dataflow BigQueryIO.Write fails with Unknown Error (HTTP code 500)


Problem description


Has anybody run into the same problem as me, where Google Cloud Dataflow BigQueryIO.Write fails with an unknown error (HTTP code 500)?

I use Dataflow to process data for April, May, and June. The same code processed the April data (400 MB) and wrote it to BigQuery successfully, but when I process the May (60 MB) or June (90 MB) data, the job fails.

  • The data format for April, May, and June is the same.
  • If I change the writer from BigQuery to TextIO, the job succeeds, so I think the data format is fine.
  • The Log Dashboard shows no error log at all.
  • The system reports only the same unknown error.

The code I wrote is here: http://pastie.org/10907947

Error Message after "Executing BigQuery import job":

Workflow failed. Causes: 
(cc846): S01:Read Files/Read+Window.Into()+AnonymousParDo+BigQueryIO.Write/DataflowPipelineRunner.BatchBigQueryIOWrite/DataflowPipelineRunner.BatchBigQueryIONativeWrite failed., 
(e19a27451b49ae8d): BigQuery import job "dataflow_job_631261" failed., (e19a745a666): BigQuery creation of import job for table "hi_event_m6" in dataset "TESTSET" in project "lib-ro-123" failed., 
(e19a2749ae3f): BigQuery execution failed., 
(e19a2745a618): Error: Message: An internal error occurred and the request could not be completed. HTTP Code: 500
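
The pastie link above may no longer resolve. For context, the stage name in the error (Read Files/Read+Window.Into()+AnonymousParDo+BigQueryIO.Write) suggests a pipeline of roughly the following shape. This is a minimal sketch against the Dataflow Java SDK 1.x of that era, not the asker's actual code: the bucket path, schema, and parsing logic are placeholder assumptions, and only the table reference lib-ro-123:TESTSET.hi_event_m6 is taken from the error message.

import java.util.Arrays;

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.transforms.windowing.GlobalWindows;
import com.google.cloud.dataflow.sdk.transforms.windowing.Window;

public class EventImportPipeline {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    Pipeline p = Pipeline.create(options);

    // Placeholder schema; the real table layout is unknown.
    TableSchema schema = new TableSchema().setFields(Arrays.asList(
        new TableFieldSchema().setName("event").setType("STRING"),
        new TableFieldSchema().setName("ts").setType("TIMESTAMP")));

    p.apply(TextIO.Read.from("gs://my-bucket/events/2016-06/*"))      // many small input files
     .apply(Window.<String>into(new GlobalWindows()))                 // the Window.Into() stage
     .apply(ParDo.of(new DoFn<String, TableRow>() {                   // the AnonymousParDo stage
       @Override
       public void processElement(ProcessContext c) {
         // Assumes comma-separated lines (an assumption for this sketch).
         String[] parts = c.element().split(",");
         c.output(new TableRow().set("event", parts[0]).set("ts", parts[1]));
       }
     }))
     .apply(BigQueryIO.Write
         .to("lib-ro-123:TESTSET.hi_event_m6")                        // from the error message
         .withSchema(schema)
         .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
         .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run();
  }
}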

Solution

Sorry for the frustration. It looks like you are hitting a limit on the number of files being written to BQ. This is a known issue that we're in the process of fixing.

In the meantime, you can work around this issue by either decreasing the number of input files or resharding the data (do a GroupByKey and then ungroup the data -- semantically it's a no-op, but it forces the data to be materialized so that the parallelism of the write operation isn't constrained by the parallelism of the read).
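
To make the reshard workaround concrete, here is a minimal sketch against the Dataflow Java SDK 1.x: tag each element with an arbitrary key, GroupByKey to force a shuffle, then flatten the groups back out. The class name, the TableRow element type, and the key count of 100 are illustrative assumptions, not part of the original answer.

import java.util.Random;

import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.GroupByKey;
import com.google.cloud.dataflow.sdk.transforms.PTransform;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.KV;
import com.google.cloud.dataflow.sdk.values.PCollection;

public class Reshard extends PTransform<PCollection<TableRow>, PCollection<TableRow>> {
  @Override
  public PCollection<TableRow> apply(PCollection<TableRow> input) {
    return input
        // Tag each element with a random key so the grouping spreads evenly.
        .apply(ParDo.of(new DoFn<TableRow, KV<Integer, TableRow>>() {
          private transient Random random;
          @Override
          public void startBundle(Context c) { random = new Random(); }
          @Override
          public void processElement(ProcessContext c) {
            c.output(KV.of(random.nextInt(100), c.element()));
          }
        }))
        // The shuffle at this GroupByKey forces the data to be materialized.
        .apply(GroupByKey.<Integer, TableRow>create())
        // Ungroup: emit every value again and drop the keys (semantically a no-op).
        .apply(ParDo.of(new DoFn<KV<Integer, Iterable<TableRow>>, TableRow>() {
          @Override
          public void processElement(ProcessContext c) {
            for (TableRow row : c.element().getValue()) {
              c.output(row);
            }
          }
        }));
  }
}

Inserting this between the ParDo and the BigQueryIO.Write (for example, rows.apply(new Reshard())) decouples the write's parallelism from the read's, which is the effect the answer describes.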
