How to recover from a Cloud Dataflow job that failed on com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone


Problem description

My Cloud Dataflow job mysteriously failed after running for 4 hours because one worker threw this exception four times within an hour. The exception stack trace looks like this:

java.io.IOException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone { "code" : 500, "errors" : [ { "domain" : "global", "message" : "Backend Error", "reason" : "backendError" } ], "message" : "Backend Error" }

at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:431)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:289)
at com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriter.close(FileBasedSink.java:516)
at com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriter.close(FileBasedSink.java:419)
at com.google.cloud.dataflow.sdk.io.Write$Bound$2.finishBundle(Write.java:201)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone { "code" : 500, "errors" : [ { "domain" : "global", "message" : "Backend Error", "reason" : "backendError" } ], "message" : "Backend Error" }
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

None of the classes in the stack trace come directly from my job, so I cannot even catch the exception and recover.

I checked my region, Cloud Storage (owned by the same project), and so on; they are all fine. Other workers were also running fine. This looks like some kind of bug in Dataflow. If nothing else, I would really like to know how to recover from this: the job spent 30+ hours in total and has now produced a bunch of temp files whose completeness I cannot judge. If I re-run the job, I am concerned that it will fail again.

For the Google folks: the job ID is 2016-08-25_21_50_44-3818926540093331568. Thanks!

Recommended answer

The solution was to call withNumShards() on the output with a fixed value below 10000. This is a limitation that we hope to remove in the future.
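As a rough illustration of the fix, the sketch below picks a fixed shard count that always stays under 10000. The class ShardPlanner, the method chooseNumShards, and the sizing heuristic (estimated output bytes divided by a target shard size) are hypothetical and not part of the Dataflow SDK; only the "< 10000" limit comes from the answer. The comment in main shows where the value would plausibly be passed, assuming the output is written with TextIO.Write from the Dataflow SDK 1.x (the question does not say which sink is used, and the output path shown is a placeholder).

```java
final class ShardPlanner {
    // Largest shard count that still satisfies the "< 10000" limit from the answer.
    static final int MAX_SHARDS = 9999;

    private ShardPlanner() {}

    // Hypothetical helper: derive a fixed shard count from an estimated output
    // size and a desired size per shard, clamped to the range [1, MAX_SHARDS].
    static int chooseNumShards(long estimatedOutputBytes, long targetBytesPerShard) {
        if (estimatedOutputBytes < 0 || targetBytesPerShard <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        // Ceiling division written so that it cannot overflow for large inputs.
        long wanted = estimatedOutputBytes / targetBytesPerShard
                + (estimatedOutputBytes % targetBytesPerShard == 0 ? 0 : 1);
        return (int) Math.max(1L, Math.min(wanted, (long) MAX_SHARDS));
    }

    public static void main(String[] args) {
        // Assumed numbers for illustration: about 5 TiB of output, 256 MiB per shard.
        long estimatedOutputBytes = 5L * 1024 * 1024 * 1024 * 1024;
        long targetBytesPerShard = 256L * 1024 * 1024;
        int numShards = chooseNumShards(estimatedOutputBytes, targetBytesPerShard);
        System.out.println("numShards = " + numShards);

        // In a Dataflow SDK 1.x pipeline the value would then be fixed on the sink,
        // for example (assuming a PCollection<String> named lines and TextIO output):
        //
        //   lines.apply(TextIO.Write
        //       .to("gs://<your-bucket>/<output-prefix>")
        //       .withNumShards(numShards));
    }
}
```

Fixing the shard count gives up the service's freedom to choose the number of output files, so a value that is far too small for the data volume may slow down the final write step.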
