Failed to update work status Exception in Python Cloud Dataflow


Problem description

I have a Python Cloud Dataflow job that works fine on smaller subsets, but seems to be failing for no obvious reason on the complete dataset.

The only error I get in the Dataflow interface is the standard error message:

A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service.

Analysing the Stackdriver logs only shows this error:

Exception in worker loop: Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 736, in run deferred_exception_details=deferred_exception_details) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 590, in do_work exception_details=exception_details) File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 167, in wrapper return fun(*args, **kwargs) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 454, in report_completion_status exception_details=exception_details) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 266, in report_status work_executor=self._work_executor) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py", line 364, in report_status response = self._client.projects_jobs_workItems.ReportStatus(request) File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/clients/dataflow/dataflow_v1b3_client.py", line 210, in ReportStatus config, request, global_params=global_params) File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 723, in _RunMethod return self.ProcessHttpResponse(method_config, http_response, request) File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 729, in ProcessHttpResponse self.__ProcessHttpResponse(method_config, http_response, request)) File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 599, in __ProcessHttpResponse http_response.request_url, method_config, request) HttpError: HttpError accessing https://dataflow.googleapis.com/v1b3/projects//jobs/2017-05-03_03_33_40-3860129055041750274/workItems:reportStatus?alt=json>: response: <{'status': '400', 'content-length': '360', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Wed, 03 May 2017 16:46:11 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; charset=UTF-8'}>, content <{ "error": { "code": 400, "message": "(2a7b20b33659c46e): Failed to publish the result of the work update. Causes: (2a7b20b33659c523): Failed to update work status. Causes: (8a8b13f5c3a944ba): Failed to update work status., (8a8b13f5c3a945d9): Work \"4047499437681669251\" not leased (or the lease was lost).", "status": "INVALID_ARGUMENT" } } >

I assume this Failed to update work status error is related to the Cloud Runner? But since I didn't find any information on this error online, I was wondering if somebody else has encountered it and has a better explanation?

I am using Google Cloud Dataflow SDK for Python 0.5.5.

Solution

The "Work ... not leased (or the lease was lost)" part of the error means the worker stopped reporting progress for long enough that the service revoked its lease on that work item. One major cause of such lease expirations is memory pressure on the worker VM. You may try running your job on machines with more memory; in particular, a highmem machine type should do the trick.
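For reference, here is a minimal sketch of how a high-memory machine type can be requested through pipeline options in the Python SDK. It is written against the option names and import paths of recent apache_beam releases (the 0.5.5 SDK used older module paths and runner names), and the project, bucket, and machine type values are placeholders, not taken from the question:

    # Minimal sketch: request high-memory Dataflow workers via pipeline options.
    # The project, bucket, and machine type below are placeholders.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        '--runner=DataflowRunner',
        '--project=my-gcp-project',             # placeholder project id
        '--temp_location=gs://my-bucket/temp',  # placeholder bucket
        '--worker_machine_type=n1-highmem-8',   # high-memory worker VMs
    ])

    with beam.Pipeline(options=options) as p:
        # ... build the same pipeline that fails on the full dataset ...
        pass

The same flag can also be passed on the command line when launching the job; the rest of the pipeline code does not need to change.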

For more info on machine types, please check out the GCE documentation.

The next Dataflow release (2.0.0) should be able to handle these cases better.
