Failed to update work status Exception in Python Cloud Dataflow


Problem description

I have a Python Cloud Dataflow job that works fine on smaller subsets, but seems to fail on the complete dataset for no obvious reason.

The only error I get in the Dataflow interface is the standard error message:

A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service.

Analysing the Stackdriver logs only shows this error:

Exception in worker loop: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 736, in run
    deferred_exception_details=deferred_exception_details)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 590, in do_work
    exception_details=exception_details)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 167, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 454, in report_completion_status
    exception_details=exception_details)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 266, in report_status
    work_executor=self._work_executor)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py", line 364, in report_status
    response = self._client.projects_jobs_workItems.ReportStatus(request)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/clients/dataflow/dataflow_v1b3_client.py", line 210, in ReportStatus
    config, request, global_params=global_params)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 723, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 729, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 599, in __ProcessHttpResponse
    http_response.request_url, method_config, request)
HttpError: HttpError accessing <https://dataflow.googleapis.com/v1b3/projects//jobs/2017-05-03_03_33_40-3860129055041750274/workItems:reportStatus?alt=json>:
response: <{'status': '400', 'content-length': '360', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Wed, 03 May 2017 16:46:11 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; charset=UTF-8'}>,
content <{ "error": { "code": 400, "message": "(2a7b20b33659c46e): Failed to publish the result of the work update. Causes: (2a7b20b33659c523): Failed to update work status. Causes: (8a8b13f5c3a944ba): Failed to update work status., (8a8b13f5c3a945d9): Work \"4047499437681669251\" not leased (or the lease was lost).", "status": "INVALID_ARGUMENT" } } >

I assume this Failed to update work status error is related to the Cloud Runner? But since I couldn't find any information on this error online, I was wondering whether somebody else has encountered it and has a better explanation?

I am using the Google Cloud Dataflow SDK for Python 0.5.5.

Answer

One major cause of lease expirations is memory pressure on the VM. You can try running your job on machines with more memory; in particular, a highmem machine type should do the trick.

For more info on machine types, please check out the GCE documentation (https://cloud.google.com/compute/docs/machine-types).
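If memory pressure is the culprit, switching machine types is usually just a launch-time option. Below is a minimal sketch of requesting a highmem worker type for a Dataflow job; the project ID, bucket, and chosen machine type are placeholders, and the import path follows the current apache_beam SDK (the 0.5.5 SDK shipped equivalent options under google.cloud.dataflow, so flag spellings may differ slightly between versions):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholders: substitute your own project and bucket.
options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=YOUR_PROJECT_ID',
    '--temp_location=gs://YOUR_BUCKET/tmp',
    # Request workers with more RAM per vCPU than the default machine type.
    '--worker_machine_type=n1-highmem-4',
])

with beam.Pipeline(options=options) as p:
    ...  # your existing transforms go here

The same effect can be achieved from the command line by passing --worker_machine_type when launching the pipeline script.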

The next Dataflow release (2.0.0) should be able to handle these cases better.
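For reference, upgrading past the 0.x SDK means moving from the google-cloud-dataflow package to the apache-beam distribution on PyPI; assuming a standard pip setup, the switch looks roughly like:

pip install --upgrade apache-beam[gcp]

The [gcp] extra pulls in the Dataflow runner dependencies.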

