Dataflow job fails at BigQuery write with backend errors
Question
I have a job which is failing with several different errors related to the final import into BigQuery. I've run it 5 times and it fails each time, though the error message sometimes varies. The job worked fine when I ran it locally against an SQLite database, so I think the problem is on the Google backend.
One error message:
**Workflow failed. Causes: S04:write meter_traces_combined to BigQuery/WriteToBigQuery/NativeWrite failed., BigQuery import job "dataflow_job_5111748333716803539" failed., BigQuery creation of import job for table "meter_traces_combined" in dataset "ebce" in project "oeem-ebce-platform" failed., BigQuery execution failed., Unknown error.**
Another error message:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
work_executor.execute()
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 178, in execute
op.finish()
File "dataflow_worker/native_operations.py", line 93, in dataflow_worker.native_operations.NativeWriteOperation.finish
File "dataflow_worker/native_operations.py", line 94, in dataflow_worker.native_operations.NativeWriteOperation.finish
File "dataflow_worker/native_operations.py", line 95, in dataflow_worker.native_operations.NativeWriteOperation.finish
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/nativefileio.py", line 465, in __exit__
self.file.close()
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filesystemio.py", line 217, in close
self._uploader.finish()
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/gcsio.py", line 588, in finish
raise self._upload_thread.last_error # pylint: disable=raising-bad-type
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/gcsio.py", line 565, in _start_upload
self._client.objects.Insert(self._insert_request, upload=self._upload)
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py", line 1154, in Insert
upload=upload, upload_config=upload_config)
File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 715, in _RunMethod
http_request, client=self.client)
File "/usr/local/lib/python3.7/site-packages/apitools/base/py/transfer.py", line 908, in InitializeUpload
return self.StreamInChunks()
File "/usr/local/lib/python3.7/site-packages/apitools/base/py/transfer.py", line 1020, in StreamInChunks
additional_headers=additional_headers)
File "/usr/local/lib/python3.7/site-packages/apitools/base/py/transfer.py", line 971, in __StreamMedia
self.RefreshResumableUploadState()
File "/usr/local/lib/python3.7/site-packages/apitools/base/py/transfer.py", line 873, in RefreshResumableUploadState
self.stream.seek(self.progress)
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filesystemio.py", line 301, in seek
offset, whence, self.position, self.last_block_position))
NotImplementedError: offset: 10485760, whence: 0, position: 16777216, last: 8388608
Yet another error message:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
work_executor.execute()
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 178, in execute
op.finish()
File "dataflow_worker/native_operations.py", line 93, in dataflow_worker.native_operations.NativeWriteOperation.finish
File "dataflow_worker/native_operations.py", line 94, in dataflow_worker.native_operations.NativeWriteOperation.finish
File "dataflow_worker/native_operations.py", line 95, in dataflow_worker.native_operations.NativeWriteOperation.finish
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/nativeavroio.py", line 309, in __exit__
self._data_file_writer.fo.close()
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filesystemio.py", line 217, in close
self._uploader.finish()
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/gcsio.py", line 588, in finish
raise self._upload_thread.last_error # pylint: disable=raising-bad-type
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/gcsio.py", line 565, in _start_upload
self._client.objects.Insert(self._insert_request, upload=self._upload)
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py", line 1154, in Insert
upload=upload, upload_config=upload_config)
File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 715, in _RunMethod
http_request, client=self.client)
File "/usr/local/lib/python3.7/site-packages/apitools/base/py/transfer.py", line 908, in InitializeUpload
return self.StreamInChunks()
File "/usr/local/lib/python3.7/site-packages/apitools/base/py/transfer.py", line 1020, in StreamInChunks
additional_headers=additional_headers)
File "/usr/local/lib/python3.7/site-packages/apitools/base/py/transfer.py", line 971, in __StreamMedia
self.RefreshResumableUploadState()
File "/usr/local/lib/python3.7/site-packages/apitools/base/py/transfer.py", line 875, in RefreshResumableUploadState
raise exceptions.HttpError.FromResponse(refresh_response)
apitools.base.py.exceptions.HttpError: HttpError accessing <https://www.googleapis.com/resumable/upload/storage/v1/b/oee-ebce-platform/o?alt=json&name=tmp%2Fetl-ebce-combine-all-traces-20191127-152244.1574868164.604684%2Fdax-tmp-2019-11-27_07_24_36-17060579636924315582-S02-0-e425da41c3fe2598%2Ftmp-e425da41c3fe2d8b-shard--try-33835bf582552bbd-endshard.avro&uploadType=resumable&upload_id=AEnB2UqddXXpTnnRQyxBQuL1ptXExVZ5CrUQ33o2S2UHcVUhesrBq7XFSQ90YBQznRm2Wh3g8g8lG1z5uEQv8fXvqO40z5WrnQ>: response: <{'x-guploader-uploadid': 'AEnB2UqddXXpTnnRQyxBQuL1ptXExVZ5CrUQ33o2S2UHcVUhesrBq7XFSQ90YBQznRm2Wh3g8g8lG1z5uEQv8fXvqO40z5WrnQ', 'vary': 'Origin, X-Origin', 'content-type': 'application/json; charset=UTF-8', 'content-length': '177', 'date': 'Wed, 27 Nov 2019 15:30:50 GMT', 'server': 'UploadServer', 'status': '410'}>, content <{
"error": {
"errors": [
{
"domain": "global",
"reason": "backendError",
"message": "Backend Error"
}
],
"code": 503,
"message": "Backend Error"
}
}
Any ideas? Job ID 2019-11-27_09_50_34-1251118406325466877, if anyone at Google is reading this. Thanks.
Recommended answer
Google Cloud Support here. I inspected your job and found two internal issues that might be related to this failure. As Alex Amato suggested in his comment, I would try running the job with this pipeline option:
--experiments=use_beam_bq_sink
Otherwise, I recommend opening a support ticket with GCP directly, because this might require further investigation.
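For the `--experiments=use_beam_bq_sink` suggestion, here is a minimal sketch of adding the flag to a job's launch arguments from Python. The helper name `with_experiment` and the project and bucket values are hypothetical placeholders, not taken from the job in question. As an assumption on my part (the answer does not spell it out), the experiment switches the write from the Dataflow native sink seen in the tracebacks (`NativeWrite`) to Beam's own BigQuery sink.

```python
from typing import List, Sequence

# Flag value quoted in the answer; everything else here is illustrative.
EXPERIMENT = "use_beam_bq_sink"


def with_experiment(argv: Sequence[str], experiment: str = EXPERIMENT) -> List[str]:
    """Return a copy of argv that contains --experiments=<experiment> exactly once.

    Only the exact flag string is checked; a comma-separated form such as
    --experiments=a,b is not parsed by this sketch.
    """
    flag = f"--experiments={experiment}"
    args = list(argv)
    if flag not in args:
        args.append(flag)
    return args


# Hypothetical launch arguments: project and bucket are placeholders.
pipeline_args = with_experiment([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--temp_location=gs://my-bucket/tmp",
])
print(pipeline_args)
```

The resulting list can then be handed to `apache_beam.options.pipeline_options.PipelineOptions(pipeline_args)` when the pipeline is constructed, or the same flag can simply be appended to the command line that launches the job.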
I hope this helps.