Can't upload > ~2GB to Google Cloud Storage
Problem description
Trace below.
The relevant Python snippet:
bucket = _get_bucket(location['bucket'])
blob = bucket.blob(location['path'])
blob.upload_from_filename(source_path)
Which ultimately triggers (from the ssl library):
OverflowError: string longer than 2147483647 bytes
I assume there is some special configuration option I'm missing?
This is possibly related to this ~1.5yr old, apparently still-open issue: https://github.com/googledatalab/datalab/issues/784.
Help appreciated!
Full trace:
  File "/usr/src/app/gcloud/download_data.py", line 109, in *******
    blob.upload_from_filename(source_path)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 992, in upload_from_filename
    size=total_bytes)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 946, in upload_from_file
    client, file_obj, content_type, size, num_retries)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 867, in _do_upload
    client, stream, content_type, size, num_retries)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 700, in _do_multipart_upload
    transport, data, object_metadata, content_type)
  File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/upload.py", line 97, in transmit
    retry_strategy=self._retry_strategy)
  File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request
    func, RequestsMixin._get_status_code, retry_strategy)
  File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
    response = func()
  File "/usr/local/lib/python3.5/dist-packages/google/auth/transport/requests.py", line 186, in request
    method, url, data=data, headers=request_headers, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 357, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 936, in _send_output
    self.send(message_body)
  File "/usr/lib/python3.5/http/client.py", line 908, in send
    self.sock.sendall(data)
  File "/usr/lib/python3.5/ssl.py", line 891, in sendall
    v = self.send(data[count:])
  File "/usr/lib/python3.5/ssl.py", line 861, in send
    return self._sslobj.write(data)
  File "/usr/lib/python3.5/ssl.py", line 586, in write
    return self._sslobj.write(data)
OverflowError: string longer than 2147483647 bytes
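Recommended answer
The problem is that the library attempts to read the entire file into memory. Following the call chain from upload_from_filename shows that it stats the file and then passes the full size along, so the whole file goes out as a single upload part. The trace reflects this: the upload goes through _do_multipart_upload, and the request body handed to the ssl library ends up longer than 2147483647 bytes (2**31 - 1).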
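As a rough illustration of the limit involved, here is a minimal sketch of a pre-flight check that stats a file and compares its size to the number in the OverflowError. The constant and function names are hypothetical and not part of google-cloud-storage; the only fact relied on is that 2147483647 equals 2**31 - 1.

```python
import os

# The length named in the OverflowError message: 2**31 - 1 bytes.
MAX_SINGLE_WRITE_BYTES = 2**31 - 1


def exceeds_single_write_limit(size_bytes):
    """Return True if a payload of this size is longer than 2147483647 bytes."""
    return size_bytes > MAX_SINGLE_WRITE_BYTES


def file_exceeds_single_write_limit(path):
    """Stat the file (without reading it) and compare its size to the limit."""
    return exceeds_single_write_limit(os.path.getsize(path))
```

A file for which file_exceeds_single_write_limit(source_path) is true is one that would hit this error if sent as a single body.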
Specifying a chunk_size when creating the blob avoids the single-part path and makes the library upload in multiple parts:
# Must be a multiple of 256KB per docstring
CHUNK_SIZE = 10485760 # 10MB
blob = bucket.blob(location['path'], chunk_size=CHUNK_SIZE)
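To tie the pieces together, here is a small sketch that rounds a requested chunk size up to a multiple of 256 KiB and then uploads through a blob created with that chunk_size. The helper names round_chunk_size and upload_in_chunks are hypothetical; the only library calls used are bucket.blob(..., chunk_size=...) and blob.upload_from_filename(...), as in the snippets above, and the bucket object is passed in rather than constructed here.

```python
KIB_256 = 256 * 1024  # chunk sizes must be a multiple of this, per the docstring


def round_chunk_size(requested_bytes):
    """Round a requested size up to the nearest multiple of 256 KiB."""
    if requested_bytes <= 0:
        raise ValueError("chunk size must be positive")
    multiples = -(-requested_bytes // KIB_256)  # ceiling division
    return multiples * KIB_256


def upload_in_chunks(bucket, dest_path, source_path,
                     requested_chunk_bytes=10 * 1024 * 1024):
    """Hypothetical helper: create the blob with a chunk_size, then upload."""
    chunk_size = round_chunk_size(requested_chunk_bytes)
    # Setting chunk_size on the blob is what the answer relies on to get a
    # multi-part upload instead of one request carrying the whole file.
    blob = bucket.blob(dest_path, chunk_size=chunk_size)
    blob.upload_from_filename(source_path)
    return blob
```

With the names from the question, a call would look like upload_in_chunks(_get_bucket(location['bucket']), location['path'], source_path). The rounding step is only a guard against passing a size that is not a multiple of 256 KiB; the default of 10 MiB already is one.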
Happy hacking!