can't upload > ~2GB to Google Cloud Storage


Question

Trace below.

The relevant Python snippet:

bucket = _get_bucket(location['bucket'])
blob = bucket.blob(location['path'])
blob.upload_from_filename(source_path)
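
For context, a minimal self-contained version of that snippet might look like this (a sketch only; the bucket name and file path are hypothetical placeholders, and _get_bucket above is presumably a thin wrapper around the client):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket('my-bucket')   # hypothetical bucket name
blob = bucket.blob('path/to/object')  # hypothetical object path
# Raises OverflowError once the local file exceeds ~2GB:
blob.upload_from_filename('/data/huge_file.bin')  # hypothetical path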

Which ultimately triggers (from the ssl library):

OverflowError: string longer than 2147483647 bytes

I assume there is some special configuration option I'm missing?

This is possibly related to this ~1.5yr old apparently still-open issue: https://github.com/googledatalab/datalab/issues/784.

Help appreciated!

Full traceback:

[文件"/usr/src/app/gcloud/download_data.py",第109行,在*******中blob.upload_from_filename(source_path)

[File "/usr/src/app/gcloud/download_data.py", line 109, in ******* blob.upload_from_filename(source_path)

文件"/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py",行992,位于upload_from_filename中大小= total_bytes)

File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 992, in upload_from_filename size=total_bytes)

文件"/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py",第946行,位于upload_from_file中客户,file_obj,content_type,大小,num_retries)

File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 946, in upload_from_file client, file_obj, content_type, size, num_retries)

文件"/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py",行867,在_do_upload中客户端,流,content_type,大小,num_retries)

File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 867, in _do_upload client, stream, content_type, size, num_retries)

_do_multipart_upload中的文件"/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py",第700行传输,数据,object_metadata,content_type)

File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 700, in _do_multipart_upload transport, data, object_metadata, content_type)

在传输中,文件"/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/upload.py",第97行retry_strategy = self._retry_strategy)

File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/upload.py", line 97, in transmit retry_strategy=self._retry_strategy)

http_request中的文件"/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/_helpers.py",第101行func,RequestsMixin._get_status_code,retry_strategy)

File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request func, RequestsMixin._get_status_code, retry_strategy)

文件"/usr/local/lib/python3.5/dist-packages/google/resumable_media/_helpers.py",第146行,在wait_and_retry中响应= func()

File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry response = func()

请求中的文件"/usr/local/lib/python3.5/dist-packages/google/auth/transport/requests.py",第186行方法,网址,数据=数据,标头= request_headers,**假)

File "/usr/local/lib/python3.5/dist-packages/google/auth/transport/requests.py", line 186, in request method, url, data=data, headers=request_headers, **kwargs)

请求中的文件"/usr/local/lib/python3.5/dist-packages/requests/sessions.py",第508行resp = self.send(prep,** send_kwargs)

File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request resp = self.send(prep, **send_kwargs)

send中的文件"/usr/local/lib/python3.5/dist-packages/requests/sessions.py",第618行r = adapter.send(request,** kwargs)

File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 618, in send r = adapter.send(request, **kwargs)

文件send中的文件"/usr/local/lib/python3.5/dist-packages/requests/adapters.py",第440行timeout =超时

File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 440, in send timeout=timeout

在urlopen中的文件"/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py",第601行chunked = chunked)

File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 601, in urlopen chunked=chunked)

_make_request中的文件"/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py",第357行conn.request(方法,网址,** httplib_request_kw)

File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 357, in _make_request conn.request(method, url, **httplib_request_kw)

请求中的文件"/usr/lib/python3.5/http/client.py",行1106self._send_request(方法,URL,正文,标头)

File "/usr/lib/python3.5/http/client.py", line 1106, in request self._send_request(method, url, body, headers)

_send_request中的文件"/usr/lib/python3.5/http/client.py",行1151self.endheaders(body)

File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request self.endheaders(body)

文件"/usr/lib/python3.5/http/client.py",第1102行,在标题中self._send_output(message_body)

File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders self._send_output(message_body)

_send_output中的文件"/usr/lib/python3.5/http/client.py",第936行self.send(message_body)

File "/usr/lib/python3.5/http/client.py", line 936, in _send_output self.send(message_body)

文件send中的文件"/usr/lib/python3.5/http/client.py",第908行self.sock.sendall(data)

File "/usr/lib/python3.5/http/client.py", line 908, in send self.sock.sendall(data)

sendall中的文件"/usr/lib/python3.5/ssl.py",第891行v = self.send(data [count:])

File "/usr/lib/python3.5/ssl.py", line 891, in sendall v = self.send(data[count:])

文件"/usr/lib/python3.5/ssl.py",行861,在发送中返回self._sslobj.write(data)

File "/usr/lib/python3.5/ssl.py", line 861, in send return self._sslobj.write(data)

文件"/usr/lib/python3.5/ssl.py",第586行,写入返回self._sslobj.write(data)

File "/usr/lib/python3.5/ssl.py", line 586, in write return self._sslobj.write(data)

OverflowError:字符串长度超过2147483647字节

OverflowError: string longer than 2147483647 bytes

Answer

The issue is that it attempts to read the entire file into memory. Following the call chain from upload_from_filename shows that it stats the file and then passes that size in as a single upload part.
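
As a quick sanity check (a minimal sketch, not from the original answer), you can compare the file size against the limit in the error, 2147483647 bytes (2**31 - 1), before uploading:

import os

SSL_WRITE_LIMIT = 2147483647  # 2**31 - 1, the ceiling from the OverflowError above

source_path = '/data/huge_file.bin'  # hypothetical path
size = os.path.getsize(source_path)
if size > SSL_WRITE_LIMIT:
    print('%d bytes: a single-part upload would overflow the ssl write' % size)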

Instead, specifying a chunk_size when creating the object will trigger it to upload in multiple parts:

# Must be a multiple of 256KB, per the docstring
CHUNK_SIZE = 10485760  # 10MB
blob = bucket.blob(location['path'], chunk_size=CHUNK_SIZE)
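
Putting it together, here is a complete sketch using the same hypothetical names as above. With chunk_size set, the client library performs a resumable upload, sending the file in chunk-sized pieces instead of one request body larger than the ssl write limit:

from google.cloud import storage

CHUNK_SIZE = 10 * 1024 * 1024  # 10MB, a multiple of 256KB

client = storage.Client()
bucket = client.bucket('my-bucket')  # hypothetical bucket name
# chunk_size switches the upload to a chunked, resumable transfer:
blob = bucket.blob('path/to/object', chunk_size=CHUNK_SIZE)
blob.upload_from_filename('/data/huge_file.bin')  # hypothetical path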

Happy hacking!
