Google App Engine: How to write large files to Google Cloud Storage
Problem description
I am trying to save large files from Google App Engine's Blobstore to Google Cloud Storage to facilitate backup.
It works fine for small files (<10 MB), but for larger files it becomes unstable and GAE throws a FileNotOpenedError.
My code:
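from google.appengine.api import files
from google.appengine.ext import blobstore

PATH = '/gs/backupbucket/'
# DocumentFile is the app's own datastore model (not shown)
for df in DocumentFile.all():
    fn = df.blob.filename
    br = blobstore.BlobReader(df.blob)
    write_path = files.gs.create(PATH + fn.encode('utf-8'), mime_type='application/zip', acl='project-private')
    with files.open(write_path, 'a') as fp:
        # Copy the blob into the GCS file in 100,000-byte chunks
        while True:
            buf = br.read(100000)
            if buf == "": break
            fp.write(buf)
    files.finalize(write_path)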
(This runs in a task queue to avoid exceeding the execution time limit.)
It throws a FileNotOpenedError:
Traceback (most recent call last):
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~simplerepository/1.354754771592783168/processFiles.py", line 249, in post
    fp.write(buf)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 281, in __exit__
    self.close()
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 275, in close
    self._make_rpc_call_with_retry('Close', request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
    _raise_app_error(e)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 179, in _raise_app_error
    raise FileNotOpenedError()
I investigated further, and according to a comment on GAE Issue 5371, the Files API closes the file every 30 seconds. I have not seen this documented anywhere else.
I tried to work around this by closing and reopening the file at intervals, but now I get a WrongOpenModeError. In the code below, edited from the first version of this post, I added a 0.5-second pause between closing the file and reopening it. It still throws a WrongOpenModeError.
My code (updated):
import time

from google.appengine.api import files
from google.appengine.ext import blobstore

PATH = '/gs/backupbucket/'
# DocumentFile is the app's own datastore model (not shown)
for df in DocumentFile.all():
    fn = df.blob.filename
    br = blobstore.BlobReader(df.blob)
    write_path = files.gs.create(PATH + fn.encode('utf-8'), mime_type='application/zip', acl='project-private')
    fp = files.open(write_path, 'a')
    c = 0
    while True:
        # Every 5 chunks: close and finalize the file, pause, then reopen it
        if c == 5:
            c = 0
            fp.close()
            files.finalize(write_path)
            time.sleep(0.5)
            fp = files.open(write_path, 'a')
        c = c + 1
        buf = br.read(100000)
        if buf == "": break
        fp.write(buf)
    files.finalize(write_path)
Stacktrace:
Traceback (most recent call last):
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~simplerepository/1.354894420907462278/processFiles.py", line 267, in get
    fp.write(buf)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 310, in write
    self._make_rpc_call_with_retry('Append', request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
    _raise_app_error(e)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 188, in _raise_app_error
    raise WrongOpenModeError()
I tried to find information about WrongOpenModeError, but the only place it is mentioned is in appengine.api.files.file.py itself.
Any suggestions on how to get around this and save large files to Google Cloud Storage as well would be greatly appreciated. Thanks!
I was having the same issue and ended up writing an iterator around fetch_data and catching the exception. It works, but it is a workaround.
Rewritten, your code would look something like this:
from google.appengine.ext import blobstore
from google.appengine.api import files

def iter_blobstore(blob, fetch_size=524288):
    """Yield the blob's contents in chunks of at most fetch_size bytes."""
    start_index = 0
    # fetch_data's end_index is inclusive, so each window ends at start + fetch_size - 1
    end_index = fetch_size - 1
    while True:
        read = blobstore.fetch_data(blob, start_index, end_index)
        if read == "":
            break
        start_index += fetch_size
        end_index += fetch_size
        yield read

PATH = '/gs/backupbucket/'
# DocumentFile is the app's own datastore model (not shown)
for df in DocumentFile.all():
    fn = df.blob.filename
    write_path = files.gs.create(PATH + fn.encode('utf-8'), mime_type='application/zip', acl='project-private')
    with files.open(write_path, 'a') as fp:
        for buf in iter_blobstore(df.blob):
            try:
                fp.write(buf)
            except files.FileNotOpenedError:
                # Workaround: ignore the error and carry on.
                # If the write really failed, this chunk may be missing from the copy.
                pass
    files.finalize(write_path)
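The iterator above reads the blob in fixed-size windows. As I understand the Blobstore API reference, `blobstore.fetch_data` treats `end_index` as inclusive, so a window of `fetch_size` bytes starting at `start_index` should end at `start_index + fetch_size - 1`. Initializing `end_index` to `fetch_size` instead makes every window one byte too long, and consecutive windows then overlap by one byte. The sketch below checks this outside App Engine. `fake_fetch_data` and `iter_chunks` are hypothetical names, and `fake_fetch_data` is an in-memory stand-in that assumes the inclusive-end behaviour rather than calling the real API.

```python
def fake_fetch_data(data, start_index, end_index):
    # Hypothetical stand-in for blobstore.fetch_data, assuming end_index is
    # inclusive; returns an empty byte string once start_index is past the end.
    return data[start_index:end_index + 1]


def iter_chunks(data, fetch_size, naive=False):
    # Same loop shape as iter_blobstore. naive=True reproduces an end_index
    # initialized to fetch_size; the default initializes it to fetch_size - 1.
    start_index = 0
    end_index = fetch_size if naive else fetch_size - 1
    while True:
        read = fake_fetch_data(data, start_index, end_index)
        if not read:
            break
        start_index += fetch_size
        end_index += fetch_size
        yield read


data = b"0123456789abcdef" * 640  # 10,240 bytes of sample data
fixed = b"".join(iter_chunks(data, 1000))
naive = b"".join(iter_chunks(data, 1000, naive=True))
print(fixed == data)           # True
print(len(naive) - len(data))  # 10
```

With 10,240 bytes of sample data and 1,000-byte windows, the inclusive-end version reassembles the input exactly, while the naive version picks up one duplicated byte at each of the ten window boundaries.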