Google App Engine: How to write large files to Google Cloud Storage

Problem description

I am trying to save large files from Google App Engine's Blobstore to Google Cloud Storage to facilitate backup.

It works fine for small files (<10 MB), but for larger files it becomes unstable and GAE throws a FileNotOpenedError.

My code:

PATH = '/gs/backupbucket/'
for df in DocumentFile.all():           
  fn = df.blob.filename
  br = blobstore.BlobReader(df.blob)
  write_path = files.gs.create(self.PATH+fn.encode('utf-8'), mime_type='application/zip',acl='project-private') 
  with files.open(write_path, 'a') as fp:
    while True:
      buf = br.read(100000)
      if buf=="": break
      fp.write(buf)
  files.finalize(write_path)

(This runs in a task queue to avoid exceeding the execution time limit.)

Throws a FileNotOpenedError:

Traceback (most recent call last):
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~simplerepository/1.354754771592783168/processFiles.py", line 249, in post
    fp.write(buf)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 281, in __exit__
    self.close()
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 275, in close
    self._make_rpc_call_with_retry('Close', request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
    _raise_app_error(e)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 179, in _raise_app_error
    raise FileNotOpenedError()

I have investigated further and, according to a comment on GAE Issue 5371, the Files API closes the file every 30 seconds. I have not seen this documented anywhere else.

I have tried to work around this by closing and reopening the file at intervals, but now I get a WrongOpenModeError. The code below is edited from the first version of this post; I have added a 0.5 second pause between closing and reopening the file. It now throws a WrongOpenModeError.

My code (updated):

PATH = '/gs/backupbucket/'
for df in DocumentFile.all():           
  fn = df.blob.filename
  br = blobstore.BlobReader(df.blob)
  write_path = files.gs.create(self.PATH+fn.encode('utf-8'), mime_type='application/zip',acl='project-private') 
  fp = files.open(write_path, 'a')
  c = 0
  while True:       
    if (c == 5):
      c = 0
      fp.close()
      files.finalize(write_path)
      time.sleep(0.5)
      fp = files.open(write_path, 'a')
    c = c + 1
    buf = br.read(100000)
    if buf=="": break
    fp.write(buf)
  files.finalize(write_path)

Stacktrace:

Traceback (most recent call last):
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~simplerepository/1.354894420907462278/processFiles.py", line 267, in get
    fp.write(buf)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 310, in write
    self._make_rpc_call_with_retry('Append', request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
    _raise_app_error(e)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 188, in _raise_app_error
    raise WrongOpenModeError()

I have tried to find information about WrongOpenModeError, but the only place it is mentioned is in appengine.api.files.file.py itself.

Suggestions on how to get around this and save large files to Google Cloud Storage as well would be greatly appreciated. Thanks!

Solution

I was having the same issue and ended up writing an iterator around fetch_data that catches the exception. It works, but it is a workaround.

Rewriting your code, it would look something like this:

from google.appengine.ext import blobstore
from google.appengine.api import files

def iter_blobstore(blob, fetch_size=524288):
  # Yield the blob's data in chunks of at most fetch_size bytes.
  start_index = 0

  while True:
    # end_index is inclusive, so subtract 1 to avoid reading the
    # boundary byte twice on consecutive fetches.
    end_index = start_index + fetch_size - 1
    read = blobstore.fetch_data(blob, start_index, end_index)

    if read == "":
      break

    start_index += fetch_size

    yield read


PATH = '/gs/backupbucket/'
for df in DocumentFile.all():
  fn = df.blob.filename
  write_path = files.gs.create(self.PATH+fn.encode('utf-8'), mime_type='application/zip',acl='project-private')
  with files.open(write_path, 'a') as fp:
    for buf in iter_blobstore(df.blob):
      try:
        fp.write(buf)
      except files.FileNotOpenedError:
        # Workaround: ignore the error raised when the file has been
        # closed underneath us. Note that the chunk is not retried.
        pass
  files.finalize(write_path)
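
The chunking logic of such an iterator can be checked without App Engine. The following is only a minimal sketch and not part of the original answer: `fake_fetch_data` is a hypothetical in-memory stand-in, written under the assumption that `blobstore.fetch_data` treats `end_index` as inclusive and returns an empty string for a range past the end of the data. `iter_chunks` is likewise an illustrative name.

```python
def fake_fetch_data(data, start_index, end_index):
    """Hypothetical stand-in for blobstore.fetch_data.

    Assumption: end_index is inclusive, and a range that starts past the
    end of the data yields an empty byte string.
    """
    return data[start_index:end_index + 1]


def iter_chunks(data, fetch_size=4, fetch=fake_fetch_data):
    """Yield consecutive, non-overlapping chunks of at most fetch_size bytes."""
    start_index = 0
    while True:
        # Inclusive end index: the last byte that belongs to this chunk.
        chunk = fetch(data, start_index, start_index + fetch_size - 1)
        if not chunk:
            break
        yield chunk
        # Advance by what was actually returned, not by the requested size.
        start_index += len(chunk)


payload = b"0123456789"
chunks = list(iter_chunks(payload, fetch_size=4))
print(chunks)
```

Advancing by `len(chunk)` rather than by `fetch_size` keeps the offsets correct even when a fetch returns fewer bytes than requested, so joining the chunks reproduces the original data exactly.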
