Google App Engine and Google Sheets exceeding soft memory limit


Problem description

I'm writing a simple service to take data from a couple of sources, munge it together, and use the Google API client to send it to a Google Sheet. That part is easy and works well; the data is not that big.

The issue is that calling .spreadsheets() after building the API service (i.e. build('sheets', 'v4', http=auth).spreadsheets()) causes a memory jump of roughly 30 megabytes (I did some profiling to separate out where the memory was being allocated). When deployed to GAE, these spikes stick around for long stretches of time (sometimes hours at a time), creeping upwards until, after several requests, they trigger GAE's 'Exceeded soft private memory limit' error.
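The kind of profiling described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual profiling code: it assumes Python 3.4+ (tracemalloc is not available on the GAE Python 2.7 runtime), and build_fake_resource is a hypothetical stand-in for build('sheets', 'v4', http=auth).spreadsheets(), since the real call needs network access and credentials.

```python
import json
import tracemalloc


def peak_allocation(func, *args, **kwargs):
    """Run func and return (result, peak bytes allocated while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_fake_resource():
    """Hypothetical stand-in: serialize and re-parse a large JSON document."""
    methods = {"m%d" % i: {"id": i, "path": "v4/x/%d" % i} for i in range(20000)}
    document = json.dumps({"methods": methods})
    return json.loads(document)


resource, peak = peak_allocation(build_fake_resource)
print("peak bytes allocated:", peak)
```

Swapping the stand-in for the real call would show how much of the spike comes from that one step, under the assumption that the allocations happen in Python code that tracemalloc can see.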

I am using memcache for the discovery document and urlfetch for grabbing data, but those are the only other services I am using.

I have tried manual garbage collection, changing threadsafe in app.yaml, even things like changing the point at which .spreadsheets() is called, and can't shake this problem. It's also possible that I am simply misunderstanding something about GAE's architecture, but I know the spike is caused by the call to .spreadsheets() and I am not storing anything in local caches.

Is there a way either to 1) reduce the size of the memory spike from calling .spreadsheets() or 2) keep the spikes from staying around in memory (or, preferably, both)? A very simplified gist is below to give an idea of the API calls and request handler; I can give fuller code if needed. I know similar questions have been asked before, but I can't get it fixed.

https://gist.github.com/chill17/18f1caa897e6a20201232165aca05239

Solution

I ran into this when using the spreadsheets API on a small processor with only 20MB of usable RAM. The problem is that the Google API client pulls in the whole API description (the discovery document) as a string and stores it as a resource object in memory.

If free memory is an issue, you should construct your own http object and manually make the desired request. See my Spreadsheet() class as an example of how to create a new spreadsheet using this method.

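import json
import os

import httplib2
from oauth2client import client, tools
from oauth2client.file import Storage

# Command-line flags for the OAuth2 flow, as in the Google Sheets quickstart
try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None

SCOPES = 'https://www.googleapis.com/auth/spreadsheets'
CLIENT_SECRET_FILE = 'client_secret.json'
APPLICATION_NAME = 'Google Sheets API Python Quickstart'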

class Spreadsheet:

    def __init__(self, title):

        #Get credentials from locally stored JSON file
        #If file does not exist, create it
        self.credentials = self.getCredentials()

        #HTTP service that will be used to push/pull data

        self.service = httplib2.Http()
        self.service = self.credentials.authorize(self.service)
        self.headers = {
            'content-type': 'application/json',
            'accept-encoding': 'gzip, deflate',
            'accept': 'application/json',
            'user-agent': 'google-api-python-client/1.6.2 (gzip)',
        }


        print("CREDENTIALS: "+str(self.credentials))


        self.baseUrl = "https://sheets.googleapis.com/v4/spreadsheets"
        self.spreadsheetInfo = self.create(title)   
        self.spreadsheetId = self.spreadsheetInfo['spreadsheetId']    



    def getCredentials(self):
        """Gets valid user credentials from storage.

        If nothing has been stored, or if the stored credentials are invalid,
        the OAuth2 flow is completed to obtain the new credentials.

        Returns:
            Credentials, the obtained credential.
        """
        home_dir = os.path.expanduser('~')
        credential_dir = os.path.join(home_dir, '.credentials')
        if not os.path.exists(credential_dir):
            os.makedirs(credential_dir)
        credential_path = os.path.join(credential_dir,
                                       'sheets.googleapis.com-python-quickstart.json')

        store = Storage(credential_path)
        credentials = store.get()
        if not credentials or credentials.invalid:
            flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
            flow.user_agent = APPLICATION_NAME
            if flags:
                credentials = tools.run_flow(flow, store, flags)
            else: # Needed only for compatibility with Python 2.6
                credentials = tools.run(flow, store)
            print('Storing credentials to ' + credential_path)
        return credentials

    def create(self, title):

        #Only put title in request body... We don't need anything else for now
        requestBody = {
            "properties":{
                "title":title
            },
        }


        print("BODY: "+str(requestBody))
        url = self.baseUrl

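        # Serialize with json.dumps: str() on a dict produces a Python repr
        # with single quotes, which is not valid JSON
        response, content = self.service.request(url,
                                        method="POST",
                                        headers=self.headers,
                                        body=json.dumps(requestBody))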
        print("\n\nRESPONSE\n"+str(response))
        print("\n\nCONTENT\n"+str(content))

        return json.loads(content)
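
The same manual approach can be extended to writing data. The sketch below is not part of the original answer: build_append_request is a hypothetical helper, the spreadsheet ID and range are placeholders, and Python 3 is assumed for urllib.parse. It only prepares the URL, headers and JSON body for the Sheets v4 spreadsheets.values.append endpoint, so nothing is sent over the network here.

```python
import json
from urllib.parse import quote

BASE_URL = "https://sheets.googleapis.com/v4/spreadsheets"


def build_append_request(spreadsheet_id, cell_range, rows):
    """Return (url, headers, body) for appending rows to a sheet.

    rows is a list of lists, one inner list per spreadsheet row.
    """
    url = "{}/{}/values/{}:append?valueInputOption=RAW".format(
        BASE_URL,
        quote(spreadsheet_id, safe=""),
        quote(cell_range, safe=""),  # e.g. "Sheet1!A1" becomes "Sheet1%21A1"
    )
    headers = {
        "content-type": "application/json",
        "accept": "application/json",
    }
    # json.dumps produces valid JSON; str() on a dict would not.
    body = json.dumps({"values": rows})
    return url, headers, body


url, headers, body = build_append_request(
    "SPREADSHEET_ID", "Sheet1!A1", [["a", 1], ["b", 2]]
)
print(url)
print(body)
```

The returned values could then be passed to the authorized httplib2 object from the class above, for example self.service.request(url, method="POST", headers=headers, body=body), in the same way create() does.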
