Google App Engine: record counts from an NDB model
Question
How many records can we get from Google App Engine in a single query, so that we can display a count to the user? And can we increase the timeout limit from 3 seconds to 5 seconds?
Answer
In my experience, ndb cannot pull more than 1000 records at a time. Here is an example of what happens if I try to use .count() on a table that contains ~500,000 records:
s~project-id> models.Transaction.query().count()
WARNING:root:suspended generator _count_async(query.py:1330) raised AssertionError()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/utils.py", line 160, in positional_wrapper
    return wrapped(*args, **kwds)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/query.py", line 1287, in count
    return self.count_async(limit, **q_options).get_result()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 383, in get_result
    self.check_success()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
    value = gen.throw(exc.__class__, exc, tb)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/query.py", line 1330, in _count_async
    batch = yield rpc
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 513, in _on_rpc_completion
    result = rpc.get_result()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 614, in get_result
    return self.__get_result_hook(self)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/datastore/datastore_query.py", line 2910, in __query_result_hook
    self._batch_shared.conn.check_rpc_success(rpc)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/datastore/datastore_rpc.py", line 1377, in check_rpc_success
    rpc.check_success()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 580, in check_success
    self.__rpc.CheckSuccess()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_rpc.py", line 157, in _WaitImpl
    self.request, self.response)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 308, in MakeSyncCall
    handler(request, response)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 362, in _Dynamic_Next
    assert next_request.offset() == 0
AssertionError
To bypass this, you can do something like:
objs = []
q = None  # page cursor; None starts from the beginning
more = True
while more:
    _objs, q, more = models.Transaction.query().fetch_page(300, start_cursor=q)
    objs.extend(_objs)
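If all you need is the count rather than the entities themselves, the same cursor loop can tally page sizes instead of accumulating results (with a keys-only query each page stays cheap). Here is a minimal sketch under that assumption; `count_in_batches` and the paging callable are illustrative names, not part of ndb:

```python
def count_in_batches(fetch_page, page_size=1000):
    """Count results by paging with a cursor, mirroring ndb's
    fetch_page(page_size, start_cursor=...) -> (items, cursor, more)."""
    total = 0
    cursor = None
    more = True
    while more:
        items, cursor, more = fetch_page(page_size, cursor)
        total += len(items)
    return total

# With real ndb this could be driven by something like (assumption,
# adapted from the fetch_page loop above):
#   count_in_batches(
#       lambda n, c: models.Transaction.query().fetch_page(
#           n, start_cursor=c, keys_only=True))
```

This trades one large RPC for many small ones, so it avoids the per-call limit but not the overall request deadline.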
But even that will eventually hit memory/timeout limits.
Currently I use Google Dataflow to pre-compute these values and store the results in Datastore as the models DaySummaries & StatsPerUser.
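With pre-computation, the serving side just reads one pre-aggregated entity instead of counting at request time. A minimal sketch of that read path, where the dict stands in for Datastore and the field name `transaction_count` is an assumption (in real ndb it would be roughly `DaySummaries.get_by_id(day)`):

```python
# Stand-in for entities a Dataflow job would have written to Datastore
# ahead of time (the values here are made-up example numbers).
precomputed_day_summaries = {
    "2020-01-01": {"transaction_count": 4500},
}

def get_day_count(day):
    # Real ndb would be roughly:  summary = DaySummaries.get_by_id(day)
    summary = precomputed_day_summaries.get(day)
    return summary["transaction_count"] if summary else 0
```

A single get-by-key lookup costs the same no matter how many underlying records were aggregated, which is why pre-computation sidesteps the timeout entirely.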
snakecharmerb is correct. I was able to use .count() in the production environment, but the more entities it has to count, the longer it seems to take. Here's a screenshot of my logs viewer where it took ~15 seconds to count ~330,000 records.
When I tried adding a filter to that query which returned a count of ~4500, it took about a second to run instead.
Ok, I had another App Engine project with a kind with ~8,000,000 records. I tried to do .count() on that in my HTTP request handler, and the request timed out after running for 60 seconds.