Datastore fetch VS fetch(keys_only=True) then get_multi
I am fetching multiple entities (100+) from the datastore using the query below:
return entity.query(ancestor=ancestorKey).filter(entity.year == myStartYear).order(entity.num).fetch()
This was taking a long time (on the order of a few seconds) to load.
Trying to find an optimal approach, I created exactly 100 entities and found that it takes anywhere between 750ms and 1000ms to fetch them on the local server, which is of course a lot. I am not sure how to make a single-line fetch more efficient!
In an attempt to optimize, I tried:
- Removing the order part, still got the same results
- Removing the filter part, still got the same results
- Removing the order & filter part, still got the same results
So apparently it is something else. In a desperate attempt, I tried fetching keys only and then passing the keys to the ndb.get_multi() function:
qKeys = entity.query(ancestor=ancestorKey).filter(entity.year == myStartYear).order(entity.num).fetch(keys_only=True)
return ndb.get_multi(qKeys)
To my surprise, I got better throughput! The query results now load in 450 ~ 550ms, which is around 40% better on average.
I am not sure why this happens; I would have thought that the fetch function already retrieves entities in the optimal time.
Question: Any idea how I can optimize the single query line to load faster?
Side Question: Does anyone know the underlying mechanism of the fetch function, and why fetching keys only and then using ndb.get_multi() is faster?
FWIW, you shouldn't expect meaningful results from datastore performance tests performed locally, using either the development server or the datastore emulator. They're just emulators; they don't have the same performance (or even 100% equivalent functionality) as the real datastore.
Credit goes to @snakecharmerb, who correctly identified the culprit, confirmed by OP:
Be aware that performance characteristics in the cloud may differ from those on your local machine. You really want to be running these tests in the cloud. – snakecharmerb yesterday
@snakecharmerb you were right in your suggestion! I just tested on the cloud, and in terms of performance it's actually the other way around there: fetch() took ~550ms, while fetch(keys_only=True) followed by get_multi took ~700ms. It seems that fetch() works better on the cloud! – Khaled yesterday
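To repeat this kind of comparison in the deployed environment, a small timing helper may be useful. The sketch below is only an illustration: `time_call`, `compare`, and the two stand-in functions are hypothetical names that do not come from the question, and the `time.sleep` calls are placeholders so the snippet runs without a datastore. In a real test, each stand-in body would be replaced by one of the two query variants shown in the question.

```python
import time
from statistics import mean


def time_call(fn, repeats=5):
    """Return the mean wall-clock time of fn() in milliseconds over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def compare(strategies, repeats=5):
    """Time each named strategy and return a dict of {name: mean milliseconds}."""
    return {name: time_call(fn, repeats) for name, fn in strategies.items()}


# Hypothetical stand-ins (assumptions, not real datastore calls).
# Replace the sleep in each with the corresponding query from the question.
def fetch_full():
    time.sleep(0.002)  # placeholder for: query.fetch()


def fetch_keys_then_get_multi():
    time.sleep(0.004)  # placeholder for: ndb.get_multi(query.fetch(keys_only=True))


results = compare({
    "fetch()": fetch_full,
    "fetch(keys_only=True) + get_multi": fetch_keys_then_get_multi,
})

# Print the strategies from fastest to slowest.
for name, ms in sorted(results.items(), key=lambda kv: kv[1]):
    print("%s: %.1f ms" % (name, ms))
```

Averaging over several runs smooths out one-off latency spikes, and, as the answer notes, the numbers are only meaningful when collected against the real datastore rather than the development server or emulator.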