AppEngine Query.fetch_async not very asynchronous?


Question


I'm trying to reduce the execution time of an AppEngine query by running multiple sub-queries asynchronously, using query.fetch_async(). However, it seems that the gain is minimal compared to running the queries serially.


Below is some minimal sample code (in Python) illustrating the problem - first a function for the asynchronous run:

def run_parallel(self, repeats):
    start = datetime.utcnow()

    # Kick off all queries without blocking.
    futures = []
    for i in xrange(0, repeats):
        q = User.query()
        f = q.fetch_async(300, keys_only=True)
        futures.append(f)

    # Consume results as each future completes.
    while futures:
        f = ndb.Future.wait_any(futures)
        futures.remove(f)
        results = f.get_result()
        delta_secs = (datetime.utcnow() - start).total_seconds()
        self.response.out.write("Got %d results, delta_sec: %f<br>\n"
                                % (len(results), delta_secs))


Then a function for the corresponding serial run:

def run_serial(self, repeats):
    start = datetime.utcnow()
    for i in xrange(0, repeats):
        q = User.query()
        # Blocking fetch; each query completes before the next starts.
        results = q.fetch(300, keys_only=True)
        delta_secs = (datetime.utcnow() - start).total_seconds()
        self.response.out.write("Got %d results, delta_sec: %f<br>\n"
                                % (len(results), delta_secs))


The output of running these two functions 10 times each (not on the dev-server), i.e. of the following calls:

run_parallel(10)
run_serial(10)


Running parallel queries...
Got 300 results, delta_sec: 0.401090
Got 300 results, delta_sec: 0.501700
Got 300 results, delta_sec: 0.596110
Got 300 results, delta_sec: 0.686120
Got 300 results, delta_sec: 0.709220
Got 300 results, delta_sec: 0.792070
Got 300 results, delta_sec: 0.816500
Got 300 results, delta_sec: 0.904360
Got 300 results, delta_sec: 0.993600
Got 300 results, delta_sec: 1.017320

Running serial queries...
Got 300 results, delta_sec: 0.114950
Got 300 results, delta_sec: 0.269010
Got 300 results, delta_sec: 0.370590
Got 300 results, delta_sec: 0.472090
Got 300 results, delta_sec: 0.575130
Got 300 results, delta_sec: 0.678900
Got 300 results, delta_sec: 0.782540
Got 300 results, delta_sec: 0.883960
Got 300 results, delta_sec: 0.986370
Got 300 results, delta_sec: 1.086500


Hence the parallel and serial versions take roughly the same time, around 1 second. The Appstats output is as follows (screenshot not reproduced here), where the first 10 queries are the parallel ones and the following 10 are the serial ones:


From these stats it looks like the first 10 queries are indeed running in parallel, but that each of them takes a disproportionate amount of time compared to the individual serial queries. It looks like they may be blocking somehow, waiting for each other to complete.


So my question: Is there anything wrong with my code for running asynchronous queries? Or is there an inherent limitation in the efficiency of asynchronous queries on AppEngine?


I wondered if the behaviour could be caused by one of the following:

  1. Running asynchronous queries against the same entity kind. However, a similar example using several different entity kinds showed similar results.
  2. Running identical queries, which might somehow lock parts of the index. However, a similar example in which every query is different (returning disjoint result sets) produces similar results.


So, I'm at a bit of a loss. Any suggestions would be greatly appreciated.

Update 1


Following Bruyere's suggestion I've tried using db rather than ndb, and I've tried swapping the order of the parallel and serial versions. The results are the same.

Update 2


Here's a related post concerned with the same issue; still no answer as to why parallel queries are so inefficient:

Querying a large number of ndb entities from the datastore

Update 3


The corresponding code using the Java SDK is parallelised very neatly. Here are the Java appstats:


To be precise, this Java implementation is explicitly multi-threaded, running queries in separate threads; this is necessary because, contrary to what the AppEngine documentation claims, using query iterators does not actually result in queries being executed in parallel.


I've tried to use explicit multi-threading with synchronous query calls in the Python version, but with the same poor results as the original Python version.


The fact that the Java version performs as expected implies that the poor Python async performance is not caused by an AppEngine CPU bottleneck.


The only alternative explanation that I can think of is that Python's Global Interpreter Lock is causing thrashing. This is supported by the fact that decreasing the GIL check interval (using sys.setcheckinterval) exacerbates the poor async performance.
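For reference, sys.setcheckinterval is the Python 2 API; on Python 3 the analogous knob is sys.setswitchinterval. A minimal sketch of the kind of experiment described, with illustrative names and workloads (not the original post's code):

```python
import sys
import threading
import time

def spin(n):
    # Pure-Python CPU loop; holds the GIL while it runs.
    total = 0
    for i in range(n):
        total += i
    return total

def timed_threads(count, n):
    threads = [threading.Thread(target=spin, args=(n,)) for _ in range(count)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

default_interval = sys.getswitchinterval()
try:
    # Force far more frequent GIL handoffs (the default is ~0.005 s).
    sys.setswitchinterval(0.0001)
    frequent = timed_threads(4, 500000)
finally:
    sys.setswitchinterval(default_interval)
normal = timed_threads(4, 500000)
print("frequent switching: %.3fs, default: %.3fs" % (frequent, normal))
```

More frequent forced switches add contention overhead to CPU-bound threads, which is consistent with the GIL-thrashing hypothesis above.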


This is surprising though: the GIL shouldn't have such a severe impact given that queries are IO bound. I speculate that perhaps the RPC input buffers are small enough that async calls resume frequently during retrieval of results, which could perhaps cause GIL thrashing. I've had a look at the Python AppEngine library code, but the low-level RPC calls are made by _apphosting_runtime___python__apiproxy.MakeCall() which seems to be closed-source.


Alas, my conclusion is that the Python AppEngine runtime is not suited for the kind of parallel querying that I require, leaving me with no other option than moving to the Java runtime. I would really like to avoid this, and so I really hope that I'm wrong and have missed something obvious. Any suggestions or pointers would be greatly appreciated.

Thanks!

Answer


The main problem is that your example is mostly CPU-bound as opposed to IO-bound. In particular, most of the time is likely spent decoding RPC results, which isn't done efficiently in Python due to the GIL. One of the problems with Appstats is that it measures RPC timing from when the RPC is sent to when get_result() is called. This means that time spent before get_result is called will appear to come from the RPCs.
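The GIL effect described here can be reproduced outside App Engine. In this sketch, a pure-Python CPU loop stands in for RPC-result decoding (it is not the actual ndb code); running it in multiple threads gains nothing, because only one thread can hold the GIL at a time:

```python
import threading
import time

def decode_work(n):
    # Stand-in for protobuf decoding: pure-Python CPU work that
    # holds the GIL, so threads cannot truly run it in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_in_threads(count, n):
    threads = [threading.Thread(target=decode_work, args=(n,))
               for _ in range(count)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

start = time.time()
for _ in range(4):
    decode_work(300000)
serial = time.time() - start

threaded = run_in_threads(4, 300000)
# On a standard GIL build, expect threaded to be no faster than
# serial (often slightly slower, due to switching overhead).
print("serial: %.3fs, threaded: %.3fs" % (serial, threaded))
```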


If you instead issue IO-bound RPCs (i.e. queries that make the Datastore work harder) you will start to see the performance gains of parallel queries.
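By contrast, genuinely IO-bound work does overlap under the GIL, since the lock is released while waiting. A rough illustration with time.sleep standing in for the datastore wait (hypothetical stand-in code, not actual RPC calls):

```python
import threading
import time

def fake_fetch(results, i):
    # Stand-in for a blocking q.fetch(): the thread releases the
    # GIL while sleeping, just as it would while waiting on an RPC.
    time.sleep(0.1)
    results[i] = 300

def run_io_threads(count):
    results = [None] * count
    threads = [threading.Thread(target=fake_fetch, args=(results, i))
               for i in range(count)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.time() - start

results, elapsed = run_io_threads(10)
# Ten 0.1 s waits overlap: total time stays close to 0.1 s, not 1 s.
print("10 fetches in %.3fs" % elapsed)
```

This is why pushing more of the work onto the datastore side (true IO wait) parallelises well, while result decoding (CPU work under the GIL) does not.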
