AppEngine Query.fetch_async not very asynchronous?



I'm trying to reduce the execution time of an AppEngine query by running multiple sub-queries asynchronously, using query.fetch_async(). However, it seems that the gain is minimal compared to running the queries serially.

Below is some minimal sample code (in Python) illustrating the problem - first a function for the asynchronous run:

def run_parallel(self, repeats):
    start = datetime.utcnow()

    futures = []
    for i in xrange(0, repeats):
        q = User.query()
        f = q.fetch_async(300, keys_only=True)
        futures.append(f)

    while futures:
        f = ndb.Future.wait_any(futures)
        futures.remove(f)
        results = f.get_result()
        delta_secs = (datetime.utcnow() - start).total_seconds()
        self.response.out.write("Got %d results, delta_sec: %f<br>\n" %(len(results), delta_secs))
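For reference, the `wait_any` draining loop above mirrors a standard futures pattern. The same shape can be sketched outside GAE with `concurrent.futures`; note this is only a stand-in for illustration, not the ndb API, and `fake_fetch` is a hypothetical placeholder that sleeps instead of performing a datastore RPC:

```python
import concurrent.futures
import time

def fake_fetch(n):
    # Stand-in for q.fetch_async(300, keys_only=True): sleeps to
    # simulate the RPC round trip, then returns n dummy results.
    time.sleep(0.05)
    return list(range(n))

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(fake_fetch, 300) for _ in range(10)]
    # Drain in completion order, analogous to ndb.Future.wait_any above.
    counts = [len(f.result()) for f in concurrent.futures.as_completed(futures)]

print(counts)
```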

Then a function for the corresponding serial run:

def run_serial(self, repeats):
    start = datetime.utcnow()
    for i in xrange(0, repeats):
        q = User.query()
        results = q.fetch(300, keys_only=True)
        delta_secs = (datetime.utcnow() - start).total_seconds()
        self.response.out.write("Got %d results, delta_sec: %f<br>\n" %(len(results), delta_secs))

The output of running these two functions 10 times each (not on the dev-server), i.e. of the following calls:

run_parallel(10)
run_serial(10)

is as follows:

Running parallel queries...
Got 300 results, delta_sec: 0.401090
Got 300 results, delta_sec: 0.501700
Got 300 results, delta_sec: 0.596110
Got 300 results, delta_sec: 0.686120
Got 300 results, delta_sec: 0.709220
Got 300 results, delta_sec: 0.792070
Got 300 results, delta_sec: 0.816500
Got 300 results, delta_sec: 0.904360
Got 300 results, delta_sec: 0.993600
Got 300 results, delta_sec: 1.017320

Running serial queries...
Got 300 results, delta_sec: 0.114950
Got 300 results, delta_sec: 0.269010
Got 300 results, delta_sec: 0.370590
Got 300 results, delta_sec: 0.472090
Got 300 results, delta_sec: 0.575130
Got 300 results, delta_sec: 0.678900
Got 300 results, delta_sec: 0.782540
Got 300 results, delta_sec: 0.883960
Got 300 results, delta_sec: 0.986370
Got 300 results, delta_sec: 1.086500

Hence the parallel and serial versions take roughly the same time, around 1 second. The Appstats are as follows, where the first 10 queries are the parallel ones and the following 10 are the serial ones:

From these stats it looks like the first 10 queries are indeed running in parallel, but that each of them takes a disproportionate amount of time compared to the individual serial queries. It looks like they may be blocking somehow, waiting for each other to complete.

So my question: Is there anything wrong with my code for running asynchronous queries? Or is there an inherent limitation in the efficiency of asynchronous queries on AppEngine?

I wondered if the behaviour could be caused by one of the following:

  1. Running asynchronous queries on the same entity type. However, a similar example using multiple different entity types shows similar results.
  2. Running identical queries, somehow locking sections of the index. However, a similar example in which each query is different (returning disjoint result sets) yields similar results.

So, I'm at a bit of a loss. Any suggestions would be greatly appreciated.

Update 1

Following Bruyere's suggestion I've tried using db rather than ndb, and I've tried swapping the order of the parallel and serial versions. The results are the same.

Update 2

Here's a related post concerned with the same issue; still no answer as to why parallel queries are so inefficient:

Best practice to query large number of ndb entities from datastore

Update 3

The corresponding code using the Java SDK is parallelised very neatly. Here are the Java appstats:

To be precise, this Java implementation is explicitly multi-threaded, running queries in separate threads; this is necessary because, contrary to what the AppEngine documentation claims, using query iterators does not actually result in queries being executed in parallel.

I've tried to use explicit multi-threading with synchronous query calls in the Python version, but with the same poor results as the original Python version.
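The shape of that multi-threaded attempt can be sketched as follows; `fetch_stub` is a hypothetical placeholder (a sleep standing in for the synchronous `q.fetch(300, keys_only=True)` RPC), so this only demonstrates the threading structure, not the datastore behaviour:

```python
import threading
import time

def fetch_stub(limit):
    # Placeholder for the synchronous q.fetch(limit, keys_only=True) call;
    # sleeps to mimic RPC latency instead of hitting the datastore.
    time.sleep(0.05)
    return list(range(limit))

def run_threaded(repeats):
    results = [None] * repeats

    def worker(i):
        # Each thread issues one synchronous fetch and stores its result.
        results[i] = fetch_stub(300)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(repeats)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

results = run_threaded(10)
print([len(r) for r in results])
```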

The fact that the Java version performs as expected implies that the poor Python async performance is not caused by an AppEngine CPU bottleneck.

The only alternative explanation that I can think of is that Python's Global Interpreter Lock is causing thrashing. This is supported by the fact that decreasing the GIL check interval (using sys.setcheckinterval) exacerbates the poor async performance.

This is surprising though: the GIL shouldn't have such a severe impact given that queries are IO bound. I speculate that perhaps the RPC input buffers are small enough that async calls resume frequently during retrieval of results, which could perhaps cause GIL thrashing. I've had a look at the Python AppEngine library code, but the low-level RPC calls are made by _apphosting_runtime___python__apiproxy.MakeCall() which seems to be closed-source.

Alas, my conclusion is that the Python AppEngine runtime is not suited for the kind of parallel querying that I require, leaving me with no other option than moving to the Java runtime. I would really like to avoid this, and so I really hope that I'm wrong and have missed something obvious. Any suggestions or pointers would be greatly appreciated.

Thanks!

Solution

The main problem is that your example is mostly CPU-bound as opposed to IO-bound. In particular, most of the time is likely spent in decoding RPC results which isn't done efficiently in python due to the GIL. One of the problems with Appstats is that it measures RPC timing from when the RPC is sent to when get_result() is called. This means that time spent before get_result is called will appear to be coming from the RPCs.

If you instead issue IO-bound RPCs (i.e. queries that make the Datastore work harder) you will start to see the performance gains of parallel queries.
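To illustrate the distinction: when the waiting genuinely happens on the server side, threads overlap almost perfectly, because the GIL is released during the wait. This sketch uses `time.sleep` as a stand-in for server-side work (an assumption for illustration, not an actual datastore call):

```python
import threading
import time

def io_task():
    # Pure wait: the GIL is released during sleep, just as it is during a
    # true network wait, so the threads overlap fully.
    time.sleep(0.1)

def timed_parallel(task, n):
    threads = [threading.Thread(target=task) for _ in range(n)]
    t0 = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - t0

elapsed = timed_parallel(io_task, 10)
# Ten 0.1s waits overlap into roughly 0.1s of wall time, not 1s,
# because no CPU-bound decoding competes for the GIL.
print(elapsed)
```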
