Speeding up templates in GAE-Py by aggregating RPC calls

Here's my problem:

class City(Model):
  name = StringProperty()

class Author(Model):
  name = StringProperty()
  city = ReferenceProperty(City)

class Post(Model):
  author = ReferenceProperty(Author)
  content = StringProperty()

The code isn't important... it's this Django template:

{% for post in posts %}
<div>{{post.content}}</div>
<div>by {{post.author.name}} from {{post.author.city.name}}</div>
{% endfor %}

Now let's say I get the first 100 posts using Post.all().fetch(limit=100), and pass this list to the template - what happens?

It makes 200 more datastore gets - 100 to get each author, 100 to get each author's city.

This is perfectly understandable, actually, since the post only has a reference to the author, and the author only has a reference to the city. The __get__ accessor on the post.author and author.city objects transparently does a get and pulls the data back (see this question).

Some ways around this are

  1. Use Post.author.get_value_for_datastore(post) to collect the author keys (see the link above), and then do a batch get to get them all - the trouble here is that we need to re-construct a template data object... something which needs extra code and maintenance for each model and handler.
  2. Write an accessor, say cached_author, that checks memcache for the author first and returns that - the problem here is that post.cached_author is going to be called 100 times, which could probably mean 100 memcache calls.
  3. Hold a static key-to-object map (and refresh it maybe once every five minutes) if the data doesn't have to be very up to date. The cached_author accessor can then just refer to this map.
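As a sketch of option 2, here is roughly what a caching accessor buys you. All names are hypothetical, and a plain dict stands in for both the datastore and the cache so the sketch is self-contained; note it does not show the 100-memcache-RPC overhead the option warns about, since a local dict has no per-call cost.

```python
# Hypothetical stand-ins: a dict as the datastore, a dict as the cache,
# and a counter for simulated datastore RPCs.
DATASTORE = {"author:1": {"name": "Alice"}, "author:2": {"name": "Bob"}}
CACHE = {}
datastore_gets = 0

def datastore_get(key):
    global datastore_gets
    datastore_gets += 1
    return DATASTORE[key]

def cached_get(key):
    # Check the cache first; fall back to the datastore on a miss.
    if key not in CACHE:
        CACHE[key] = datastore_get(key)
    return CACHE[key]

# 100 posts written by 2 distinct authors: only the first lookup per
# author hits the datastore; the other 98 are served from the cache.
author_keys = ["author:%d" % (1 + i % 2) for i in range(100)]
authors = [cached_get(k) for k in author_keys]
print(datastore_gets)  # 2
```

With real memcache each cached_get is still an RPC of its own, which is exactly the drawback the option describes.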

All these ideas need extra code and maintenance, and they're not very transparent. What if we could do

@prefetch
def render_template(path, data):
  template.render(path, data)

Turns out we can... hooks and Guido's instrumentation module both prove it. If the @prefetch decorator wraps the template render, we can (at least to one level of depth) capture which keys are being requested, return mock objects, and do a batch get on them. This could be repeated for all depth levels, until no new keys are requested. The final render could intercept the gets and return the objects from a map.
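The idea above can be sketched as a two-pass render. Everything here is hypothetical (a dict stands in for the datastore, a callable stands in for the template): pass 1 renders against a recording stub to capture which keys get requested, one batch get fills a map, and pass 2 renders for real from that map.

```python
# Hypothetical stand-in for the datastore, plus a counter for batch RPCs.
DATASTORE = {"author:1": "Alice", "author:2": "Bob"}
batch_gets = 0

def batch_get(keys):
    global batch_gets
    batch_gets += 1
    return {k: DATASTORE[k] for k in keys}

class Recorder:
    """Stub that records every key the render asks for."""
    def __init__(self):
        self.keys = set()
    def get(self, key):
        self.keys.add(key)
        return ""  # mock value; pass-1 output is thrown away

def prefetch_render(render):
    rec = Recorder()
    render(rec.get)                     # pass 1: record requested keys
    fetched = batch_get(rec.keys)       # one batch get for all of them
    return render(fetched.__getitem__)  # pass 2: real render from the map

def render(get):
    # Stand-in for the template: requests author:1 twice and author:2 once.
    return ", ".join(get("author:%d" % i) for i in (1, 2, 1))

out = prefetch_render(render)
print(out)         # Alice, Bob, Alice
print(batch_gets)  # 1
```

A real implementation would hook db.get (or use the instrumentation module) instead of passing the getter in explicitly, and would loop until a pass requests no new keys, but the shape is the same.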

This would change a total of 200 gets into 3, transparently and without any extra code. Not to mention it would greatly cut down the need for memcache and help in situations where memcache can't be used.

Trouble is I don't know how to do it (yet). Before I start trying, has anyone else done this? Or does anyone want to help? Or do you see a massive flaw in the plan?

Solution

I have been in a similar situation. Instead of ReferenceProperty, I had parent/child relationships but the basics are the same. My current solution is not polished but at least it is efficient enough for reports and things with 200-1,000 entities, each with several subsequent child entities that require fetching.

You can manually fetch the referenced entities in batches and set them back on the parent objects.

# Given the posts, fetches all the data the template will need
# with just 2 batched loads from the datastore.
posts = get_the_posts()

author_keys = [Post.author.get_value_for_datastore(x) for x in posts]
authors = db.get(author_keys)

city_keys = [Author.city.get_value_for_datastore(x) for x in authors]
cities = db.get(city_keys)

for post, author, city in zip(posts, authors, cities):
  post.author = author
  author.city = city

Now when you render the template, no additional queries or fetches will be done. It's rough around the edges but I could not live without this pattern I just described.
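The pattern above generalizes to one bulk get per reference level, whatever the models are. A self-contained sketch (the helper name is hypothetical, and a dict stands in for db.get):

```python
# Hypothetical stand-in for the datastore and for db.get(keys).
DATASTORE = {
    "author:1": {"name": "Alice", "city_key": "city:1"},
    "city:1": {"name": "Springfield"},
}

def db_get(keys):
    return [DATASTORE[k] for k in keys]

def resolve_refs(entities, key_of, attach):
    """One bulk get resolving one reference level across all entities."""
    targets = db_get([key_of(e) for e in entities])
    for entity, target in zip(entities, targets):
        attach(entity, target)
    return targets

posts = [{"content": "hi", "author_key": "author:1"} for _ in range(3)]

# Level 1: posts -> authors; level 2: authors -> cities.
authors = resolve_refs(posts, lambda p: p["author_key"],
                       lambda p, a: p.__setitem__("author", a))
cities = resolve_refs(authors, lambda a: a["city_key"],
                      lambda a, c: a.__setitem__("city", c))

print(posts[0]["author"]["city"]["name"])  # Springfield
```

However many posts there are, this is two db_get calls total - one per reference level - which is the whole point of the pattern.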

Also you might consider validating that none of your entities are None because db.get() will return None if the key is bad. That is getting into just basic data validation though. Similarly, you need to retry db.get() if there is a timeout, etc.
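A minimal sketch of that validation step (dict standing in for db.get, which yields None for a key that doesn't resolve):

```python
# Hypothetical stand-in: db.get() returns None for missing keys.
DATASTORE = {"author:1": {"name": "Alice"}}

def db_get(keys):
    return [DATASTORE.get(k) for k in keys]

keys = ["author:1", "author:999"]
results = db_get(keys)

# Keep the good pairs; collect the bad keys for logging or retry.
good = [(k, e) for k, e in zip(keys, results) if e is not None]
missing = [k for k, e in zip(keys, results) if e is None]
print(missing)  # ['author:999']
```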

(Finally, I don't think memcache will work as a primary solution. Maybe as a secondary layer to speed up datastore calls, but your app needs to work well even when memcache is empty. Also, memcache has several quotas itself, such as memcache calls and total data transferred. Overusing memcache is a great way to kill your app dead.)
