Should I optimize around reads or CPU time in Google App Engine?


Question

I'm trying to optimize my design, but it's really difficult to put things in perspective. Say I have the following cases:

A. A User has 1,000 status updates. These updates are stored in a separate entity kind, Statuses. I want to get a User's statuses that have an uploadDate after date X, so I run a query:

statuses = Statuses.query(Statuses.uploadDate > X).fetch()

B. A User has 1,000 status updates. Each User entity has a list property, list_of_status_keys, which holds the keys of all of that user's statuses. I want all statuses with an uploadDate after date X, so I fetch every status with statuses = ndb.get_multi(list_of_status_keys) and then loop through each one, checking the date:

myList = []
for a_status in statuses:
    if a_status.uploadDate > X:
        myList.append(a_status)

I really don't know which I should be optimizing for. A query seems more organized, but fetching by keys is quicker. Does anyone have any insight?

Here's what it comes down to: on each HTTP request to GAE, I get all notifications and status updates for a user (just like Facebook). Appstats tells me that each request costs 490 micropennies (where 1 penny = 1,000,000 micropennies).

Getting notifications and statuses is important for a user, so you can expect them to do this many times. What I'm having a hard time with is determining whether this is a lot or not. I'm freaking out trying to minimize this number in any way possible. I've never run a service before, so I don't know if this is how much it should cost. Here's the math:

Each request costs 490 micropennies (mp) when no results are returned. That is the price of a basic query alone; in some cases, when several results are returned, it could cost 10,000 mp. So for 1 penny I can run 2,040 requests, and for $1 I can run about 204,000 requests.

Let's say I have 50,000 users, and each user checks for notifications 75 times a day (reasonable):

75 requests x 490 mp x 50,000 users = 1,837,500,000 micropennies per day = 1,837.5 pennies = $18.37 per day. (Is that right?)
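The arithmetic above can be checked with a few lines of Python 3. This is only a sketch of the question's own estimate: every figure comes from the post, the variable and function names are made up for illustration, and nothing here reflects actual App Engine pricing.

```python
# All figures are taken from the question; none of them are measured here.
MICROPENNIES_PER_PENNY = 1000000
PENNIES_PER_DOLLAR = 100

REQUESTS_PER_USER_PER_DAY = 75
USERS = 50000


def requests_per_dollar(cost_per_request_mp):
    """How many whole requests one dollar buys at the given per-request cost."""
    return (MICROPENNIES_PER_PENNY * PENNIES_PER_DOLLAR) // cost_per_request_mp


def daily_cost_dollars(cost_per_request_mp):
    """Daily bill in dollars if every request costs the given number of micropennies."""
    daily_mp = REQUESTS_PER_USER_PER_DAY * cost_per_request_mp * USERS
    return daily_mp / MICROPENNIES_PER_PENNY / PENNIES_PER_DOLLAR


requests_per_penny = MICROPENNIES_PER_PENNY // 490
cheap_case = daily_cost_dollars(490)       # every request returns nothing
costly_case = daily_cost_dollars(10000)    # every request hits the 10,000 mp case

print(requests_per_penny, requests_per_dollar(490), cheap_case, costly_case)
```

The 490 mp figure is the floor, so the $18.37 estimate is a lower bound. If every request cost the 10,000 mp mentioned earlier, the same traffic would cost $375 per day, which is why the per-request cost with results matters more than the empty-query cost.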

I've never run a large-scale service before, so are these usual costs? Or is this too high? Is 490 micropennies per request high? And if the answer is "it depends", how would I find out?

Recommended answer

Design A is superior.

In design A, GAE uses the date to run an index-backed query. That is, App Engine automatically maintains an index on the Statuses kind sorted by uploadDate. Because of that index, it reads and fetches only the records after the date you specify, which saves you a large number of reads.

In design B, you basically have to do the indexing work yourself. Since you need to fetch every Status and then compare its date, you do more work, both in terms of CPU (which is cost) and in terms of performance.
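To make the difference in read counts concrete, here is a small pure-Python simulation. It is not ndb code: the sorted list stands in for the datastore's date index, the data is invented, and the function names design_a and design_b are only labels for the two approaches in the question.

```python
import bisect
from datetime import date, timedelta

# Hypothetical data: 1,000 statuses, one per day, stored as (upload_date, key)
# pairs already sorted by date, the way an index on uploadDate would be.
start = date(2012, 1, 1)
statuses = [(start + timedelta(days=i), "status-%d" % i) for i in range(1000)]


def design_a(index, cutoff):
    """Index-backed lookup: jump to the cutoff, read only what comes after it."""
    dates = [d for d, _ in index]              # simulation detail, not a datastore read
    first = bisect.bisect_right(dates, cutoff)  # first entry strictly after cutoff
    matches = index[first:]
    return matches, len(matches)               # entities read == entities returned


def design_b(all_statuses, cutoff):
    """Fetch every status by key, then filter by date in application code."""
    reads = 0
    matches = []
    for upload_date, key in all_statuses:
        reads += 1                              # each fetched entity counts as a read
        if upload_date > cutoff:
            matches.append((upload_date, key))
    return matches, reads


cutoff = start + timedelta(days=989)            # only the last 10 statuses are newer
a_matches, a_reads = design_a(statuses, cutoff)
b_matches, b_reads = design_b(statuses, cutoff)
print(a_reads, b_reads)
```

Both approaches return the same statuses, but in this toy setup the index-backed version touches 10 entries while the fetch-and-filter version touches all 1,000.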

Edit

If your data is accessed as frequently as this, you have other design options as well.

First, you could consider combining the Status objects into a StatusUpdatesPerDay entity. For each day you create a single instance and then append status updates to it. This reduces hundreds of reads to a couple of reads.
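A rough sketch of the per-day bucket idea, in plain Python rather than ndb. The dict named buckets stands in for the datastore, and the class layout, the add_status and statuses_for_days helpers, and the (user_id, day) key are all assumptions made for illustration.

```python
from datetime import date


class StatusUpdatesPerDay(object):
    """Hypothetical container: all of one user's status updates for one day."""

    def __init__(self, user_id, day):
        self.user_id = user_id
        self.day = day
        self.updates = []


buckets = {}  # stand-in for the datastore, keyed by (user_id, day)


def add_status(user_id, day, text):
    """Append a status to that day's bucket, creating the bucket if needed."""
    key = (user_id, day)
    if key not in buckets:
        buckets[key] = StatusUpdatesPerDay(user_id, day)
    buckets[key].updates.append(text)


def statuses_for_days(user_id, days):
    """Read one bucket per requested day; return (updates, number of reads)."""
    updates, reads = [], 0
    for day in days:
        reads += 1
        bucket = buckets.get((user_id, day))
        if bucket is not None:
            updates.extend(bucket.updates)
    return updates, reads


# 300 statuses spread over three days become three bucket reads.
days = [date(2012, 6, 1), date(2012, 6, 2), date(2012, 6, 3)]
for day in days:
    for n in range(100):
        add_status("alice", day, "update %d" % n)

recent, reads = statuses_for_days("alice", days)
print(len(recent), reads)
```

The trade-off is on the write side: every new status rewrites the whole bucket for that day, so this suits data that is read far more often than it is written.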

Second, since the status updates will be accessed very frequently, you can cache the statuses in memcache. This will reduce both cost and latency.
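The caching idea follows the usual read-through pattern, sketched below in plain Python. A dict stands in for memcache and a counter stands in for billed datastore reads; the function names and the cache key format are invented for this example and are not the App Engine API.

```python
cache = {}                        # stand-in for memcache
stats = {"datastore_reads": 0}    # counts simulated trips to the datastore
stored = {"alice": ["first post"]}  # stand-in for the datastore contents


def load_statuses_from_datastore(user_id):
    """Simulated expensive read; each call counts as one datastore trip."""
    stats["datastore_reads"] += 1
    return list(stored.get(user_id, []))


def get_statuses(user_id):
    """Read-through cache: serve from the cache, fall back to the datastore."""
    key = "statuses:%s" % user_id
    cached = cache.get(key)
    if cached is not None:
        return cached
    statuses = load_statuses_from_datastore(user_id)
    cache[key] = statuses
    return statuses


def add_status(user_id, text):
    """Write to the datastore, then drop the stale cache entry."""
    stored.setdefault(user_id, []).append(text)
    cache.pop("statuses:%s" % user_id, None)


get_statuses("alice")
get_statuses("alice")             # second call is served from the cache
reads_before_write = stats["datastore_reads"]

add_status("alice", "second post")
after_write = get_statuses("alice")   # cache was invalidated, so this reads again
print(reads_before_write, stats["datastore_reads"], after_write)
```

With 75 checks per user per day and comparatively few writes, most requests in this pattern never reach the datastore. A real cache can also evict entries at any time, so the fallback path must always work.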

Third, even if you do not optimize as above, I believe ndb has built-in caching. I have never used this feature, but your actual read counts may be lower than in your calculations.

A fourth option is to avoid displaying all status updates at once. Maybe the user wants to see only the last few. Then you can use query cursors to get the remainder when (and if) the user requests them.
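Cursor-based paging can be sketched as follows. This is a pure-Python imitation in which the cursor is just a list offset; in ndb the equivalent call is a query's fetch_page, which returns a similar (results, cursor, more) triple with an opaque cursor. The fetch_page function and the sample data below are illustrative assumptions.

```python
def fetch_page(statuses, page_size, cursor=0):
    """Return (page, next_cursor, more) from a newest-first list of statuses."""
    page = statuses[cursor:cursor + page_size]
    next_cursor = cursor + len(page)
    more = next_cursor < len(statuses)
    return page, next_cursor, more


# Hypothetical feed of 25 statuses, newest first.
feed = ["status-%d" % n for n in range(25, 0, -1)]

# First request: show only the latest ten.
page1, cursor, more = fetch_page(feed, 10)

# Later, only if the user asks for more, resume where the last page ended.
page2, cursor, more_after_2 = fetch_page(feed, 10, cursor)
page3, cursor, more_after_3 = fetch_page(feed, 10, cursor)
print(len(page1), len(page2), len(page3), more_after_3)
```

The point is that the common case, a user glancing at the newest items, pays for ten reads instead of all of them, and the remaining reads happen only on demand.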
