Efficient searching using App Engine datastore ancestor paths


Problem description

    We have a multi-tenant application, and I need to search and fetch from a huge App Engine datastore based on an indexed attribute range and a client ID. Does using ancestor paths make this efficient? Alternatively, the same can be done using an additional filter.

    For example, to get the top 100 salaries via Objectify:

    Key<Clients> clientIdKey = Key.create(Clients.class, 500);
    ofy().load().type(Salaries.class).ancestor(clientIdKey).order("-salary").limit(100).list();
    

    Alternatively, just:

    ofy().load().type(Salaries.class).filter("clientId =", 500).order("-salary").limit(100).list();
    

    My assumption is that in the first case all entities belonging to any other client will be ignored, but in the latter case it will be a full scan, which will be more expensive. Is this assumption right?

    Also, is the "salary" index stored globally, or is it partitioned by ancestor so that an index update happens only within the same ancestor? That would reduce the time taken to update the index and would be a good solution, since we are never going to query across different clients.

    Solution

    The first thing I need to point out is that the datastore does not do table scans. With a couple exceptions (most notably zig-zag merges), GAE queries only follow indexes - so usually these kinds of questions boil down to "which index is more efficient to maintain?"

    Let's start by talking about the second case (note that I've singularized Salary, which I assume is your intention):

    ofy().load().type(Salary.class).filter("clientId =", 500).order("-salary").limit(100).list();
    

    This requires a multi-property index on Salary { clientId, salary } DESC. GAE will navigate the index to the start of Salary/clientId/500 and then read off each index record one at a time. It will do this on the index table in an arbitrary datacenter - and since these index tables are replicated asynchronously, you get an eventually consistent result.

    In order for an entity to participate in a multi-property index, each of the individual properties must itself be indexed. If Salary had no other indexed properties, writing a single Salary would cost:

    • 1 write for the entity operation
    • 2 writes for the clientId index (asc and desc)
    • 2 writes for the salary index (asc and desc)
    • 1 write for the multi-property index { clientId, salary } DESC

    Now let's look at the first case:

    ofy().load().type(Salary.class).ancestor(clientIdKey).order("-salary").limit(100).list();
    

    This requires a different multi-property index in your datastore-indexes.xml: this time you need an index on Salary { ancestor, salary } DESC. In addition, the default behavior of GAE is to read from a quorum of datacenters to make this a strongly consistent operation. This should be somewhat slower (although no more expensive) than the other method. However, you can explicitly specify eventual consistency to get the same "any datacenter" behavior: ofy().consistency(Consistency.EVENTUAL).load()... The nice thing here is that you have the option of strong consistency.

    Another bonus of the ancestor approach is that you don't need to maintain a single-property index on clientId. Here's what happens when you write this Salary (assuming no other indexed fields):

    • 1 write for the entity operation
    • 2 writes for the salary index (asc and desc)
    • 1 write for the multi-property index { ancestor, salary } DESC

    This can make your system considerably cheaper. The biggest cost of multi-property indexes is often the cost of all the (otherwise irrelevant) bidirectional single-property indexes you must maintain simply as a flag to GAE.
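
The two write tallies above can be compared with a little arithmetic. The following is only a sketch of the accounting described in this answer (one entity write, two writes per single-property index, one write per multi-property index); `WriteCostSketch` and `writesPerPut` are hypothetical names, not part of any App Engine API.

```java
// Hypothetical helper: tallies datastore writes for one entity put,
// using the accounting described in the answer above.
final class WriteCostSketch {

    /**
     * @param indexedProperties properties with single-property indexes
     *                          (each costs 2 writes: asc and desc)
     * @param compositeIndexes  multi-property indexes the entity appears in
     *                          (1 write each)
     */
    static int writesPerPut(int indexedProperties, int compositeIndexes) {
        int entityWrite = 1; // the entity operation itself
        return entityWrite + 2 * indexedProperties + compositeIndexes;
    }

    public static void main(String[] args) {
        // Filter case: clientId and salary indexed, plus { clientId, salary } DESC.
        int filterCase = writesPerPut(2, 1);
        // Ancestor case: only salary indexed, plus { ancestor, salary } DESC.
        int ancestorCase = writesPerPut(1, 1);
        System.out.println("filter on clientId: " + filterCase + " writes");
        System.out.println("ancestor query:     " + ancestorCase + " writes");
    }
}
```

Under these assumptions the filter approach costs 6 writes per Salary and the ancestor approach costs 4.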


    Regarding your last question, it might help to explain what GAE index tables look like. There are three BigTable tables for indexes, shared across all applications. The first two are single-property index tables (one for ascending, one for descending). Their contents look very roughly like this:

    {appId}/{entityKind}/{propertyName}/{propertyValue}/{entityKey}
    

    By doing range scans (one of the primitive operations of BigTable), you can determine which entities match your query. This is also why keys-only queries are fast/cheap; you can return the key immediately without doing the subsequent lookup.
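
To make the range-scan idea concrete, here is a minimal in-memory sketch that uses a sorted map in place of the ascending single-property index table. It is an illustration only, not how the datastore is implemented: the class and method names are hypothetical, the row format follows the rough layout shown above, and values are assumed to be non-negative integers that are zero-padded so that string order matches numeric order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for one single-property index table plus the entity table.
final class SinglePropertyIndexSketch {
    // Sorted index rows -> entity key.
    private final TreeMap<String, String> index = new TreeMap<>();
    // Entity key -> entity payload.
    private final Map<String, String> entities = new HashMap<>();

    void put(String kind, String property, int value, String entityKey, String payload) {
        index.put(row(kind, property, value) + entityKey, entityKey);
        entities.put(entityKey, payload);
    }

    // Row prefix: {appId}/{entityKind}/{propertyName}/{propertyValue}/
    private static String row(String kind, String property, int value) {
        return String.format("yourapp/%s/%s/%010d/", kind, property, value);
    }

    /** Keys-only query: one range scan over the index, no entity lookups. */
    List<String> keysInRange(String kind, String property, int lo, int hi) {
        String from = row(kind, property, lo);
        String to = row(kind, property, hi + 1); // exclusive upper bound
        return new ArrayList<>(index.subMap(from, true, to, false).values());
    }

    /** Full query: the same scan, followed by one lookup per matching key. */
    List<String> entitiesInRange(String kind, String property, int lo, int hi) {
        List<String> out = new ArrayList<>();
        for (String key : keysInRange(kind, property, lo, hi)) {
            out.add(entities.get(key));
        }
        return out;
    }
}
```

The keys-only path never touches the `entities` map, which mirrors why such queries avoid the subsequent lookup.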

    The multi-property index table looks (again, this is not exact) like this:

    {appId}/{entityKind}/{prop1name}/{prop1value}/{prop2name}/{prop2value}/.../{entityKey}
    

    It might be easier to visualize with some values for a multi-property index on Salary { clientId, salary } DESC:

    yourapp/Salary/clientId/500/salary/99000/aghzfnZvb3N0MHILCxIFRXZlbnQYAQw
    yourapp/Salary/clientId/500/salary/98000/aghttydiisgAJJ3JGS0ij44JFAjasdw
    

    Again, you can see how GAE can find the entities that match your queries by performing range scans.
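
As a rough sketch of how the "top 100 salaries for client 500" query maps onto such a scan, the example below keeps the multi-property index rows in a sorted map and walks the rows that share the client's prefix. Everything here is an assumption for illustration: the names are hypothetical, salaries are zero-padded non-negative integers, and the DESC ordering is emulated by reading the prefix range backwards rather than by storing rows in descending order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a multi-property index on Salary { clientId, salary }.
final class CompositeIndexSketch {
    // Sorted index rows -> entity key.
    private final TreeMap<String, String> index = new TreeMap<>();

    void put(long clientId, int salary, String entityKey) {
        String row = String.format(
                "yourapp/Salary/clientId/%d/salary/%010d/%s", clientId, salary, entityKey);
        index.put(row, entityKey);
    }

    /** Returns up to {@code limit} entity keys for one client, highest salary first. */
    List<String> topSalaries(long clientId, int limit) {
        // Every row for this client starts with the same prefix.
        String prefix = String.format("yourapp/Salary/clientId/%d/salary/", clientId);
        List<String> keys = new ArrayList<>();
        // Scan only the prefix range; rows of other clients are never visited.
        for (String key : index.subMap(prefix, true, prefix + Character.MAX_VALUE, false)
                               .descendingMap().values()) {
            if (keys.size() == limit) {
                break; // stop reading index rows once the limit is reached
            }
            keys.add(key);
        }
        return keys;
    }
}
```

The scan starts at the client's prefix and stops at the limit, so the work is proportional to the number of rows returned rather than to the size of the table.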

    I hope this helps clear things up.
