Google app engine datastore query with cursor won't iterate all items


Problem description


In my application I have a datastore query with a filter, such as:

datastore.NewQuery("sometype").Filter("SomeField<", 10)

I'm using a cursor to iterate over the result in batches (e.g. in different tasks). If the value of SomeField is changed while iterating, the cursor no longer works on Google App Engine (it works fine on devappserver).

I have a test project here: https://github.com/fredr/appenginetest. In my test I ran /db, which sets up the db with 10 items with their values set to 0, then ran /run/2, which iterates over all items where the value is less than 2, in batches of 5, and updates the value of each item to 2.

The result on my local devappserver (all items are updated):

The result on appengine (only five items are updated):

Am I doing something wrong? Is this a bug? Or is this the expected result? In the documentation it states:

Cursors don't always work as expected with a query that uses an inequality filter or a sort order on a property with multiple values.

Solution

The problem is the nature and implementation of the cursors. The cursor contains the key of the last processed entity (encoded), and so if you set a cursor to your query before executing it, the Datastore will jump to the entity specified by the key encoded in the cursor, and will start listing entities from that point.

Let's examine your case

Your query filter is Value<2. You iterate over the entities of the query result, and you change (and save) the Value property to 2. Note that Value=2 does not satisfy the filter Value<2.

In the next iteration (next batch) a cursor is present, which you apply properly. Therefore, when the Datastore executes the query, it jumps to the last entity processed in the previous iteration and wants to list the entities that come after it. But the entity pointed to by the cursor may no longer satisfy the filter, because the index entry for its new Value 2 will most likely have been updated already. This behavior is non-deterministic: eventual consistency applies here because you did not use an ancestor query, which would guarantee strongly consistent results, and the time.Sleep() delay just increases the probability of this.

So the Datastore sees that the last processed entity does not satisfy the filter and, instead of searching all the entities again, reports that no more entities match the filter. Hence no more entities are updated (and no errors are reported).
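
The behavior described above can be imitated with a small in-memory toy model. This is only an illustration of the explanation in this answer, not the Datastore's actual implementation: the `entity` type and the `queryAfter` and `runWithCursor` functions are hypothetical names invented for this sketch, and the "cursor" is simply a slice index.

```go
package main

import "fmt"

// entity is a hypothetical stand-in for a stored item.
type entity struct {
	Key   int
	Value int
}

// queryAfter is a toy model of the behavior described in the answer, not the
// real Datastore. It resumes after the cursor entity, but if that entity no
// longer matches the filter Value < limit it reports no further results.
// A cursor of -1 means "start from the beginning".
func queryAfter(items []entity, cursor, limit, batch int) []int {
	start := 0
	if cursor >= 0 {
		if items[cursor].Value >= limit {
			return nil // cursor entity fell out of the filter
		}
		start = cursor + 1
	}
	var idx []int
	for i := start; i < len(items) && len(idx) < batch; i++ {
		if items[i].Value < limit {
			idx = append(idx, i)
		}
	}
	return idx
}

// runWithCursor creates n entities with Value 0, then updates every returned
// entity to Value = limit while carrying the cursor between batches.
// It returns how many entities were updated.
func runWithCursor(n, batch, limit int) int {
	items := make([]entity, n)
	for i := range items {
		items[i] = entity{Key: i + 1, Value: 0}
	}
	updated, cursor := 0, -1
	for {
		idx := queryAfter(items, cursor, limit, batch)
		if len(idx) == 0 {
			return updated
		}
		for _, i := range idx {
			items[i].Value = limit // the update removes the entity from the filter
			updated++
		}
		cursor = idx[len(idx)-1]
	}
}

func main() {
	// 10 items, batches of 5, filter Value < 2, as in the question.
	fmt.Println(runWithCursor(10, 5, 2)) // prints 5
}
```

Under these assumptions the second batch comes back empty, so only five of the ten items are updated, which matches the result reported on App Engine in the question.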

Suggestion: don't combine a cursor with a filter or sort order on the same property you are updating.

By the way:

The part from the Appengine docs you quoted:

Cursors don't always work as expected with a query that uses an inequality filter or a sort order on a property with multiple values.

This is not what you think. It means: cursors may not work properly when a property has multiple values AND that same property is either included in an inequality filter or used to sort the results.

By the way #2

In the screenshot you are using SDK 1.9.17. The latest SDK version is 1.9.21. You should update it and always use the latest available version.

Alternatives to achieve your goal

1) Don't use cursors

If you have many records, you won't be able to update all your entities in one step (in one loop), but let's say you update 300 entities per pass. If you repeat the query, the already updated entities will not be in the results, because the updated Value=2 does not satisfy the filter Value<2. Just redo the query and the update until the query has no results. Since your change is idempotent, it causes no harm if the index entry of an entity is updated late and the entity is therefore returned by the query more than once. It would be best to delay the execution of the next query to minimize the chance of this (e.g. wait a few seconds before redoing the query).

Pros: Simple. You already have the solution, just exclude the cursor handling part.

Cons: Some entities might get updated multiple times (therefore the change must be idempotent). Also the change performed on entities must be something which will exclude the entity from the next query.
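
A minimal sketch of the re-query loop from alternative 1, written against an in-memory slice so that it runs on its own. The `entity` type and the `queryBelow` and `updateAll` functions are hypothetical names for this illustration. In a real application `queryBelow` would be a fresh Datastore query with the same filter and a result limit, and this model does not reproduce eventual consistency or the pause between passes.

```go
package main

import "fmt"

// entity is a hypothetical stand-in for a stored item.
type entity struct {
	Key   int
	Value int
}

// queryBelow is an in-memory stand-in for a fresh, cursor-less query with the
// filter Value < limit, returning at most batch matching positions.
func queryBelow(items []entity, limit, batch int) []int {
	var idx []int
	for i := range items {
		if len(idx) == batch {
			break
		}
		if items[i].Value < limit {
			idx = append(idx, i)
		}
	}
	return idx
}

// updateAll re-runs the query from scratch until it returns nothing.
// It returns the number of updates performed and the number of queries run.
func updateAll(items []entity, limit, batch int) (updated, queries int) {
	for {
		idx := queryBelow(items, limit, batch)
		queries++
		if len(idx) == 0 {
			return updated, queries
		}
		for _, i := range idx {
			// Idempotent change that also removes the entity from the filter.
			items[i].Value = limit
			updated++
		}
	}
}

func main() {
	items := make([]entity, 10)
	for i := range items {
		items[i] = entity{Key: i + 1}
	}
	updated, queries := updateAll(items, 2, 5)
	fmt.Println(updated, queries) // prints 10 3
}
```

The loop needs one extra query that returns nothing before it can stop, which is why ten items in batches of five take three queries here.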

2) Using Task Queue

You could first execute a keys-only query and defer the updates to tasks. You could create tasks that each receive, say, 100 keys; each task loads its entities by key and does the update. This would ensure each entity gets updated only once. This solution has a somewhat bigger delay because the task queue is involved, but that is not a problem in most cases.

Pros: No duplicated updates (therefore change may be non-idempotent). Works even if the change to be performed would not exclude the entity from the next query (more general).

Cons: Higher complexity. Bigger lag/delay.

3) Using Map-Reduce

You could use the map-reduce framework/utility to do massively parallel processing of many entities. Not sure if it has been implemented in Go.

Pros: Parallel execution, can handle even millions or billions of entities. Much faster when the number of entities is large. Plus the pros listed under 2) Using Task Queue.

Cons: Higher complexity. Might not be available in Go yet.
