How to update 400,000 GAE datastore entities in parallel?

Question

I have 400,000 entities of a certain type, and I'd like to perform a simple operation on each of them (adding a property). I can't process them serially because it would take forever. I don't want to use the MapReduce library because it is complicated and overwhelming.

Basically I'd like to create 100 tasks on the taskqueue, each task taking a segment of ~4,000 entities and performing this operation on each one. Hopefully this wouldn't take more than a few minutes to process all 400k entities, when all tasks are executing in parallel.

However, I'm not sure how to use GAE queries to do this. My entities have string IDs of the form "230498234-com.example", which were generated by my application. I want each task to ask the datastore something like, "Please give me entities #200,000-#204,000", and then operate on them one by one.

Is this possible? How can I divide up the datastore in this way?

Solution

This is a perfect job for MapReduce (https://developers.google.com/appengine/docs/python/dataprocessing/). It may be difficult to learn at first but once mastered you'll fall in love with it.
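
To make the map-style idea concrete, here is a minimal in-memory sketch. It is not the App Engine MapReduce library's API: a plain dict stands in for the datastore, threads stand in for parallel workers, and the names `split_into_shards`, `add_property`, `run_map`, and `new_property` are all hypothetical. It only shows the shape of the approach, which is to split the sorted IDs into contiguous shards and apply one small mapper function to every entity in each shard.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(sorted_ids, shard_count):
    """Split a sorted list of IDs into contiguous, roughly equal shards."""
    size, extra = divmod(len(sorted_ids), shard_count)
    shards, start = [], 0
    for i in range(shard_count):
        # The first `extra` shards take one additional ID each.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            shards.append(sorted_ids[start:end])
        start = end
    return shards


def add_property(entity):
    """Mapper: called once per entity (hypothetical property name)."""
    entity.setdefault("new_property", "default")
    return entity


def run_map(store, shard_count=4):
    """Apply the mapper to every entity, one worker per shard."""
    shards = split_into_shards(sorted(store), shard_count)

    def process(shard):
        for entity_id in shard:
            add_property(store[entity_id])
        return len(shard)

    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        return sum(pool.map(process, shards))


# Assumption: a dict keyed by string ID stands in for the datastore.
store = {"%d-com.example" % n: {"name": "item %d" % n} for n in range(1000)}
updated = run_map(store, shard_count=10)
```

In a real deployment the library would take care of the sharding and the per-entity writes; the part you write yourself is essentially the mapper.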

You could also add the property lazily, the next time each entity is saved, provided that your queries treat a missing property the same as one holding the default value.
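
The lazy approach can be sketched roughly as follows. This is an illustration only, assuming a dict stands in for the datastore and using a hypothetical property `status` with a hypothetical default of `"active"`. Readers treat a missing property as the default, and the property is only written when the entity is saved anyway.

```python
DEFAULT_STATUS = "active"  # hypothetical default value


def get_status(entity):
    """Read path: a missing property counts as the default."""
    return entity.get("status", DEFAULT_STATUS)


def save(store, entity_id, entity):
    """Write path: fill in the property the next time the entity is saved."""
    entity.setdefault("status", DEFAULT_STATUS)
    store[entity_id] = entity


# Assumption: a dict keyed by string ID stands in for the datastore.
store = {"230498234-com.example": {"name": "old entity"}}

entity = store["230498234-com.example"]
status_before_save = get_status(entity)  # default, nothing stored yet
save(store, "230498234-com.example", entity)
```

The caveat from the answer still applies: a datastore query that filters on the new property will not match entities that have not been saved with it yet, so this only works if nothing depends on such a query.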
