How to retrieve the latest version of a record in GAE's high replication datastore?

Problem Description

I have created a REST service for syncing data from iPhones to our GAE app. In a few situations we get double entries for the same day. I believe I have made a mistake in the design of the Record class and would like to double-check that my assumptions and possible solution are correct before I attempt any data migration.

First I go through all incoming json_records. If the query finds count == 1, that means there is an existing entry that needs to be updated (this is where it sometimes goes wrong!). It then checks the timestamp and only updates the entry if the incoming timestamp is greater; otherwise it ignores it.

for json_record in json_records:
    # Look up an existing record for this user and date.
    recordsdb = Record.query(Record.user == user.key,
                             Record.record_date == date_parser.parse(json_record['record_date']))
    if recordsdb.count() == 1:
        rec = recordsdb.fetch(1)[0]
        # Only overwrite the stored record with newer incoming data.
        if rec.timestamp < json_record['timestamp']:
            ....
            rec.put()
    elif recordsdb.count() == 0:
        new_record = Record(user=user.key,
                            record_date=date_parser.parse(json_record['record_date']),
                            notes=json_record['notes'],
                            timestamp=json_record['timestamp'])
        new_record.put()

If I am not wrong, this way of querying for an object provides no guarantee that it is the latest version.

recordsdb = Record.query(Record.user == user.key, Record.record_date == date_parser.parse(json_record['record_date']))

I believe the only way GAE's High Replication Datastore can make sure that you have the latest data in front of you is if you retrieve it by its key.

Hence, if this assumption is correct, I should have saved my records with a date string as the key in the first place.

jsondate = date_parser.parse(json_record['record_date'])
new_record = Record(id=jsondate.strftime("%Y-%m-%d"),
                    user=user.key,
                    record_date=jsondate,
                    notes=json_record['notes'],
                    timestamp=json_record['timestamp'])
new_record.put()

and when I need to check whether the record already exists, I would get it by its key like this:

jsondate = date_parser.parse(json_record['record_date'])
record = ndb.Key('Record', jsondate.strftime("%Y-%m-%d")).get()

Now if record is None, I have to create a new record; if it is not None, I have to update it.
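Putting the two pieces together, the key-based sync step I have in mind would look roughly like this (just a sketch, reusing the Record fields and date_parser calls from the snippets above):

jsondate = date_parser.parse(json_record['record_date'])
date_id = jsondate.strftime("%Y-%m-%d")

# A get by key is strongly consistent, unlike the property query above.
record = ndb.Key('Record', date_id).get()

if record is None:
    # No entry for this date yet: create one, keyed by the date string.
    Record(id=date_id,
           user=user.key,
           record_date=jsondate,
           notes=json_record['notes'],
           timestamp=json_record['timestamp']).put()
elif record.timestamp < json_record['timestamp']:
    # An entry exists: only overwrite it with newer incoming data.
    record.notes = json_record['notes']
    record.timestamp = json_record['timestamp']
    record.put()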

Are my assumption and solution correct? How can I migrate this data so that the date string becomes the key?

UPDATE

I just realised another mistake I made: I can't use the date string alone as the record's ID, because every user can have a record for the same day, which would cause duplicate keys.

I believe the only way to solve that is through an ancestor/parent, which I am still trying to get my head around.
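If I understand the docs correctly, a parent key would fix the duplication because the record's full key then includes the user, so the date ID only has to be unique per user. A small sketch of what I mean (assuming user.key is the User entity's key):

jsondate = date_parser.parse(json_record['record_date'])
date_id = jsondate.strftime("%Y-%m-%d")

# The full key path is User -> Record, so two different users can each
# have their own ('Record', '2015-03-01') child without colliding.
key = ndb.Key('Record', date_id, parent=user.key)
record = key.get()  # still a strongly consistent get by key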

UPDATE 2:

Trying to see if I understand Patrick's solution here. If it doesn't make sense, or there is a better way, please correct me.

I would add an is_fixed flag to the existing model:

class Record(ndb.Model):
    user = ndb.KeyProperty(kind=User)
    is_fixed = ndb.BooleanProperty()
    ...

Then I would query for the existing records via a cursor and delete them afterwards:

q = Record.query()
q_forward = q.order(Record.key)
cursor = None

while True:
    # Page through the existing records, 100 at a time.
    records, cursor, more = q_forward.fetch_page(100, start_cursor=cursor)
    if not records:
        break
    for record in records:
        # Re-create each record with the user as its ancestor.
        new_record = Record(parent=user.key, ... )
        new_record.is_fixed = True
        new_record.put()

# Now delete the old ones; I wonder if this would be an issue:
for old in Record.query():
    if not old.is_fixed:
        old.key.delete()

Solution

Since your query is always per user, I would recommend having the User be an ancestor of the Record.

As you mentioned, the issue that you are hitting is a result of eventual consistency -- your query is not guaranteed to have the most up-to-date results. With an ancestor query, the results will be strongly consistent.

One important piece to watch out for is that within an entity group (a single ancestor), you are limited to about one write per second. Since each user only has one record per day, this seems like it shouldn't be a problem.

Your code is actually already all set up to use ancestors:

new_record = Record(parent=user.key,  # Here we say that the ancestor of the record is the user
                    record_date=date_parser.parse(json_record['record_date']),
                    notes=json_record['notes'],
                    timestamp=json_record['timestamp'])

And now you can actually use a strongly consistent query:

Record.query(Record.record_date == date_parser.parse(json_record['record_date']),
             ancestor=user.key)

However, you are going to have the same problem with changing the IDs of existing Records. Adding an ancestor to an entity effectively changes its key to have the ancestor as a prefix. In order to do this, you'll have to go through all your records and create new ones with their user as an ancestor. You can probably do this using a query to grab results in batches (using cursors to step forward), or if you have a lot of data it may be worthwhile to explore the MapReduce library.
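For the batch approach, something along these lines could serve as a starting point. This is an untested sketch, not a drop-in migration (the migrate_records name is just for illustration): it assumes every old Record has a user property pointing at its User entity, skips entities that already have a parent so re-running it is safe, and batches the datastore calls with ndb.put_multi / ndb.delete_multi to cut down on RPCs:

from google.appengine.ext import ndb

def migrate_records(batch_size=100):
    """Re-create each old Record under its user's entity group."""
    cursor = None
    while True:
        records, cursor, more = Record.query().fetch_page(
            batch_size, start_cursor=cursor)
        if not records:
            break
        new_records, old_keys = [], []
        for old in records:
            if old.key.parent():
                continue  # already migrated: new entities have a parent
            date_id = old.record_date.strftime("%Y-%m-%d")
            new_records.append(Record(id=date_id,
                                      parent=old.user,  # user as ancestor
                                      user=old.user,
                                      record_date=old.record_date,
                                      notes=old.notes,
                                      timestamp=old.timestamp))
            old_keys.append(old.key)
        ndb.put_multi(new_records)   # write the copies in one batch
        ndb.delete_multi(old_keys)   # then remove the originals
        if not more:
            break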
