Contention problems in Google App Engine


Problem description


I'm having contention problems in Google App Engine, and I'm trying to understand what's going on.

I have a request handler annotated with:

@ndb.transactional(xg=True, retries=5) 

..and in that code I fetch some stuff, update some others etc. But sometimes an error like this one comes in the log during a request:

16:06:20.930 suspended generator _get_tasklet(context.py:329) raised TransactionFailedError(too much contention on these datastore entities. please try again. entity group key: app: "s~my-appname"
path <
  Element {
    type: "PlayerGameStates"
    name: "hannes2"
  }
>
)
16:06:20.930 suspended generator get(context.py:744) raised TransactionFailedError(too much contention on these datastore entities. please try again. entity group key: app: "s~my-appname"
  path <
    Element {
      type: "PlayerGameStates"
      name: "hannes2"
    }
  >
  )
16:06:20.930 suspended generator get(context.py:744) raised TransactionFailedError(too much contention on these datastore entities. please try again. entity group key: app: "s~my-appname"
  path <
    Element {
      type: "PlayerGameStates"
      name: "hannes2"
    }
  >
  )
16:06:20.936 suspended generator transaction(context.py:1004) raised TransactionFailedError(too much contention on these datastore entities. please try again. entity group key: app: "s~my-appname"
  path <
    Element {
      type: "PlayerGameStates"
      name: "hannes2"
    }
  >
  )

..followed by a stack trace. I can update with the whole stack trace if needed, but it's kind of long.

I don't understand why this happens. Looking at the line in my code where the exception comes from, I run get_by_id on a totally different entity (Round). The "PlayerGameStates" entity named "hannes2" that is mentioned in the error messages is the parent of another entity, GameState, which has been fetched with get_async from the database a few lines earlier;

# GameState is fetched asynchronously (get_by_id_async); the parent key
# makes it part of the 'PlayerGameStates' entity group
gamestate_future = GameState.get_by_id_async(id, ndb.Key('PlayerGameStates', player_key))
...
gamestate = gamestate_future.get_result()
...

Weird(?) thing is, there are no writes to the datastore occurring for that entity. My understanding is that contention errors can occur if the same entity is updated at the same time, in parallel, or maybe if too many writes occur in a short period of time.

But can it happen when reading entities as well? ("suspended generator get.."??) And is this happening after the 5 ndb.transactional retries? I can't see anything in the log that indicates that any retries have been made.

Any help is greatly appreciated.

Solution

Yes, contention can happen for both read and write ops.

After a transaction starts - in your case when the handler annotated with @ndb.transactional() is invoked - any entity group accessed (by read or write ops, doesn't matter) is immediately marked as such. At that moment it is not known whether there will be a write op by the end of the transaction - it doesn't even matter.
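To make this concrete, here's a minimal sketch (the model definitions and names are assumptions reconstructed from your snippet, not your actual code) of a transaction that only reads and can still hit contention. Note how the parent key makes every GameState part of a PlayerGameStates entity group, which is why the error names that kind:

from google.appengine.ext import ndb

class PlayerGameStates(ndb.Model):
    pass  # assumed; only its key matters here

class GameState(ndb.Model):
    data = ndb.JsonProperty()  # assumed payload

@ndb.transactional(xg=True)
def read_only_txn(game_id, player_key):
    # This read alone marks the PlayerGameStates entity group as touched
    # by the transaction; no write is needed for "too much contention"
    # to be raised when many such transactions run in parallel.
    parent = ndb.Key('PlayerGameStates', player_key)
    return GameState.get_by_id(game_id, parent=parent)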

The too much contention error (which is different from a conflict error!) indicates that too many parallel transactions are simultaneously trying to access the same entity group. It can happen even if none of the transactions actually attempts to write!

Note: this contention is NOT emulated by the development server; it can only be seen when deployed on GAE, with the real datastore!

What can add to the confusion is the automatic retries of the transactions, which can happen after either actual write conflicts or just plain access contention. These retries may appear to the end user as suspicious repeated execution of some code paths - the handler in your case.

Retries can actually make matters worse (for a brief time), throwing even more accesses at the already heavily accessed entity groups. I've seen patterns where transactions only succeeded after the exponential backoff delays grew big enough (if the retry count is large enough) to let things cool off a bit, by allowing the transactions already in progress to complete.

My approach to this was to move most of the transactional stuff onto push queue tasks, disable retries at the transaction and task level, and instead re-queue the task entirely - fewer retries but spaced further apart.
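Roughly what that looks like (a sketch under assumptions, not my exact code - the handler URL, payload handling and the 30s delay are made up, and the queue's own retries would be capped separately, e.g. via task_retry_limit in queue.yaml):

from google.appengine.api import datastore_errors, taskqueue
from google.appengine.ext import ndb
import webapp2

@ndb.transactional(xg=True, retries=0)  # no automatic transaction retries
def do_transactional_work(payload):
    pass  # the reads/writes that used to live in the request handler

class DoWorkHandler(webapp2.RequestHandler):
    def post(self):
        try:
            do_transactional_work(self.request.body)
        except datastore_errors.TransactionFailedError:
            # Re-queue the whole task with a delay instead of retrying
            # the transaction in place.
            taskqueue.add(url='/tasks/do-work',
                          payload=self.request.body,
                          countdown=30)
        # Returning 200 either way keeps the push queue from retrying
        # the task on its own.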

Usually when you run into such problems you have to revisit your data structures and/or the way you're accessing them (your transactions). In addition to solutions that maintain strong consistency (which can be quite expensive), you may want to re-check whether consistency is actually a must. In some cases it's added as a blanket requirement just because it appears to simplify things. From my experience it doesn't :)

Another thing that can help (but only a bit) is using a faster (also more expensive) instance type - shorter execution times translate into a slightly lower risk of transactions overlapping. I noticed this as I needed an instance with more memory, which happened to also be faster :)
