Contention problems in Google App Engine


Question

I'm having contention problems in Google App Engine and am trying to understand what's going on.

I have a request handler annotated with:

@ndb.transactional(xg=True, retries=5) 

..and in that code I fetch some things, update others, etc. But sometimes an error like this one shows up in the log during a request:

16:06:20.930 suspended generator _get_tasklet(context.py:329) raised TransactionFailedError(too much contention on these datastore entities. please try again. entity group key: app: "s~my-appname"
path <
  Element {
    type: "PlayerGameStates"
    name: "hannes2"
  }
>
)
16:06:20.930 suspended generator get(context.py:744) raised TransactionFailedError(too much contention on these datastore entities. please try again. entity group key: app: "s~my-appname"
  path <
    Element {
      type: "PlayerGameStates"
      name: "hannes2"
    }
  >
  )
16:06:20.930 suspended generator get(context.py:744) raised TransactionFailedError(too much contention on these datastore entities. please try again. entity group key: app: "s~my-appname"
  path <
    Element {
      type: "PlayerGameStates"
      name: "hannes2"
    }
  >
  )
16:06:20.936 suspended generator transaction(context.py:1004) raised TransactionFailedError(too much contention on these datastore entities. please try again. entity group key: app: "s~my-appname"
  path <
    Element {
      type: "PlayerGameStates"
      name: "hannes2"
    }
  >
  )

..followed by a stack trace. I can update with the whole stack trace if needed, but it's kind of long.

I don't understand why this happens. Looking at the line in my code where the exception comes from, I run get_by_id on a totally different entity (Round). The "PlayerGameStates" with name "hannes2" that is mentioned in the error messages is the parent of another entity, GameState, which has been get_async'ed from the database a few lines earlier:

# GameState is read by get_async
gamestate_future = GameState.get_by_id_async(id, ndb.Key('PlayerGameStates', player_key))
...
gamestate = gamestate_future.get_result()
...

Weird(?) thing is, there are no writes to the datastore occurring for that entity. My understanding is that contention errors can come if the same entity is updated at the same time, in parallel.. Or maybe if too many writes occur in a short period of time..

But can it happen when reading entities also? ("suspended generator get.."??) And, is this happening after the 5 ndb.transaction retries..? I can't see anything in the log that indicates that any retries have been made.

Any help is greatly appreciated.

Solution

Yes, contention can happen for both read and write ops.

After a transaction starts - in your case when the handler annotated with @ndb.transactional() is invoked - any entity group that is accessed (by read or write ops, it doesn't matter) is immediately marked as such. At that moment it is not known whether there will be a write op by the end of the transaction - it doesn't even matter.
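
As a minimal sketch (not code from the question; the model definition and function name below are hypothetical), a transaction that only reads still touches the entity group of the key it looks up:

from google.appengine.ext import ndb

class GameState(ndb.Model):
    player = ndb.StringProperty()  # hypothetical property, just to make the model concrete

@ndb.transactional(xg=True)
def read_only_txn(player_key_name):
    # Even though nothing is written, this get marks the
    # ('PlayerGameStates', player_key_name) entity group as accessed for
    # the duration of the transaction, so it can take part in
    # "too much contention" just like a write would.
    parent = ndb.Key('PlayerGameStates', player_key_name)
    return GameState.get_by_id('current', parent=parent)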

The too much contention error (which is different from a conflict error!) indicates that too many parallel transactions are simultaneously trying to access the same entity group. It can happen even if none of the transactions actually attempts to write!

Note: this contention is NOT emulated by the development server; it can only be seen when deployed on GAE, with the real datastore!

What can add to the confusion is the automatic retries of the transactions, which can happen after either actual write conflicts or just plain access contention. These retries may appear to the end user as suspicious repeated execution of some code paths - the handler in your case.
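
A sketch of what that looks like in practice (the function and key names below are made up, not from the question): with retries=5 the decorated body can run up to six times, and anything in it that is not a datastore write is simply repeated on each attempt:

import logging

from google.appengine.ext import ndb

@ndb.transactional(xg=True, retries=5)
def update_round(round_key):
    # ndb re-runs this whole function on contention or write conflicts,
    # up to 1 initial attempt + 5 retries. Non-datastore side effects
    # (this log line, external API calls, non-transactional enqueues)
    # happen once per attempt, which is what makes a code path look like
    # it executed several times.
    logging.info('transaction attempt for %s', round_key)
    round_entity = round_key.get()
    # ... mutate round_entity here ...
    round_entity.put()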

Retries can actually make matters worse (for a brief time) by throwing even more accesses at the already heavily accessed entity groups. I've seen patterns where transactions only start to succeed after the exponential backoff delays grow big enough (if the number of retries is large enough) to let things cool down a bit by allowing the transactions already in progress to complete.

My approach to this was to move most of the transactional stuff onto push queue tasks, disable retries at both the transaction and the task level, and instead re-queue the task entirely - fewer retries, but spaced further apart.
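
A rough sketch of that pattern, with the assumptions called out: the queue name, handler path and helper names are hypothetical, retries are turned off on the transaction (retries=0) and on the task (task_retry_limit=0), and on contention the same work is simply re-enqueued with a delay:

import webapp2

from google.appengine.api import datastore_errors
from google.appengine.api import taskqueue
from google.appengine.ext import ndb

@ndb.transactional(xg=True, retries=0)  # no automatic ndb-level retries
def apply_update(gamestate_key):
    state = gamestate_key.get()
    # ... mutate state here ...
    state.put()

class GameTxnTaskHandler(webapp2.RequestHandler):
    def post(self):
        key = ndb.Key(urlsafe=self.request.get('key'))
        try:
            apply_update(key)
        except datastore_errors.TransactionFailedError:
            # Instead of letting the task queue hammer the contended entity
            # group right away, re-enqueue the same work with a delay;
            # task_retry_limit=0 keeps the queue from piling on its own
            # automatic retries as well.
            taskqueue.add(
                url='/tasks/game-txn',  # hypothetical handler path
                params={'key': key.urlsafe()},
                queue_name='game-txn',  # hypothetical push queue
                countdown=10,  # spaced-out manual retry
                retry_options=taskqueue.TaskRetryOptions(task_retry_limit=0))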

Usually when you run into such problems you have to revisit your data structures and/or the way you're accessing them (your transactions). In addition to solutions that maintain strong consistency (which can be quite expensive), you may want to re-check whether consistency is actually a must. In some cases it's added as a blanket requirement just because it appears to simplify things. From my experience it doesn't :)
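
As an illustration of that trade-off (entity names reused from the question; the player property is hypothetical): an ancestor query is strongly consistent but funnels every caller through the same entity group, while a plain query on an indexed property avoids the bottleneck at the cost of eventual consistency:

from google.appengine.ext import ndb

class GameState(ndb.Model):
    player = ndb.StringProperty()  # hypothetical indexed property

def strongly_consistent_states(player_key_name):
    # Ancestor query: strongly consistent, but every caller goes through
    # the same PlayerGameStates entity group and adds to its contention.
    parent = ndb.Key('PlayerGameStates', player_key_name)
    return GameState.query(ancestor=parent).fetch()

def eventually_consistent_states(player_id):
    # Global query on an indexed property: no entity-group bottleneck,
    # but results may lag behind very recent writes.
    return GameState.query(GameState.player == player_id).fetch()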

Another thing that can help (but only a bit) is using a faster (and more expensive) instance type - shorter execution times translate into a slightly lower risk of transactions overlapping. I noticed this because I needed an instance with more memory, which happened to also be faster :)
