TransactionFailedError (too much contention...) when reading (cross-group) entities from datastore


Question

I'm investigating, once again, unexpected occurrences of TransactionFailedError (too much contention on these datastore entities...) in cases where the code only reads the entity groups that are blamed for the contention.

Setup: GAE standard environment, Python 2.7 with NDB (SDK 1.9.51). I managed to reproduce the error in an isolated app (with me as the only user) in which the same request handler is executed from a task queue, and all read/write access to the entity groups mentioned below is done by this handler alone.

The handler is executed a few times per second. It is essentially a migration/copy task that moves existing OriginChild entities out of one huge group into individual groups as new Target entities, with one task per OriginChild entity.

Inside a cross-group transactional function, ndb.transaction(lambda: main_activity(), xg=True), each request handler:

  • uses get_async NDB tasklets to retrieve two entities:
      • Key(OriginGroup, 1) (the same for all requests)
      • Key(OriginGroup, 1, OriginChild, Foo) (a unique object per request)
  • does Key(TargetConfig, 1).get()
  • does Key(Target, Foo).get() (which really has no parent!)
  • if the key doesn't exist, does Key(Target, Foo).put_async() and calls get_result() before the transactional function returns

So these are the read-only entities in the transaction:

  • Key(Origin, 1)
  • Key(Origin, 1, OriginChild, Foo)
  • Key(TargetConfig, 1)

The code doesn't change these entities: they are neither deleted nor written back to the datastore. Moreover, no other requests try to write to these entity groups (there have been no write ops at all in these groups for months).

The only entity that is put to the datastore is Key(Target, Foo), where the ID is unique per request.

Approximately 60-70% of the requests run without errors.

When the TransactionFailedError occurs, it is raised inside the transactional function, and the log shows something like this:

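suspended generator get(context.py:758) raised TransactionFailedError(too much contention on these datastore entities. please try again. entity group key: app: "e~my-test-app" name_space: "test" path < Element { type: "OriginGroup" id: 1 } >)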

In ~80% of the failed requests, the error refers to Key(OriginGroup, 1) (although the entire group is used read-only).

In ~10% of the failed requests, the error refers to Key(TargetConfig, 1) (read-only, too).

In the remaining ~10%, it blames the new entity, e.g. Key(Target, Foo) (with whatever ID the request is migrating). This seems to happen only during the put(), not during the preceding get().

My understanding of transactions and entity groups is that NDB uses optimistic concurrency control, so massive numbers of read ops on the same entity group are possible (hence the scalability). For technical reasons, only transactional writes are limited: to ~1 write op per entity group per second, and to at most 25 entity groups per transaction.

However, my observations suggest that read ops can also cause "too much contention" errors. This baffles me, because it would make GAE with Datastore much less scalable for anyone aiming for strong consistency. So maybe something else is going on here.

I found this comment on SO, which suggests that my suspicion is right:

"Note: The first read of an entity group in an XG transaction may throw a TransactionFailedError exception if there is a conflict with other transactions accessing that same entity group. This means that even an XG transaction that performs only reads can fail with a concurrency exception."

Source: Contention problems in Google App Engine

I was able to find the quote in the new docs, now under Superseded Storage Solutions > DB Client Library for Cloud Datastore > Overview.

Is the quoted statement still true for NDB (or does it apply only to DB and/or to version conflicts)?

If it is true: what pattern would be recommended to avoid the contention error with transactional reads across entity groups?

Recommended answer

In a transaction that contains at least one write, in this case to Key(Target, Foo), Cloud Datastore writes no-op markers to the entity groups that are read but not modified. This is to ensure serializability.
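The effect of those markers can be illustrated with a toy model. The sketch below is not the Datastore implementation: ToyDatastore and ToyTransaction are hypothetical names, and the single version counter per entity group is an assumption made purely for illustration. It only shows why two transactions that read the same group and write to different groups can still conflict.

```python
class ToyDatastore(object):
    """Toy model: one version counter per entity group (not the real Datastore)."""

    def __init__(self):
        self.versions = {}

    def begin(self):
        return ToyTransaction(self)


class ToyTransaction(object):
    def __init__(self, store):
        self.store = store
        self.seen = {}        # group -> version observed when first touched
        self.written = set()  # groups this transaction writes to

    def read(self, group):
        # Remember the group's version the first time it is touched.
        self.seen.setdefault(group, self.store.versions.get(group, 0))

    def write(self, group):
        self.read(group)
        self.written.add(group)

    def commit(self):
        # Optimistic check: fail if any touched group changed in the meantime.
        for group, version in self.seen.items():
            if self.store.versions.get(group, 0) != version:
                return False
        if self.written:
            # Assumption taken from the answer: a transaction with at least
            # one write also marks the groups it only read (no-op marker).
            for group in self.seen:
                self.store.versions[group] = self.store.versions.get(group, 0) + 1
        return True


store = ToyDatastore()
t1, t2 = store.begin(), store.begin()
for txn, target in ((t1, "Target:Foo"), (t2, "Target:Bar")):
    txn.read("OriginGroup:1")    # read-only, shared by all requests
    txn.read("TargetConfig:1")   # read-only, shared by all requests
    txn.write(target)            # unique per request

first = t1.commit()    # succeeds and marks the two shared groups
second = t2.commit()   # fails: the shared groups changed since t2 read them
```

In this model a transaction that performs no writes leaves the counters untouched, which matches the expectation that purely read-only transactions do not contend with each other.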

Since Key(OriginGroup, 1) is read in every request and you are doing XG transactions faster than 1 per second over an extended period, this is the source of the contention.

One alternative to consider is a batching strategy that writes 23 Key(Target, Foo) entities at a time rather than one. Key(Origin, 1) and Key(TargetConfig, 1) take the other 2 entity-group slots.
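The batch size follows from the limit of 25 entity groups per transaction minus the 2 groups that are only read. A minimal sketch of the grouping step, assuming the IDs to migrate are already available as a list; batch_ids and the two constants are hypothetical names, and the actual NDB reads and writes for each batch are left out:

```python
MAX_GROUPS_PER_XG_TXN = 25  # limit quoted in the question
SHARED_READ_GROUPS = 2      # Key(Origin, 1) and Key(TargetConfig, 1)


def batch_ids(ids, max_groups=MAX_GROUPS_PER_XG_TXN, shared=SHARED_READ_GROUPS):
    """Split IDs into batches that fit into one cross-group transaction.

    Every Target entity is its own entity group, so each batch may hold
    at most max_groups - shared IDs.
    """
    size = max_groups - shared
    if size < 1:
        raise ValueError("no entity-group slots left for writes")
    ids = list(ids)
    return [ids[i:i + size] for i in range(0, len(ids), size)]


# 50 hypothetical IDs become batches of 23, 23 and 4.
batches = batch_ids(range(50))
```

Each batch would then be handled by a single task and a single XG transaction, so the shared groups are touched once per 23 migrated entities instead of once per entity.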
