两个 SQL 语句应该返回相同的结果,但它们没有(在 AWS Aurora DB 上) [英] Two SQL statements should return the same results, but they don't (on AWS Aurora DB)

查看:25
本文介绍了两个 SQL 语句应该返回相同的结果,但它们没有(在 AWS Aurora DB 上)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是GpsPosition的表定义:

CREATE TABLE GpsPosition 
(
    altitudeInMeters SMALLINT NOT NULL,
    dateCreated      BIGINT NOT NULL,
    dateRegistered   BIGINT NOT NULL,
    deviceId         BINARY(16) NOT NULL,
    emergencyId      BINARY(16) NULL,
    gpsFix           SMALLINT NOT NULL,
    heading          SMALLINT NOT NULL,
    horizontalUncertaintyInMeters SMALLINT NOT NULL,
    id               BINARY(16) NOT NULL,
    latestForDevice  BOOLEAN NOT NULL,
    latestForUser    BOOLEAN NOT NULL,
    latitude         DOUBLE PRECISION NOT NULL,
    longitude        DOUBLE PRECISION NOT NULL,
    numSatellites    SMALLINT NOT NULL,
    speedInKmph      SMALLINT NOT NULL,
    stale            BOOLEAN NOT NULL,
    userId           BINARY(16) NULL,
    verticalUncertaintyInMeters SMALLINT NOT NULL,

    PRIMARY KEY (id)
);

ALTER TABLE GpsPosition 
    ADD CONSTRAINT GpsPosition_deviceId_fkey 
        FOREIGN KEY (deviceId) REFERENCES Device(id) 
            ON UPDATE CASCADE ON DELETE CASCADE;

ALTER TABLE GpsPosition 
    ADD CONSTRAINT GpsPosition_emergencyId_fkey 
        FOREIGN KEY (emergencyId) REFERENCES Emergency(id) 
            ON UPDATE CASCADE ON DELETE SET NULL;

ALTER TABLE GpsPosition 
    ADD CONSTRAINT GpsPosition_userId_fkey 
        FOREIGN KEY (userId) REFERENCES User(id) 
            ON UPDATE CASCADE ON DELETE SET NULL;

ALTER TABLE GpsPosition 
    ADD CONSTRAINT deviceId_dateCreated_must_be_unique 
        UNIQUE (deviceId, dateCreated);

CREATE INDEX i2915035553 ON GpsPosition (deviceId);
CREATE INDEX deviceId_latestForDevice_is_non_unique ON GpsPosition (deviceId, latestForDevice);
CREATE INDEX i3210815937 ON GpsPosition (emergencyId);
CREATE INDEX i1689669068 ON GpsPosition (userId);
CREATE INDEX userId_latestForUser_is_non_unique ON GpsPosition (userId, latestForUser);

注意GpsPosition中的userId是一个UUID,存储为binary(16).

Note that userId in GpsPosition is a UUID that is stored as a binary(16).

此 SQL 代码在 AWS AuroraDB 引擎版本 5.7.12 上执行.

This SQL code is executing on AWS AuroraDB engine version 5.7.12.

我希望下面的查询返回相同的结果,但第一个返回许多结果而第二个不返回任何结果.知道为什么吗?

I would expect the queries below to return the same results, but the first one returns many results and the second returns no results. Any idea as to why?

select *
from GpsPosition
where exists (select *
              from User
              where id = GpsPosition.userId and
                    id = UNHEX( '3f4163aab2ac46d6ad15164222aca89e' )
             );

select *
from GpsPosition
where userId = UNHEX( '3f4163aab2ac46d6ad15164222aca89e' );

请注意,如您所料,以下 SQL 语句返回单行:

Note that the following SQL statement returns a single row, as you would expect:

select *
from User 
where id = UNHEX( '3f4163aab2ac46d6ad15164222aca89e' );

推荐答案

所以我的团队花了几个月的时间试图理解这个问题和许多其他不一致的地方(比如这篇文章中的这个)我们能够重现AWS Aurora DB 5.7,但无法在 MySQL 5.7 或其他任何相关内容上重现.

So my team has spent literally a couple of months trying to understand this issue and many other inconsistencies (like this one in this posting) we were able to reproduce on AWS Aurora DB 5.7 but unable to reproduce on MySQL 5.7 or anything else for that matter.

作为这项工作的一部分,我们聘请了 AWS 支持,这非常没有帮助.他们确认他们可以通过执行我们在同一个数据库上执行的相同查询来重现不一致,但随后表示他们无法将该数据复制到另一个数据库并仍然重现该问题,这似乎让他们满意以标记支持案件已解决.现在承认,这是一个非常隐蔽的缺陷,因为它很难复制,而且如此间歇性和罕见,但是当它被击中时,它在受影响的数据集中变得可靠地可复制.一旦你遇到了这个缺陷,那么你依赖于数据库的应用程序就不能再在那些受影响的区域正确运行;)

As a part of this effort, we engaged AWS support, which was remarkably unhelpful. They confirmed they could reproduce the inconsistencies by executing the same queries we did on the same database we did, but then said they couldn't copy that data to another database and still reproduce the issue, and this seemed to satisfy them to mark the support case as resolved. Now granted, this is a very insidious defect since it is so difficult to reproduce and so intermittent and rare, but when it is hit, it becomes reliably reproducible within the affected data set. And once you do hit this defect, well, your applications depending on the database can no longer operate correctly in those affected areas ;)

虽然我们不认为该缺陷仅限于级联删除,但似乎更可靠"产生此缺陷的一种方法是删除具有级联删除的表中的行.同样,这似乎更可靠"地产生了缺陷,但即便如此,它也非常罕见且难以产生.然而,我们可以通过在紧密循环中运行一个巨大的自动化测试套件来生成它.同样,一旦您真的发现了这个缺陷,受影响的数据就会可靠地重现不一致 - 很难找到这个缺陷.

While we do not believe the defect is limited to cascade deletes, it appears that a way to "more reliably" produce this defect is to delete rows in tables that have cascade deletes. Again, this appears to produce the defect "more reliably", but even then, it is incredibly rare and difficult to produce. We could produce it by running a huge automated test suite in a tight loop however. Again, once you actually do hit this defect, the affected data will reliably reproduce inconsistencies - it's just VERY hard to hit this defect.

那么在我们所有的分析结束时我们得出了什么结论?

So what conclusions did we draw at the end of all of our analysis?

1) 首先,Thorsten Kettner(见他上面发表的评论)是正确的——这是 RDBMS 服务器本身的一个缺陷.我们无权访问 AWS AuroraDB 源代码或底层基础设施,因此我们无法将这个缺陷归结为更具体的原因,但它可能是 RDBMS 服务器中的缺陷,也可能是数据持久层中的缺陷,也可能是其他地方.

1) First and foremost, Thorsten Kettner (see his posted comment above) is correct - this is a defect in the RDBMS server itself. We don't have access to the AWS AuroraDB source code or underlying infrastructure, and so we cannot root cause this defect to something much more specific, but it is a defect possibly in the RDBMS server, possibly in the data persistence layer, and possibly somewhere else.

2) 基于上述 (1),我们认为 AWS Amazon 5.7.x 还不够成熟,无法用于生产应用程序.即使它在 99.9999% 的时间里都能正常工作,那 0.0001% 却导致开发和生产数据库服务器做错事并返回错误的结果,这对我们来说是绝对不能接受的.我们还检测到表上的完整性约束没有得到可靠遵守的情况,导致非常奇怪的孤立行,这些行应该作为架构定义中级联删除的一部分被删除,这同样是我们绝对不能接受的.

2) Based upon (1) above, we decided that AWS Amazon 5.7.x is not mature enough for us to use for a production application. Even though it works correctly 99.9999% of the time, that 0.0001% was causing development and production database servers to do the wrong things and return incorrect results, which is absolutely unacceptable to us. We also detected cases where integrity constraints on the tables were not reliably honored, resulting in very strange orphaned rows that should have been deleted as a part of cascade deletes in the schema definition, which again, is absolutely unacceptable to us.

3) 我们无法在 AWS MySQL 5.6、AWS MySQL 5.7、兼容 MySQL 5.6 的 AWS AuroraDB、非 AWS Windows MySQL 5.6 或非 AWS MySQL 5.7 上重现任何这些不一致.简而言之,我们认为出现问题的原因是与 MySQL 5.7 兼容的 AWS AuroraDB.我们对 AWS AuroraDB 特别是兼容 MySQL 5.6 进行了大量测试,无法重现这些不一致的缺陷,因此我们认为此时兼容 MySQL 5.6 的 AuroraDB 已经成熟,适合生产使用.

3) We were unable to reproduce any of these inconsistencies on AWS MySQL 5.6, AWS MySQL 5.7, AWS AuroraDB with MySQL 5.6 compatibility, non-AWS Windows MySQL 5.6, or non-AWS MySQL 5.7. In short, we believe that whatever is going wrong is specific to AWS AuroraDB with MySQL 5.7 compatibility. We did extensive testing on AWS AuroraDB with MySQL 5.6 compatibility in particular and could not reproduce any of these inconsistency defects, so we believe at this time that AuroraDB with MySQL 5.6 compatibility is mature and suitable for production use.

这篇关于两个 SQL 语句应该返回相同的结果,但它们没有(在 AWS Aurora DB 上)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆