使用PostgreSQL MVCC跨多个表进行事务隔离 [英] Transaction Isolation Across Multiple Tables using PostgreSQL MVCC

查看:72
本文介绍了使用PostgreSQL MVCC跨多个表进行事务隔离的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

问题摘要



这是一个有关SQL事务中查询可序列化性的问题。



具体来说,我正在使用PostgreSQL。可以假定我使用的是最新版本的PostgreSQL。从我的阅读中,我相信用于支持我正在尝试做的事情的技术被称为 MultiVersion并发控制或 MVCC。



总结起来:如果我有一个主表,并且有超过1个与该主表连接的外键链接表,那么如何保证对于表中的给定键以及使用该表的任意数量的SELECT语句

其他问题



此问题类似,但范围更广,问题和答案与PostgreSQL无关:
在SQL Server Express和SQL Server 2005上,事务隔离和从多个表中读取



示例



说我有3张桌子:

 砖块
brickworks(主键)
complete_time(主键)
has_been_sold

brick_colors
brickworks(主键,外键指向砖块)
完成时间(主键,外键指向砖块)
象限(主键)
颜色

brick_weight
brickworks(主键,外键指向砖)
complete_time(主键,外键指向砖)
重量

砖厂一次只能生产一块砖。



后来有人分析了这些砖块以确定它们的颜色组合,并将结果写到brick_colors表中



其他人会分析积木以确定其重量,并将结果写入brick_weight表。



在任何给定的时间,现有的砖可能有或没有记录的颜色,并且可能有也可能没有记录的重量。






存在一个应用程序,并且此应用程序收到有人要购买特定砖块的消息(此时已通过其Brickworks / completion_time复合键知道该应用程序)。



应用程序希望在启动查询的确切时间选择砖的所有已知属性。



如果添加了颜色或重量信息,则MID-TRANSACTION应用程序不想知道它。



应用程序想要执行单独的查询(不是对与外键链接的表具有多个JOIN的SELECT,由于使用brick_colors表可能会返回多个行)。






这个例子故意很简单;如果我的示例中包含10个外键链接表,并且其中许多或全部表可以为同一主键返回多个行(如brick_colors在该示例中所做的操作),则无需使用多个JOIN进行SELECT的情况下,执行此操作的愿望就会更加清楚。



尝试的解决方案



这是到目前为止我要提出的内容:

  BEGIN交易隔离级别可序列化; 

-这条语句完成的全部工作就是告诉数据库在事务中将来查询中应从当前时间点返回哪些行
SELECT DISTINCT true
FROM bricks b
左连接brick_colors bc在bc.brickworks = b.brickworks和bc.completion_time = b.completion_time
左连接brick_weight bw在bw.brickworks = b.brickworks和bw.completion_time = b.completion_time
WHERE b.brickworks ='Brick-o-Matic'AND b.completion_time ='2017-02-01T07:35:00.000Z';

选择*从brick_colors其中b.brickworks ='Brick-o-Matic'和b.completion_time ='2017-02-01T07:35:00.000Z';
SELECT * FROM brick_weight WHERE b.brickworks = Brick-o-Matic AND b.completion_time = 2017-02-01T07:35:00.000Z;

COMMIT;

将第一个SELECT与JOIN一起使用只是为了确保可序列化性,这似乎是浪费。 p>

还有其他方法吗?



参考



PostgreSQL并发控制



PostgreSQL交易隔离



PostgreSQL SET TRANSACTION语句

解决方案

这是您问题的本质:


我如何保证,对于......一次交易中的任何数量的SELECT语句
.....我我开始时会以
的价格获得数据交易







这正是可重复读取隔离级别保证:


仅可重复读隔离级别可以看到
事务开始之前提交的数据;它永远不会看到未提交的数据或并发
事务在事务执行期间提交的
更改。
(但是,查询的确看到了先前
更新在其自身事务中执行的效果,即使它们尚未提交
。)这比
SQL标准对此隔离级别所要求的有力保证,并且可以防止表13-1中描述的所有
现象。 。如上所述,这是标准明确允许的
,仅描述每个隔离级别必须提供的最低
保护。



此级别为与Read Committed的区别在于
可重复读取事务中的查询在
事务开始时看到快照,而不是在
中当前查询开始时看到快照交易。因此,单个
事务中的连续SELECT命令会看到相同的数据,即它们看不到
其他事务在其自己的事务开始之后提交的更改。







一个实际例子-假设我们有2个简单的表:

 创建表t1(x int); 
插入t1值(1),(2),(3);
创建表t2(y int);
插入t2值(1),(2),(3);

许多表,它们的结构,主键,外键等在这里都不重要。


让我们打开第一个会话,开始可重复的读取隔离级别,并运行两个简单且独立的SELECT语句:

  test =#开始事务隔离级别可重复读取; 
开始交易
test =#SELECT * FROM t1;
x
---
1
2
3
(3 wiersze)


test =#选择*从t2;
y
---
1
2
3
(3 wiersze)

请注意, START TRANSACTION 命令会自动禁用会话中的自动提交模式。






现在在另一个会话中(启用默认的自动提交模式)将一些记录插入 t1

  test2 =#插入t1值(10),(11); 

插入新值并自动提交(因为启用了自动提交)。






现在回到第一个会话并再次运行SELECT:
test =#select * from t1;

  x 
---
1
2
3
(3 wiersze)

如您所见,session1(具有有效的可重复读事务)在启动后看不到任何更改






让我们做相同的实验白表 t2 -转到第二个会话并发出:

  test2 =#从t2处删除,其中y = 2; 
删除1

现在返回第一个会话并再次运行SELECT:

  test =#SELECT * FROM t2; 
y
---
1
2
3
(3 wiersze)

再次看到,会话1(具有有效的可重复读取事务)在转换开始后看不到任何更改。






现在,在会话1中,完成发出COMMIT的事务,然后选择:

  test =#SELECT * FROM t1; 
x
---
1
2
3
(3 wiersze)

test =#SELECT * FROM t2;
y
---
1
2
3
(3 wiersze)

test =#COMMIT;
COMMIT

test =#从t1选择*;
x
----
1
2
3
10
11
(5 wierszy)


test =#从t2选择*;
y
---
1
3
(2 wiersze)

如您所见,当可重复读事务启动并处于活动状态时,您可以多次运行许多单独的select语句,并且所有这些select语句在开始时都看到相同的稳定数据快照。交易,并考虑其他会话中的所有提交数据。


Question Summary

This is a question about serializability of queries within a SQL transaction.

Specifically, I am using PostgreSQL. It may be assumed that I am using the most current version of PostgreSQL. From what I have read, I believe the technology used to support what I am trying to do is known as "MultiVersion Concurrency Control", or "MVCC".

To sum it up: If I have one primary table, and more-than-1 foreign-key-linked table connected to that primary table, how do I guarantee that, for a given key in the tables, and any number of SELECT statements using that key inside one transaction, each of which is SELECTing from any of the linked tables, I will get data as it existed at the time I started the transaction?

Other Questions

This question is similar, but broader, and the question and answer did not relate specifically to PostgreSQL: Transaction isolation and reading from multiple tables on SQL Server Express and SQL Server 2005

Example

Let's say I have 3 tables:

bricks
    brickworks (primary key)
    completion_time (primary key)
    has_been_sold

brick_colors
    brickworks (primary key, foreign key pointing to "bricks")
    completion_time (primary key, foreign key pointing to "bricks")
    quadrant (primary key)
    color

brick_weight
    brickworks (primary key, foreign key pointing to "bricks")
    completion_time (primary key, foreign key pointing to "bricks")
    weight

A brickworks produces one brick at a time. It makes bricks that may be of different colors in each of its 4 quadrants.

Someone later analyzes the bricks to determine their color combination, and writes the results to the brick_colors table.

Someone else analyzes the bricks to determine their weight, and writes the results to the brick_weight table.

At any given time, an existing brick may or may not have a recorded color, and may or may not have a recorded weight.


An application exists, and this application receives word that someone wants to buy a particular brick (already known at this point to the application by its brickworks/completion_time composite key).

The application wants to select all known properties of the brick AT THE EXACT TIME IT STARTS THE QUERY.

If color or weight information is added MID-TRANSACTION, the application does NOT want to know about it.

The application wants to perform SEPARATE QUERIES (not a SELECT with multiple JOINs to the foreign-key-linked tables, which might return multiple rows because of the brick_colors table).


This example is deliberately simple; the desire to do this without one SELECT with multiple JOINs would be clearer if my example included, say, 10 foreign-key-linked tables, and many or all of them could return multiple rows for the same primary key (like brick_colors does in the example as I have it above).

Attempted Solution

Here's what I've come up with so far:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY ;

-- All this statement accomplishes is telling the database what rows should be returned from the present point-in-time in future queries within the transaction
SELECT DISTINCT true
FROM bricks b
LEFT JOIN brick_colors bc ON bc.brickworks = b.brickworks AND bc.completion_time = b.completion_time
LEFT JOIN brick_weight bw ON bw.brickworks = b.brickworks AND bw.completion_time = b.completion_time
WHERE b.brickworks = 'Brick-o-Matic' AND b.completion_time = '2017-02-01T07:35:00.000Z' ;

SELECT * FROM brick_colors WHERE b.brickworks = 'Brick-o-Matic' AND b.completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_weight WHERE b.brickworks = 'Brick-o-Matic' AND b.completion_time = '2017-02-01T07:35:00.000Z' ;

COMMIT ;

It just seems wasteful to use that first SELECT with the JOINs solely for purposes of ensuring serializability.

Is there any other way to do this?

References

PostgreSQL Concurrency Control

PostgreSQL Transcation Isolation

PostgreSQL SET TRANSACTION statement

解决方案

This is the essence of your question:

how do I guarantee that, for ...... any number of SELECT statements ..... inside one transaction ....... I will get data as it existed at the time I started the transaction?


This is exactly what Repeatable Read Isolation Level guarantees:

The Repeatable Read isolation level only sees data committed before the transaction began; it never sees either uncommitted data or changes committed during transaction execution by concurrent transactions. (However, the query does see the effects of previous updates executed within its own transaction, even though they are not yet committed.) This is a stronger guarantee than is required by the SQL standard for this isolation level, and prevents all of the phenomena described in Table 13-1. As mentioned above, this is specifically allowed by the standard, which only describes the minimum protections each isolation level must provide.

This level is different from Read Committed in that a query in a repeatable read transaction sees a snapshot as of the start of the transaction, not as of the start of the current query within the transaction. Thus, successive SELECT commands within a single transaction see the same data, i.e., they do not see changes made by other transactions that committed after their own transaction started.


A practical example - let say we have 2 simple tables:

CREATE TABLE t1( x int );
INSERT INTO t1 VALUES (1),(2),(3);
CREATE TABLE t2( y int );
INSERT INTO t2 VALUES (1),(2),(3);

A number of tables, their structures, primary keys, foreign keys etc. are unimportant here.

Lets open a first session, start repeatable read isolation level, and run two simple and separate SELECT statements:

test=# START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION
test=# SELECT * FROM t1;
 x
---
 1
 2
 3
(3 wiersze)


test=# SELECT * FROM t2;
 y
---
 1
 2
 3
(3 wiersze)

Note that START TRANSACTION command automatically disables autocommit mode in the session.


Now in another session (with default autocommit mode enabled)insert a few records into t1:

test2=# INSERT INTO t1 VALUES(10),(11);

New values were inserded and automatically commited (because autocommit is on).


Now go back to the first session and run SELECT again: test=# select * from t1;

 x
---
 1
 2
 3
(3 wiersze)

As you see, session1 (with active repeatable read transaction) doesn't see any changes commited after the start of the transation.


Lets do the same experiment whit table t2 - go to the second session and issue:

test2=# DELETE FROM t2 WHERE y = 2;
DELETE 1

Now go back to the first session and run SELECT again:

test=# SELECT * FROM t2;
 y
---
 1
 2
 3
(3 wiersze)

As you see, again, session1 (with active repeatable read transaction) doesn't see any changes commited after the start of the transation.


And now, in session1, finish the transaction issuing COMMIT, and then SELECT:

test=# SELECT * FROM t1;
 x
---
 1
 2
 3
(3 wiersze)

test=# SELECT * FROM t2;
 y
---
 1
 2
 3
(3 wiersze)

test=# COMMIT;
COMMIT

test=# select * from t1;
 x
----
  1
  2
  3
 10
 11
(5 wierszy)


test=# select * from t2;
 y
---
 1
 3
(2 wiersze)

As you see, when the repeatable read transaction is started and active, you can run many separate select statement multiple times, and all of these select statements see the same stable snapshot of data as of the start of the transaction, regardles of any commited data in other sessions.

这篇关于使用PostgreSQL MVCC跨多个表进行事务隔离的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆