没有历史表的数据库版本控制 [英] Database versioning without history tables

查看:97
本文介绍了没有历史表的数据库版本控制的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在通过发布记录级版本化表。我注意到这个架构涉及历史表的使用。但是,我的场景不需要回滚,而是检索时间记录。这是我尝试使用单个表进行版本控制的设计。请注意,这是一个裸骨表数据(无约束,索引等)。我打算根据id进行索引,因为这涉及到列的group by子句。



例如,我有一个表Test在哪里


id是标识符,



modstamp是数据的时间戳(从不为空)


除上述列外,该表还将包含记帐列


local_modstamp是更新记录的时间戳记



del_modstamp是删除记录的时间戳。


在备份过程中,所有记录都是从源代码获取的,并插入到记录中的值为local_modstamp = null和del_stamp = null。

  id | modstamp | local_modstamp | del_modstamp | 
--- | --------------------------- | -------------- - | ------------- |
1 | 2016-08-01 15:35:32 +00:00 | | |
2 | 2016-07-29 13:39:45 +00:00 | | |
3 | 2016-07-21 10:15:09 +00:00 | | |

一旦获得记录,这些是处理数据的场景(假设参考时间 [ref_time] 是运行流程的时间):


  1. 正常插入。 p>


  2. 更新:使用local_modstamp = ref_time更新最近的记录。然后插入新记录。
    查询将是:
    更新测试集local_modstamp =其中id =和local_modstamp不为null并且del_modstamp不为空
    插入测试值(...)


  3. 删除:使用del_modstamp = ref_time更新最新的记录。
    更新测试集del_modstamp =其中id =和local_modstamp不为null,del_modstamp不为null


设计旨在获取最新的记录,其中local_modstamp不为空,并且del_modstamp不为空。
但是,我遇到一个问题,我打算使用查询(最内层查询)来检索时间:

  select id,max(modstamp)from test where modstamp< =< ref_time>和(del_modstamp为null || del_modstamp< =< ref_time>)group by id; 

似乎我使用null作为占位符来识别错误(有)表的最新记录。有没有办法使用现有的设计获取时间记录?



如果没有,我猜可能的解决办法是将local_modstamp设置为最新的记录。这将需要在更新的情况下使用max(local_modstamp)更新逻辑。我可以坚持我现有的架构来实现检索时间点数据吗?



我现在使用的是SQL Server,但是这个设计可能会扩展到其他数据库产品太。我打算使用更一般的方法来检索数据,而不是使用供应商特定的黑客

介绍版本正常格式。考虑这个表:

 创建表实体(
ID int identity主键,
S1 [type] , - 静态数据
Sn [type], - 更多静态数据
V1 [type], - 易失性数据
Vn [type] - 更易变数据
) ;

静态数据是在实体的生命周期内不会更改的数据,也不需要跟踪。



将volatile属性移动到单独的表中:

 创建表EntityVersions(
ID int not null,
生效日期不为null默认sysdate(),
删除位不为null默认值0
V1 [类型],
Vn [type],
约束PK_EntityVersions主键(ID,有效),
约束FK_EntityVersionEntity外键(ID)
引用实体(ID)
);

Entities表不再包含volatile属性。



插入操作使用静态数据创建主实体记录,生成唯一的ID值。该值用于将第一个版本与volatile数据的初始值进行插入。更新通常对主表不起作用(除非实际更改了静态值),并将新版本的易失性数据写入版本表。注意,对现有版本,特别是最新版本或当前版本不做任何更改。新版本被插入,操作结束。



要撤销最新版本,或任何版本实际上只需从版本表中删除该版本。



例如,具有以下属性的Employees表:

  EmployeeNum,HireDate ,FirstName,LastName,PayRate,Dept,PhoneExt 

EmployeeNum当然会与HireDate一起静态和名字。 PhoneExt可能会不时更改,但我们不在乎。所以它被指定为静态。最终的设计是:

  Employees_S 
===========
EmployeeNum (PK),HireDate,FirstName,PhoneExt

Employees_V
===========
EmployeeNum(PK),Effective(PK),IsDeleted,LastName ,PayRate,Dept

在2016年1月1日,我们聘请了Sally Smith。静态数据插入到Employees_S中,生成EmployeeNum值为1001.我们使用该值也插入第一个版本。

  Employees_S 
===========
1001,yesterday1-01,Sally,12345

Employees_V
======= == $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ p> 3月1日,她得到加薪:

  Employees_S 
========= ==
1001,2016-01-01,Sally,12345

Employees_V
===========
1001,2016-01 -01,0,Smith,35.00,Eng
1001,2016-03-01,0,Smith,40.00,Eng

5月1日,她结婚:

  Employees_S 
===== ======
1001,2016-01-01,Sally,12345

Employees_V
===========
1001 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,Eng

请注意,同一实体的版本,除了生效日期不能是相同的,完全是彼此独立的。



要查看员工1001的当前状态如何,以下是查询:

  select s.EmployeeNum,s.HireDate,s.FirstName,v.LastName,v.PayRate,v.Dept,s.PhoneExt 
from Employees_S s
加入Employees_V v
on v.EmployeeNum = s.EmployeeNum
and v.Effective =(select Max(Effective)
from Employees_V
Where EmployeeNum = v.EmployeeNum
和Effective< = SysDate())
其中s.EmployeeNum = 1001
和v.IsDeleted = 0;

这是很酷的部分。要查看员工1001的状态,比如说2月11日,这是查询:

  select s.EmployeeNum, s.HireDate,s.FirstName,v.LastName,v.PayRate,v.Dept,s.PhoneExt 
from Employees_S s
join Employees_V v
on v.EmployeeNum = s.EmployeeNum
和v.Effective =(从Employees_V
中选择Max(有效)
其中EmployeeNum = v.EmployeeNum
和Effective< ='07-02-11)
其中s.EmployeeNum = 1001
和v.IsDeleted = 0;

它是相同的查询 - 除了子查询的最后一行。当前和历史数据位于同一个表中,并使用相同的语句进行查询。



这是另一个很酷的功能。 7月1日,我们知道,9月1日,Sally将转移到营销部门,另外加薪。文书工作已经过去了。继续插入新数据:

  Employees_S 
===========
1001,2016-01-01,Sally,12345

雇员_V
===========
1001, ,Smith,35.00,Eng
1001,2016-03-01,0,Smith,40.00,Eng
1001,dressing-01,0,Jones,40.00,Eng
1001, 2016-09-01,0,Jones,50.00,Mkt

下一个到最后一个版本将仍然显示为当前版本,但在9月1日或之后执行的第一个查询将显示营销数据。



这里是我在技术交易会上做了几次的演示文稿的幻灯片。它包含有关如何完成上述所有内容的详细信息,包括查询。而 here 是一个更详细的文档。


I was going through this post for record level versioning of tables. I noticed that the architecture deals with the usage of history tables. However, my scenario does not require rollback but retrieving point in time records. This is where I have tried with a design on using a single table for versioning. Note that this is a bare bone table data (no constraints, indices, etc.). I intend to index based on id since this involves group by clause on the column.

For example, I have got a table Test where

id is the identifier,

modstamp is the timestamp of the data (never null)

In addition to the columns above, the table will contain bookkeeping columns

local_modstamp is the timestamp at which the record was updated

del_modstamp is the timestamp at which the record was deleted

During backup, all the records are obtained from the source and inserted where the records would have the values local_modstamp = null and del_stamp = null.

id |modstamp                   |local_modstamp |del_modstamp |
---|---------------------------|---------------|-------------|
1  |2016-08-01 15:35:32 +00:00 |               |             |
2  |2016-07-29 13:39:45 +00:00 |               |             |
3  |2016-07-21 10:15:09 +00:00 |               |             |

Once the records are obtained, these are the scenarios for handling the data (assuming the reference time [ref_time] is the time at which the process is run):

  1. Insert as normal.

  2. Update: Update the most recent record with local_modstamp = ref_time. Then insert the new record. The query would be: update test set local_modstamp = where id = and local_modstamp is not null and del_modstamp is not null insert into test values(...)

  3. Delete: Update the most recent record with del_modstamp = ref_time. update test set del_modstamp = where id = and local_modstamp is not null and del_modstamp is not null

The design aims at getting the latest records where local_modstamp is not null and del_modstamp is not null. However, I ran into an issue where I intend to retrieve point in time using the query (inner-most query):

select id, max(modstamp) from test where modstamp <= <ref_time> and (del_modstamp is null || del_modstamp <= <ref_time>) group by id;

It seems that I have made a mistake (have I?) of using null as a placeholder to identify the latest records of the table. Is there a way to use the existing design to obtain the point in time records?

If not, I guess the probable solution is to set the local_modstamp to the latest records. This would require to update the logic using max(local_modstamp) in case of updates. Can I persist on my existing architecture to achieve in retrieving the point in time data?

I am using SQL-Server right now but this design may be extended to other database products too. I intend to use a more general approach to retrieve the data instead of using vendor specific hacks.

解决方案

Introducing Version Normal Form. Consider this table:

create table Entities(
    ID     int identity primary key,
    S1     [type],  -- Static data
    Sn     [type],  -- more static data
    V1     [type],  -- Volatile data
    Vn     [type]   -- more volatile data
);

Static data is data that does not change during the lifetime of the entity or that doesn't require tracking. Volatile data changes and those changes must be tracked.

Move volatile attributes to a separate table:

create table EntityVersions(
    ID        int  not null,
    Effective date not null default sysdate(),
    Deleted   bit  not null default 0,
    V1        [type],
    Vn        [type],
    constraint PK_EntityVersions primary key( ID, Effective ),
    constraint FK_EntityVersionEntity foreign key( ID )
        references Entities( ID )
);

The Entities table no longer contains the volatile attributes.

An insert operation creates the master entity record with static data, generating the unique ID value. That value is used to insert the first version with the initial values of the volatile data. An update generally does nothing to the master table (unless a static value is actually changed) and a new version of the new volatile data is written to the version table. Note no changes are made to existing versions, particularly the latest or "current" version. The new version is inserted, end of operation.

To "undo" the latest version, or any version actually, simply delete that version from the version table.

For example, an Employees table with the following attributes:

EmployeeNum, HireDate, FirstName, LastName, PayRate, Dept, PhoneExt

EmployeeNum will, of course, be static along with HireDate and FirstName. PhoneExt may change from time to time but we don't care. So it is designated static. The final design is:

Employees_S
===========
  EmployeeNum (PK), HireDate, FirstName, PhoneExt

Employees_V
===========
  EmployeeNum (PK), Effective (PK), IsDeleted, LastName, PayRate, Dept

On 1 Jan, 2016 we hired Sally Smith. The static data is inserted into Employees_S generating an EmployeeNum value of 1001. We use that value to also insert the first version.

Employees_S
===========
  1001, 2016-01-01, Sally, 12345

Employees_V
===========
  1001, 2016-01-01, 0, Smith, 35.00, Eng

On 1 Mar, she gets a pay raise:

Employees_S
===========
  1001, 2016-01-01, Sally, 12345

Employees_V
===========
  1001, 2016-01-01, 0, Smith, 35.00, Eng
  1001, 2016-03-01, 0, Smith, 40.00, Eng

On 1 May, she gets married:

Employees_S
===========
  1001, 2016-01-01, Sally, 12345

Employees_V
===========
  1001, 2016-01-01, 0, Smith, 35.00, Eng
  1001, 2016-03-01, 0, Smith, 40.00, Eng
  1001, 2016-05-01, 0, Jones, 40.00, Eng

Note that versions of the same entity, other than the restriction that the Effective dates cannot be the same, are completely independent of each other.

To see what the current state of employee 1001 looks like, here is the query:

select  s.EmployeeNum, s.HireDate, s.FirstName, v.LastName, v.PayRate, v.Dept, s.PhoneExt
from    Employees_S s
join    Employees_V v
    on  v.EmployeeNum = s.EmployeeNum
    and v.Effective = ( select  Max( Effective )
                        from    Employees_V
                        where   EmployeeNum = v.EmployeeNum
                            and Effective <= SysDate() )
where   s.EmployeeNum = 1001
    and v.IsDeleted = 0;

Here's the cool part. To see what the state of employee 1001 looked like on, say 11 Feb, here is the query:

select  s.EmployeeNum, s.HireDate, s.FirstName, v.LastName, v.PayRate, v.Dept, s.PhoneExt
from    Employees_S s
join    Employees_V v
    on  v.EmployeeNum = s.EmployeeNum
    and v.Effective = ( select  Max( Effective )
                        from    Employees_V
                        where   EmployeeNum = v.EmployeeNum
                            and Effective <= '2016-02-11' )
where   s.EmployeeNum = 1001
    and v.IsDeleted = 0;

It's the same query -- except for the last line of the subquery. Current and historical data reside in the same table and are queried with the same statement.

Here's another cool feature. It's 1 Jul and we know that on 1 Sep, Sally is going to transfer to the marketing dept, with another pay raise. The paperwork has already gone through. Go ahead and insert the new data:

Employees_S
===========
  1001, 2016-01-01, Sally, 12345

Employees_V
===========
  1001, 2016-01-01, 0, Smith, 35.00, Eng
  1001, 2016-03-01, 0, Smith, 40.00, Eng
  1001, 2016-05-01, 0, Jones, 40.00, Eng
  1001, 2016-09-01, 0, Jones, 50.00, Mkt

The next-to-last version will still show up as the current version but the first query executed on or after 1 Sep will show the Marketing data.

Here are the slides of a presentation I have made a few times at tech fairs. It contains more details about how all the above can be done including the queries. And here is a document that goes into a lot more detail.

这篇关于没有历史表的数据库版本控制的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆