How to efficiently version records in an SQL database

Question

In at least one application, I have the need to keep old versions of records in a relational database. When something should be updated, instead a new copy would be added and the old row would be marked as not current. When something should be deleted, it should instead be marked as not current or deleted.

There is a simple use case of this: New versions of a record can only be added at the current time, superseding one row each. This can be used for archiving previous records when saving new data. For this, I'd add the following columns to each table:

VersionTime datetime -- Time when this version becomes effective
IsCurrent bool -- Indicates whether this version is the most current (and not deleted)
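
A minimal sketch of the write path for this simple variant, using Python's built-in sqlite3 module so it runs as-is (the question targets C#, so read it as an illustration of the SQL, not of the class design). The table name `setting` and the columns `EntityId` and `Value` are hypothetical; only `VersionTime` and `IsCurrent` come from the question. An update becomes "mark the old row not current, insert a new current row" inside one transaction, and a partial unique index is one way to guarantee at most one current row per entity.

```python
import sqlite3
from datetime import datetime

# Hypothetical table "setting"; only VersionTime and IsCurrent come from the
# question, EntityId and Value are illustrative.
SCHEMA = """
CREATE TABLE setting (
    EntityId    INTEGER NOT NULL,
    Value       TEXT,
    VersionTime TEXT    NOT NULL,  -- ISO-8601 timestamp (SQLite has no datetime type)
    IsCurrent   INTEGER NOT NULL   -- 1/0 instead of bool
)
"""
# At most one current row per entity, enforced by a partial unique index.
INDEX = "CREATE UNIQUE INDEX one_current ON setting (EntityId) WHERE IsCurrent = 1"


def save_version(conn, entity_id, value, now=None):
    """Add a new version that supersedes the current row of entity_id."""
    now = now or datetime.now().isoformat()
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE setting SET IsCurrent = 0 WHERE EntityId = ? AND IsCurrent = 1",
            (entity_id,))
        conn.execute(
            "INSERT INTO setting (EntityId, Value, VersionTime, IsCurrent) "
            "VALUES (?, ?, ?, 1)",
            (entity_id, value, now))


def delete_entity(conn, entity_id):
    """'Delete' by marking the current version as no longer current."""
    with conn:
        conn.execute(
            "UPDATE setting SET IsCurrent = 0 WHERE EntityId = ? AND IsCurrent = 1",
            (entity_id,))


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(INDEX)
save_version(conn, 1, "red", "2013-01-01T00:00:00")
save_version(conn, 1, "blue", "2014-01-01T00:00:00")
current = conn.execute(
    "SELECT Value FROM setting WHERE EntityId = 1 AND IsCurrent = 1").fetchall()
history = conn.execute(
    "SELECT Value FROM setting WHERE EntityId = 1 ORDER BY VersionTime").fetchall()
```

Running the UPDATE before the INSERT matters here: the other order would briefly produce two current rows and violate the partial index.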

This is good if you only need to know which version of a record is the most current, and only enumerate the previous versions of a single record separately. Point-in-time queries, however, are even more painful than with the second variant.

A more generic variant is this: versions of records can be added at any time, for any specified validity time range. So I could declare that some setting of an entity is valid until the end of 2013, another version of it is valid in 2014, and yet another version will be valid from 2015 on. This can be used both to archive old data (as above) and to plan ahead to use different data at some time in the future (and to keep this information as an archive). For this, I'd add the following columns to each table:

ValidFrom datetime -- Time when this version becomes valid (inclusive)
ValidTo datetime -- Time when this version becomes invalid (exclusive)

The second approach can basically represent the first as well, but it's harder to know which version is the most recent, because versions can also be added for the future. Also, the ValidFrom/ValidTo design allows overlapping ranges to be declared; by definition, the row with the highest ValidFrom applies in that case.
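
A point-in-time lookup that applies the "highest ValidFrom wins" rule per record, not just for a single row, can be written with a correlated NOT EXISTS. This is a sketch using Python's sqlite3 module with a hypothetical `setting` table keyed by `EntityId`; the `'9999-12-31'` sentinel for "open-ended" is an assumption, and ISO date strings are used because they compare correctly as text. Rows with exactly equal ValidFrom would both be returned, so a tie-breaker would be needed in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE setting (
    EntityId  INTEGER NOT NULL,
    Value     TEXT,
    ValidFrom TEXT NOT NULL,  -- inclusive
    ValidTo   TEXT NOT NULL   -- exclusive; '9999-12-31' means open-ended (assumption)
)""")
# An index on (EntityId, ValidFrom) should help the NOT EXISTS probe below.
conn.execute("CREATE INDEX setting_key ON setting (EntityId, ValidFrom)")
conn.executemany("INSERT INTO setting VALUES (?, ?, ?, ?)", [
    (1, "until 2013", "2000-01-01", "2014-01-01"),
    (1, "2014 only",  "2014-01-01", "2015-01-01"),
    (1, "from 2015",  "2015-01-01", "9999-12-31"),
    (1, "override",   "2014-06-01", "2014-07-01"),  # overlaps "2014 only"
    (2, "other",      "2000-01-01", "9999-12-31"),
])

# For every entity, keep the version valid at :time unless another version
# of the same entity, also valid at :time, has a higher ValidFrom.
AS_OF_SQL = """
SELECT v.EntityId, v.Value
FROM setting AS v
WHERE v.ValidFrom <= :time AND v.ValidTo > :time
  AND NOT EXISTS (
      SELECT 1 FROM setting AS w
      WHERE w.EntityId = v.EntityId
        AND w.ValidFrom <= :time AND w.ValidTo > :time
        AND w.ValidFrom > v.ValidFrom)
ORDER BY v.EntityId
"""


def as_of(time):
    return conn.execute(AS_OF_SQL, {"time": time}).fetchall()
```

For example, `as_of("2014-06-15")` picks "override" for entity 1, while `as_of("2014-08-01")` falls back to "2014 only" once the override has expired.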

Now I'm wondering how to implement an efficient solution to manage and query such data. Normally you can just write any SQL queries with any kind of WHERE, GROUP BY and JOIN to get the records you want. But with versioning applied, you need to consider the correct version of each record. So instead of joining every version of a record from another table, an appropriate condition must be added to only select the version that is valid at a given time.

For example:

SELECT a, b, c
FROM t1

must be changed to:

SELECT a, b, c
FROM t1
WHERE t1.ValidFrom <= :time AND t1.ValidTo > :time
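-- overlapping ranges: the highest ValidFrom applies, hence DESC;
-- LIMIT 1 keeps one row in total, so this form only suits a single-record lookup
ORDER BY t1.ValidFrom DESC
LIMIT 1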

With a table join:

SELECT a, b, c
FROM t1
    LEFT JOIN t2 ON (t2.a = t1.a)

must be changed to:

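SELECT a, b, c
FROM t1
    LEFT JOIN t2 ON (t2.a = t1.a
        -- t2's validity belongs in ON: in WHERE it would drop t1 rows
        -- without a matching t2 version, turning this into an inner join
        AND t2.ValidFrom <= :time AND t2.ValidTo > :time)
WHERE t1.ValidFrom <= :time AND t1.ValidTo > :time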
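
Whether t2's validity condition sits in the ON clause or in WHERE changes the result of a LEFT JOIN. The following sqlite3 sketch (toy tables, hypothetical data) runs both placements at the same point in time: in WHERE, the NULL-extended row for a t1 record without a valid t2 version fails the comparison and disappears; in ON, it survives with NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (a INTEGER, b TEXT, ValidFrom TEXT, ValidTo TEXT);
CREATE TABLE t2 (a INTEGER, c TEXT, ValidFrom TEXT, ValidTo TEXT);
INSERT INTO t1 VALUES (1, 'x', '2000-01-01', '9999-12-31'),
                      (2, 'y', '2000-01-01', '9999-12-31');
-- t2 has a version for a = 1 only
INSERT INTO t2 VALUES (1, 'old', '2000-01-01', '2014-01-01');
""")

# t2 validity in WHERE: rows where t2 is NULL fail the test and vanish.
IN_WHERE = """
SELECT t1.a, t1.b, t2.c FROM t1
    LEFT JOIN t2 ON (t2.a = t1.a)
WHERE t1.ValidFrom <= :time AND t1.ValidTo > :time
  AND t2.ValidFrom <= :time AND t2.ValidTo > :time
ORDER BY t1.a"""

# t2 validity in ON: every valid t1 row survives, with NULLs where t2 has
# no version valid at :time.
IN_ON = """
SELECT t1.a, t1.b, t2.c FROM t1
    LEFT JOIN t2 ON (t2.a = t1.a
        AND t2.ValidFrom <= :time AND t2.ValidTo > :time)
WHERE t1.ValidFrom <= :time AND t1.ValidTo > :time
ORDER BY t1.a"""

params = {"time": "2010-01-01"}
rows_where = conn.execute(IN_WHERE, params).fetchall()
rows_on = conn.execute(IN_ON, params).fetchall()
```

Neither form resolves overlapping t2 versions: two t2 rows valid at the same time would still duplicate the t1 row, so the "highest ValidFrom" filter is needed on t2 as well.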

This still doesn't handle selecting the right version of overlapping time spans. I could add some clean-up method that flattens out overlapping version time ranges, but I don't know how efficient that would be.
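
One way such a clean-up could work, sketched in plain Python for the versions of a single record: split the timeline at every ValidFrom/ValidTo boundary, pick the covering version with the highest ValidFrom for each segment, and merge neighbouring segments that came from the same version. `flatten` is a hypothetical helper; it is quadratic in the number of versions per record, which is probably acceptable for a batch job but is not a measured claim.

```python
from typing import List, Optional, Tuple

# (ValidFrom, ValidTo, value) for ONE record; ISO date strings sort correctly.
Version = Tuple[str, str, str]


def flatten(versions: List[Version]) -> List[Version]:
    """Rewrite overlapping versions into non-overlapping ranges.

    Where ranges overlap, the version with the highest ValidFrom applies
    (the question's rule). ValidFrom is inclusive, ValidTo exclusive.
    """
    points = sorted({p for vf, vt, _ in versions for p in (vf, vt)})
    out: List[Version] = []
    last: Optional[Version] = None
    for start, end in zip(points, points[1:]):
        # No boundary lies strictly inside [start, end), so a version that
        # covers `start` covers the whole segment.
        covering = [v for v in versions if v[0] <= start < v[1]]
        if not covering:
            last = None  # gap: no version valid here
            continue
        winner = max(covering, key=lambda v: v[0])
        if winner is last:
            # Same source version as the previous segment: extend it.
            out[-1] = (out[-1][0], end, winner[2])
        else:
            out.append((start, end, winner[2]))
        last = winner
    return out
```

After flattening, a plain `ValidFrom <= :time AND ValidTo > :time` filter returns at most one row per record, so the join conditions in the question would no longer need the overlap rule.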

I'm seeking to create a class (in C# in my case) that provides methods to read and write such versioned records. Writing is relatively easy because the queries are simple and easy to control with transactions. But querying would require building an API that accepts every fragment of an SQL SELECT query and intelligently builds the SQL query to execute from them. The query method should only accept one additional parameter that specifies the point in time to fetch the data for. Depending on each entity's validity range, a different version of each entity would be selected.
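
A different technique that may avoid parsing SQL fragments: instead of rewriting the caller's query, substitute each versioned table with a derived table that already contains only the rows valid at a given time, and let the caller write ordinary WHERE, JOIN and GROUP BY against it. The sketch below is in Python/sqlite3 rather than C#; `as_of_subquery`, `query_as_of` and the `{t1}` placeholder convention are all hypothetical, and table/key names must be trusted identifiers, never user input.

```python
import sqlite3
from typing import Dict, Optional, Sequence


def as_of_subquery(table: str, key: Sequence[str]) -> str:
    """Derived table holding, per key, the version of `table` valid at :as_of.

    Overlaps use the question's rule: the highest ValidFrom wins.
    """
    same_key = " AND ".join(f"w.{k} = v.{k}" for k in key)
    return (
        f"(SELECT v.* FROM {table} AS v "
        f"WHERE v.ValidFrom <= :as_of AND v.ValidTo > :as_of "
        f"AND NOT EXISTS (SELECT 1 FROM {table} AS w WHERE {same_key} "
        f"AND w.ValidFrom <= :as_of AND w.ValidTo > :as_of "
        f"AND w.ValidFrom > v.ValidFrom))"
    )


def query_as_of(conn, sql: str, tables: Dict[str, Sequence[str]],
                as_of: str, params: Optional[dict] = None):
    """Run `sql`, whose {name} placeholders stand for versioned tables,
    against the data as it was (or will be) at `as_of`."""
    sources = {name: as_of_subquery(name, key) for name, key in tables.items()}
    bound = dict(params or {}, as_of=as_of)
    return conn.execute(sql.format(**sources), bound).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (a INTEGER, b TEXT, ValidFrom TEXT, ValidTo TEXT);
CREATE TABLE t2 (a INTEGER, c TEXT, ValidFrom TEXT, ValidTo TEXT);
INSERT INTO t1 VALUES (1, 'b-old', '2000-01-01', '2014-01-01'),
                      (1, 'b-new', '2014-01-01', '9999-12-31');
INSERT INTO t2 VALUES (1, 'c-2014', '2014-01-01', '2015-01-01');
""")

# The caller's query is plain SQL; only the table references are placeholders.
SQL = """SELECT t1.a, t1.b, t2.c
FROM {t1} AS t1 LEFT JOIN {t2} AS t2 ON t2.a = t1.a
ORDER BY t1.a"""
TABLES = {"t1": ["a"], "t2": ["a"]}
rows_2013 = query_as_of(conn, SQL, TABLES, "2013-06-01")
rows_2014 = query_as_of(conn, SQL, TABLES, "2014-06-01")
```

Because the time filter lives inside each derived table, the LEFT JOIN keeps its outer-join semantics without the caller having to place validity conditions in ON clauses.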

These are basically my incomplete thoughts about versioning data and providing an API to manage it. Have you already done such a thing and would like to tell me what you think of it? Do you have another idea that worked well? Could you offer me any advice on how to implement this API? While I theoretically know how to do it, I think it's a lot of work and I can't estimate how well it will work.

Recommended answer

If you need old data being part of your business logic then:


  • Save the latest version in the master table (insert and update; delete just changes the status column).
  • Take a snapshot when an update happens in the detail table (a snapshot is created before any update).

  • Another alternative is the Event Sourcing pattern.
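
One possible reading of the first two bullets, sketched with Python's sqlite3 module: the master table keeps only the latest version, a BEFORE UPDATE trigger copies the old row into a history table, and a delete only sets a status column. The names `product`, `product_history`, `Status` and `SnapshotAt` are hypothetical. Event Sourcing would instead store each change as an immutable event and derive the current state from the events; that is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Master table: always holds the latest version of each record.
CREATE TABLE product (
    Id     INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL,
    Price  REAL NOT NULL,
    Status TEXT NOT NULL DEFAULT 'active'   -- 'deleted' marks a soft delete
);
-- History table: one snapshot of the old row per update.
CREATE TABLE product_history (
    Id         INTEGER NOT NULL,
    Name       TEXT NOT NULL,
    Price      REAL NOT NULL,
    Status     TEXT NOT NULL,
    SnapshotAt TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Before any update, copy the current row into the history table.
CREATE TRIGGER product_snapshot BEFORE UPDATE ON product
BEGIN
    INSERT INTO product_history (Id, Name, Price, Status)
    VALUES (OLD.Id, OLD.Name, OLD.Price, OLD.Status);
END;
""")

with conn:
    conn.execute("INSERT INTO product (Id, Name, Price) VALUES (1, 'Widget', 9.5)")
    conn.execute("UPDATE product SET Price = 11.0 WHERE Id = 1")
    # "Delete" only flips the status, and that update is snapshotted too.
    conn.execute("UPDATE product SET Status = 'deleted' WHERE Id = 1")

latest = conn.execute("SELECT Name, Price, Status FROM product").fetchall()
history = conn.execute(
    "SELECT Price, Status FROM product_history ORDER BY rowid").fetchall()
```

Putting the snapshot in a trigger means application code cannot forget it, at the cost of logic that lives in the database rather than in the C# class.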

Otherwise, if the old data is not part of your business logic:

  • An Entity–attribute–value approach may come in handy. An implementation sample can be found here.
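
A minimal sketch of how an entity-attribute-value log could hold old data, assuming a hypothetical `entity_value` table where each change of one attribute is a row with a `ChangedAt` timestamp; `state_at` is a hypothetical helper that rebuilds an entity as it was at a given time. Values are stored as text for simplicity, and removing an attribute would need an explicit marker row, which this sketch omits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE entity_value (
    EntityId  INTEGER NOT NULL,
    Attribute TEXT    NOT NULL,
    Value     TEXT,               -- every attribute stored as text here
    ChangedAt TEXT    NOT NULL
)""")
conn.executemany("INSERT INTO entity_value VALUES (?, ?, ?, ?)", [
    (1, "name",  "Widget", "2013-01-01"),
    (1, "price", "9.5",    "2013-01-01"),
    (1, "price", "11.0",   "2014-03-01"),
    (1, "color", "red",    "2014-05-01"),
])


def state_at(entity_id, time):
    """Rebuild one entity as a dict: latest value of each attribute at `time`."""
    rows = conn.execute("""
        SELECT e.Attribute, e.Value FROM entity_value AS e
        WHERE e.EntityId = :id AND e.ChangedAt <= :time
          AND e.ChangedAt = (SELECT MAX(x.ChangedAt) FROM entity_value AS x
                             WHERE x.EntityId = e.EntityId
                               AND x.Attribute = e.Attribute
                               AND x.ChangedAt <= :time)""",
        {"id": entity_id, "time": time}).fetchall()
    return dict(rows)
```

Since only changed attributes are written, the log stays small, but every point-in-time read has to reassemble the entity from its attribute rows.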
