Which option would be better for historical information of a table?

Problem description

I have a table A where I store information that my users can update, but because of a user requirement I need to keep track of the changes to that information. Considering that:

  • This info will be displayed whenever the user wants to see it
  • The info could be changed at any time, but not very often, let's say 5 times a year

I thought of some options, such as:

  • Store all the records (old and new) in a single table
  • Create 2 tables A and B: one (A) that keeps just the current record, and another one (B) with the non-current ones. In this case I would do an insert in B with the new information and then I would do an update to A.

I like the second option more than the first, but I'm not sure whether the second is really a solution or just a fancy way of doing the same thing, because in the end I'm storing the same amount of data, right?

Does anyone have other options, or how do you handle this kind of situation when developing?

Thank you very much!

Azu

Solution

My suggestion would be to convert the table to version normal form (vnf). You say this table contains data users may update. I infer from this that there is an independent primary table of static data, and that the PK of the updatable table is also an FK to that primary table.

create table Versions(
    ID        int not null,
    ModDate   date not null,
    ModUserID int not null,
    ...     ..., -- data fields
    constraint PK_Versions primary key( ID, ModDate ),
    constraint FK_Versions_Primary foreign key( ID )
        references Primary( ID ),
    constraint FK_Versions_User foreign key( ModUserID )
        references Users( ID )
);

Versioning doesn't require the FK reference back to the primary table; I just included it for illustration. But it does show why I call it "version normal form." You are normalizing the changeable data away from the static data. This way, standard normalizing techniques apply.

Most queries will probably be only interested in the "current" version of each entity. The current version is the most recent one -- the one with the largest modification date.

select  *
from    Versions v
where   v.ModDate =(
        select  Max( v1.ModDate )
        from    Versions v1
        where   v1.ID = v.ID );

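Don't let the subquery worry you. I have used versioning for years and the query is quite fast: the primary key on (ID, ModDate) gives the database an index it can typically use to find the latest ModDate for each ID.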
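To make the "current version" query concrete, here is a minimal sketch that runs it against an in-memory SQLite database from Python. Several things here are assumptions for illustration only: the Email column stands in for the elided data fields, the sample rows are made up, ModDate is stored as ISO-8601 text because SQLite has no native date type, and the foreign keys are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Versions(
        ID        int  not null,
        ModDate   text not null,   -- ISO-8601 text (assumption: SQLite has no date type)
        ModUserID int  not null,
        Email     text,            -- hypothetical data field
        constraint PK_Versions primary key( ID, ModDate )
    );
    insert into Versions values
        (1, '2023-01-10 09:00:00', 7, 'old@example.com'),
        (1, '2023-06-02 14:30:00', 7, 'new@example.com'),
        (2, '2023-03-15 11:00:00', 8, 'other@example.com');
""")

# Same shape as the query above: keep only the row whose ModDate is the
# latest one recorded for its ID.
current_rows = conn.execute("""
    select  v.ID, v.ModDate, v.Email
    from    Versions v
    where   v.ModDate =(
            select  Max( v1.ModDate )
            from    Versions v1
            where   v1.ID = v.ID )
    order by v.ID
""").fetchall()

for row in current_rows:
    print(row)
```

Entity 1 has two versions in the sample data, but only the later one comes back; entity 2 has a single version, which is therefore its current one.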

If there is a primary table, the join to show the entire current tuple is based on the query above.

select  p.*, v.* -- You will want to expand these out
from    Primary p
join    Versions v
    on  v.ID = p.ID
    and v.ModDate =(
        select  Max( v1.ModDate )
        from    Versions v1
        where   v1.ID = v.ID );

In fact, if you make a view of the first query, the second query could just join to that view. Also don't worry about joining to simple views. If you examine the execution plans of the full query and a join to the view, they should be the same.
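The view-based variant might look like the sketch below, again using SQLite from Python. The names Entities (standing in for the primary table, since PRIMARY is a keyword in SQLite), CurrentVersions, Name, and Email are hypothetical, as is the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Entities(     -- stands in for the static primary table
        ID   int primary key,
        Name text not null     -- hypothetical static field
    );
    create table Versions(
        ID        int  not null references Entities( ID ),
        ModDate   text not null,
        ModUserID int  not null,
        Email     text,        -- hypothetical data field
        constraint PK_Versions primary key( ID, ModDate )
    );

    -- The first query, wrapped in a view.
    create view CurrentVersions as
        select  *
        from    Versions v
        where   v.ModDate =(
                select  Max( v1.ModDate )
                from    Versions v1
                where   v1.ID = v.ID );

    insert into Entities values (1, 'Alice'), (2, 'Bob');
    insert into Versions values
        (1, '2023-01-10 09:00:00', 7, 'alice.old@example.com'),
        (1, '2023-06-02 14:30:00', 7, 'alice@example.com'),
        (2, '2023-03-15 11:00:00', 8, 'bob@example.com');
""")

# The second query reduces to a plain join against the view.
joined = conn.execute("""
    select  p.ID, p.Name, cv.ModDate, cv.Email
    from    Entities p
    join    CurrentVersions cv
        on  cv.ID = p.ID
    order by p.ID
""").fetchall()

for row in joined:
    print(row)
```

Each entity appears once, paired with its most recent version, without the calling query having to repeat the subquery.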

You could also have a view made from the second query, exposing only the current versions of the entire entity. If there is a lot of data -- many versions of many entities -- a select * from view will be noticeably slower than a similar dump of a table consisting of only current rows. However, if you filter the data -- select * from view where ID = 12345 -- the results should be similar.

But here is where the power of this design becomes clear. Suppose you wanted to know the version of an entity at some particular point in the past. The query is not significantly different. Consider the first query:

select  *
from    Versions v
where   v.ModDate =(
        select  Max( v1.ModDate )
        from    Versions v1
        where   v1.ID = v.ID
            and v1.ModDate <= :DateOfInterest );

Just the addition of "and v1.ModDate <= :DateOfInterest" to the subquery allows you to look back in time to see what the data looked like on any particular date and time.
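As a rough illustration of the point-in-time lookup, the sketch below binds :DateOfInterest as a named parameter through Python's sqlite3 module. The helper name versions_as_of, the Email column, and the sample rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Versions(
        ID        int  not null,
        ModDate   text not null,   -- ISO-8601 text, so string order matches time order
        ModUserID int  not null,
        Email     text,            -- hypothetical data field
        constraint PK_Versions primary key( ID, ModDate )
    );
    insert into Versions values
        (1, '2023-01-10 09:00:00', 7, 'old@example.com'),
        (1, '2023-06-02 14:30:00', 7, 'new@example.com'),
        (2, '2023-03-15 11:00:00', 8, 'other@example.com');
""")

def versions_as_of(conn, date_of_interest):
    """Return the version of each entity that was in effect at date_of_interest."""
    return conn.execute("""
        select  v.ID, v.ModDate, v.Email
        from    Versions v
        where   v.ModDate =(
                select  Max( v1.ModDate )
                from    Versions v1
                where   v1.ID = v.ID
                    and v1.ModDate <= :DateOfInterest )
        order by v.ID
    """, {"DateOfInterest": date_of_interest}).fetchall()

snapshot = versions_as_of(conn, '2023-02-01 00:00:00')
latest = versions_as_of(conn, '2023-12-31 23:59:59')
print(snapshot)
print(latest)
```

For the February date, entity 1 resolves to its January version and entity 2 is absent, because it had no version yet at that point. For the December date, the result matches the current-version query.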

My typical implementation is to have a "current" view that shows only the current version of each entity and a "history" view that shows all versions. All DML goes through the "current" view. An "instead of" trigger translates each operation into the actual operations needed to maintain the versioned data. For example, an UPDATE would become an INSERT of a new version, which would, of course, become the new current version for that entity.
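One possible shape for the "instead of" trigger is sketched below using SQLite, which supports INSTEAD OF triggers on views. This is not the author's actual implementation: the view and trigger names and the Email column are hypothetical, and as a simplification the caller supplies the new ModDate in the UPDATE itself, whereas a real trigger would more likely stamp the current time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Versions(
        ID        int  not null,
        ModDate   text not null,
        ModUserID int  not null,
        Email     text,            -- hypothetical data field
        constraint PK_Versions primary key( ID, ModDate )
    );

    create view CurrentVersions as
        select  *
        from    Versions v
        where   v.ModDate =(
                select  Max( v1.ModDate )
                from    Versions v1
                where   v1.ID = v.ID );

    -- An UPDATE against the view never touches existing rows; it becomes
    -- an INSERT of a new version for the same ID.
    create trigger CurrentVersions_Update
    instead of update on CurrentVersions
    begin
        insert into Versions( ID, ModDate, ModUserID, Email )
        values( old.ID, new.ModDate, new.ModUserID, new.Email );
    end;

    insert into Versions values (1, '2023-01-10 09:00:00', 7, 'old@example.com');
""")

# Simplification: the caller passes the new ModDate explicitly.
conn.execute("""
    update CurrentVersions
    set    Email = 'new@example.com',
           ModDate = '2023-06-02 14:30:00',
           ModUserID = 9
    where  ID = 1
""")

history = conn.execute("select * from Versions order by ID, ModDate").fetchall()
current = conn.execute("select * from CurrentVersions").fetchall()
print(history)
print(current)
```

After the update, the history holds both rows while the view exposes only the newer one. In this sketch, an UPDATE that leaves ModDate unchanged would violate the primary key and fail rather than silently overwrite a version.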
