6NF and historical attribute data


Problem description

When using a database normalized according to 6NF principles, how would you store historical attribute data?

Take, for instance, this example from @PerformanceDBA, with the following extra requirement:

We need to store historical data for all our products: we should be able to enter a date and get a snapshot of a product's attributes at that particular time.

A more practical example:
Suppose the disks and CPUs from the example above are virtual and a user can change the disk capacity at will. How can we alter the database so that we can retrieve the attributes of a given disk at any time in the past (after its creation date, of course) while keeping the 5NF view fast enough?

Things I'm considering

  • Add a timestamp column 'changedate' to each attribute table (this would result in a pretty complex query with a subquery and join for each attribute table)
  • Create a separate *history table for each attribute table (could result in a massive number of tables, since we have around 70 attributes spread over 20 product types)
  • Additionally: add an indexed 'current' column to each attribute table to speed up the 5NF view

Any help is appreciated!


Edit: I know the concept of temporal databases, but the temporal extension for the database engine I'm working with (PostgreSQL) isn't fully implemented yet. Any advice on how to achieve this without temporal databases?

Solution

The recently approved SQL:2011 standard incorporates features that let you deal with this kind of problem better than was ever possible before.

Not that you'll be able to do everything you'd want to do in the temporal arena, but what did get introduced is indeed a fairly significant improvement.

A good presentation about it is at http://metadata-standards.org/Document-library/Documents-by-number/WG2-N1501-N1550/WG2_N1536_koa046-Temporal-features-in-SQL-standard.pdf .

Note that only a single vendor has reasonable support for these features in its SQL product; one other is perhaps working on it, and a third has opened a voting channel for its customers.

There's also a "Temporal Data" discussion group at www.linkedin.com dedicated to precisely your subject at hand.

EDIT trying to address "Any advice on how to achieve this without temporal databases?"

Do not add just a single date/time type column to your models. The first reason is the one you gave yourself: the queries get complex. The second is that the alternative described next is the one promoted by the new standard, so it will ease the transition to engines that support the new features once they are available.

So add BOTH a start and an end date/time column. DO NOT MAKE EITHER OF THEM NULLABLE. The new standard requires this for its temporal features. If the end MIT (moment in time) is still unknown, use the highest value of the applicable time type, e.g. 9999-12-31.
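
A minimal sketch of that layout, under stated assumptions: Python's built-in sqlite3 module stands in for PostgreSQL so the example is self-contained, and the table `disk_capacity`, its columns, and the helper `capacity_as_of` are hypothetical names that do not come from the question. The period is treated as half-open (start inclusive, end exclusive), which is a choice made for this sketch.

```python
import sqlite3

HIGH_DATE = "9999-12-31"  # sentinel meaning "end not known yet"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE disk_capacity (          -- hypothetical attribute table
        disk_id     INTEGER NOT NULL,
        capacity_gb INTEGER NOT NULL,
        valid_from  TEXT    NOT NULL,     -- inclusive start, ISO date
        valid_to    TEXT    NOT NULL,     -- exclusive end, never NULL
        PRIMARY KEY (disk_id, valid_from),
        CHECK (valid_from < valid_to)
    )
""")
conn.executemany(
    "INSERT INTO disk_capacity VALUES (?, ?, ?, ?)",
    [
        (1, 100, "2012-01-01", "2012-06-01"),  # closed historical row
        (1, 250, "2012-06-01", HIGH_DATE),     # current row
    ],
)

def capacity_as_of(disk_id, day):
    """Return the capacity valid on `day` (ISO date string), or None."""
    row = conn.execute(
        "SELECT capacity_gb FROM disk_capacity "
        "WHERE disk_id = ? AND valid_from <= ? AND ? < valid_to",
        (disk_id, day, day),
    ).fetchone()
    return row[0] if row else None
```

Because the end column is never NULL, the same range predicate serves both historical and current rows; no special case for "still open" is needed.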

You do not NEED to "create separate history tables for each attribute". It is equally possible to have a "single entity table" that keeps "the history of an entire entity occurrence". The downside is that it becomes difficult to query when an ACTUAL change occurred to some particular attribute, because a change to any attribute produces a new historical row, which usually copies over unchanged values for most of the other attributes. The 'single table' is likely to be an eager consumer of space; the 'separate history for each attribute' may be an eager consumer of query CPU time. It is a balancing act, and exactly where the balance lies depends on your particular situation.
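
To make the trade-off concrete, the following sketch stores the same facts both ways and takes a snapshot from each. It is an illustration only: sqlite3 again stands in for PostgreSQL, and every table, column, and function name here (`disk_capacity`, `disk_label`, `disk_history`, and the two snapshot helpers) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option A: one history table per attribute (6NF style).
    CREATE TABLE disk_capacity (
        disk_id INTEGER NOT NULL, capacity_gb INTEGER NOT NULL,
        valid_from TEXT NOT NULL, valid_to TEXT NOT NULL,
        PRIMARY KEY (disk_id, valid_from));
    CREATE TABLE disk_label (
        disk_id INTEGER NOT NULL, label TEXT NOT NULL,
        valid_from TEXT NOT NULL, valid_to TEXT NOT NULL,
        PRIMARY KEY (disk_id, valid_from));

    -- Option B: one history table for the whole entity.
    CREATE TABLE disk_history (
        disk_id INTEGER NOT NULL, capacity_gb INTEGER NOT NULL,
        label TEXT NOT NULL,
        valid_from TEXT NOT NULL, valid_to TEXT NOT NULL,
        PRIMARY KEY (disk_id, valid_from));

    INSERT INTO disk_capacity VALUES
        (1, 100, '2012-01-01', '2012-06-01'),
        (1, 250, '2012-06-01', '9999-12-31');
    INSERT INTO disk_label VALUES
        (1, 'scratch', '2012-01-01', '9999-12-31');

    -- Same facts in option B: the unchanged label is copied into the new row.
    INSERT INTO disk_history VALUES
        (1, 100, 'scratch', '2012-01-01', '2012-06-01'),
        (1, 250, 'scratch', '2012-06-01', '9999-12-31');
""")

def snapshot_per_attribute(disk_id, day):
    """Option A: one join plus one period predicate per attribute table."""
    return conn.execute("""
        SELECT c.capacity_gb, l.label
        FROM disk_capacity AS c
        JOIN disk_label AS l ON l.disk_id = c.disk_id
        WHERE c.disk_id = :id
          AND c.valid_from <= :day AND :day < c.valid_to
          AND l.valid_from <= :day AND :day < l.valid_to
    """, {"id": disk_id, "day": day}).fetchone()

def snapshot_single_table(disk_id, day):
    """Option B: a single-row lookup, paid for with duplicated values."""
    return conn.execute("""
        SELECT capacity_gb, label FROM disk_history
        WHERE disk_id = :id AND valid_from <= :day AND :day < valid_to
    """, {"id": disk_id, "day": day}).fetchone()
```

With about 70 attributes, option A's snapshot query grows by one join per attribute, while option B's rows grow wider and repeat unchanged values on every change.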

Do not "add an indexed 'current' column" to your tables. First, such columns will not help you transition to the new features once your engine has them. Second, Y/N columns are very bad discriminators, and therefore very poor candidates for indexing. I'd rather add your start or end MIT to the index: it can be expected to give you the same wins for the 'current' rows, and a better win for the non-current rows whenever you need to query those.
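
As a sketch of that suggestion, the example below indexes the end column instead of a flag and uses it for both the 'current' query and an as-of query. The index name `disk_capacity_end`, the table, and the helper functions are hypothetical, sqlite3 stands in for PostgreSQL, and whether a planner actually picks the index depends on the engine and the data.

```python
import sqlite3

HIGH_DATE = "9999-12-31"  # sentinel end value marking current rows

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE disk_capacity (
        disk_id INTEGER NOT NULL, capacity_gb INTEGER NOT NULL,
        valid_from TEXT NOT NULL, valid_to TEXT NOT NULL,
        PRIMARY KEY (disk_id, valid_from));

    -- Index on the period end rather than on a Y/N 'current' column.
    CREATE INDEX disk_capacity_end ON disk_capacity (valid_to, disk_id);

    INSERT INTO disk_capacity VALUES
        (1, 100, '2012-01-01', '2012-06-01'),
        (1, 250, '2012-06-01', '9999-12-31'),
        (2, 500, '2012-03-01', '9999-12-31');
""")

def current_rows():
    """Current rows are exactly those whose end is the sentinel value."""
    return conn.execute(
        "SELECT disk_id, capacity_gb FROM disk_capacity "
        "WHERE valid_to = ? ORDER BY disk_id",
        (HIGH_DATE,),
    ).fetchall()

def rows_as_of(day):
    """Rows valid on `day`; the range condition on valid_to can use the same index."""
    return conn.execute(
        "SELECT disk_id, capacity_gb FROM disk_capacity "
        "WHERE valid_to > ? AND valid_from <= ? ORDER BY disk_id",
        (day, day),
    ).fetchall()
```

The equality on `valid_to` plays the role the 'current' flag would have played, and the same index also serves range conditions for historical dates, which a flag could not.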

As for enforcing database constraints such as non-overlap of time periods in temporal keys and inclusion of time periods in temporal RI (referential integrity): you're entirely on your own. Write the code you need in triggers, stored procedures, or application code, in decreasing order of preference.

Was this more helpful?
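
One possible shape for the trigger route is sketched below: a trigger that rejects an inserted row whose period overlaps an existing row for the same disk. The trigger name, table, and the helper `try_insert` are hypothetical, the SQL uses SQLite's trigger syntax rather than PostgreSQL's, and only INSERT is covered; a complete solution would need a matching check for UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE disk_capacity (
        disk_id INTEGER NOT NULL, capacity_gb INTEGER NOT NULL,
        valid_from TEXT NOT NULL, valid_to TEXT NOT NULL,
        PRIMARY KEY (disk_id, valid_from),
        CHECK (valid_from < valid_to));

    -- Two half-open periods overlap when each starts before the other ends.
    CREATE TRIGGER disk_capacity_no_overlap
    BEFORE INSERT ON disk_capacity
    BEGIN
        SELECT RAISE(ABORT, 'overlapping period for this disk')
        WHERE EXISTS (
            SELECT 1 FROM disk_capacity AS old
            WHERE old.disk_id = NEW.disk_id
              AND old.valid_from < NEW.valid_to
              AND NEW.valid_from < old.valid_to);
    END;
""")

def try_insert(row):
    """Insert one (disk_id, capacity_gb, valid_from, valid_to) row.

    Returns True on success and False if the database rejected the row.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO disk_capacity VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.DatabaseError:
        return False
```

Adjacent periods, where one row ends exactly when the next begins, pass the check because the comparison is strict; a row that starts before an existing period for the same disk has ended is rejected.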
