涉及时间属性的设计数据库 [英] design database relating to time attribute

查看:189
本文介绍了涉及时间属性的设计数据库的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想设计一个数据库,其描述如下:
每个产品在一个时间点只有一个状态。然而,产品的状态在其寿命期间可以改变。我如何设计产品和状态之间的关系,可以容易地查询当前时间的特定状态的所有产品?此外,任何人都可以给我一些深入的细节关于设计数据库,有关的持续时间作为上面的问题?感谢任何帮助

解决方案

这里是一个模型来实现您的要求。



链接到时间序列数据模型



链接到IDEF1X记法 。对于不熟悉关系建模标准的人来说,




  • 标准化为5NF;无重复列;没有更新异常,没有空值。


  • 当产品的状态更改时,只需向ProductStatus插入一行,不需要触摸前面的行(这是真的,并保持真实)。


  • DateTime是产品被置于该状态的实际日期时间; 从,如果你愿意。 To很容易导出:它是Product的下一个DateTime(DateTime>From)行;




第一个模型是完整的; (ProductId,DateTime)足以为主键提供唯一性。但是,由于您对某些查询条件请求速度,因此我们可以在物理级增强模型,并提供:




  • 索引(我们已经有了PK索引,因此我们将在添加第二个索引之前加强它)来支持覆盖的查询(基于{ProductId | DateTime | Status}的任何安排的索引可以由索引提供,而不必转到数据行)。它将Status :: ProductStatus关系从未识别(虚线)更改为识别类型(实线)。


  • PK安排是基于大多数查询将是基于Product⇢DateTime⇢Status的时间序列。


  • 第二个索引用于根据状态提高查询速度。 / p>


  • 在替代安排中,即,我们主要想要所有产品的当前状态。


  • 在ProductStatus的所有版本中,次索引(而不是PK)中的DateTime列为DESCending;




我已经提供了您要求的讨论。当然,你需要尝试一个合理大小的数据集,并做出自己的决定。



对评论的回应



报告当前状态为2的所有产品



  SELECT ProductId,
说明
FROM产品p,
ProductStatus ps
WHERE p.ProductId = ps.ProductId - 加入
AND St​​atusCode = 2 - 请求
AND DateTime =( - 左边的当前状态...
SELECT MAX(DateTime) - 外部产品的当前状态行
FROM ProductStatus ps_inner
WHERE p.ProductId = ps_inner.ProductId




  • ProductId


  • StatusCode 已编入索引,在查询选项中为第3栏

    li>

    由于索引中的 StatusCode 是DESCending,因此只需要一次提取即可满足内部查询


  • 这些行同时是必需的,对于一个查询;它们靠近在一起(由于Clstered索引);




这是普通的SQL,一个子查询,使用SQL引擎的力量,关系集处理。它是一个正确的方法,没有更快,任何其他方法会更慢。任何报表工具只需点击几下即可生成此代码,无需输入。



ProductStatus中的两个日期



诸如DateTimeFrom和DateTimeTo之类的列是严重错误。让我们按照重要性排序。


  1. 这是一个严重的规范化错误。 DateTimeTo很容易从下一行的单个DateTime派生;




    • 精度没有达到:这很容易通过DataType(DATE ,DATETIME,SMALLDATETIME)。是否显示一秒钟,微秒或纳秒,是一个商业决策;

    • 实现DateTo列是100%重复的DateTime的下一行)。这需要两倍的磁盘空间。对于大型表,这将是非常不必要的浪费。


    • 由于它是一个短行,您需要


    • 并且两倍的缓存空间


    • 通过引入一个重复的列,你引入了错误的可能性值现在可以从两种方式导出:从重复的DateTimeTo列或下一行的DateTimeFrom)。


    • 这也是更新异常。当你更新任何DateTimeFrom是更新时,必须获取上一行的DateTimeTo(没有大事,因为它接近)和更新(大事,因为它是一个额外的动词,可以避免)。


    • 更短和编码快捷方式是无关紧要的,SQL是一种繁琐的数据操作语言,但是SQL是我们所拥有的 。任何不能编写子查询的人都不应该编码。任何人复制列以轻松编码难度真的不应该建模数据库。


请注意,如果维持最高顺序规则(规范化),则整组较低阶问题

>

编写简单SQL时遇到困难或遇到痛苦的任何人在执行其工作功能时都会受到损害。通常,开发人员根据思考,关系数据库是面向集合的模型。


  • 对于上面的查询,我们需要当前DateTime;由于ProductStatus是按时间顺序排列的产品状态的集合,因此我们只需要属于产品的 set 的最新或MAX(DateTime)。

    / li>
  • 现在让我们以集合来看看据称困难的东西。对于每个产品处于特定状态的持续时间的报告:DateTimeFrom是可用列,并定义水平截止,子集(我们可以排除较早的行); DateTimeTo是产品状态子集中最早的。




  •   SELECT ProductId,
    Description,
    [DateFrom] = DateTime,
    [DateTo] =(
    SELECT MIN子集
    FROM ProductStatus ps_inner
    WHERE p.ProductId = ps_inner.ProductId - 我们的产品
    和ps_inner.DateTime> ps.DateTime - 定义子集,截断

    FROM产品p,
    ProductStatus ps
    WHERE p.ProductId = ps.ProductId
    AND St​​atusCode = 2 - 请求

    p>


    • 考虑获取下一行是面向行的,不是面向集合的处理。在使用面向集合的数据库时出现瘫痪。让优化者做所有为你思考的。


    • 无法在集合中思考,因此仅限于编写单级查询,不是合理的理由:在数据库中实现大量复制和更新异常;浪费在线资源和磁盘空间;保证一半的性能。



    更容易学习如何编写简单的SQL子查询

    I want to design a database which is described as follows: Each product has only one status at one time point. However, the status of a product can change during its life time. How could I design the relationship between product and status which can easily be queried all product of a specific status at current time? In addition, could anyone please give me some in-depth details about design database which related to time duration as problem above? Thanks for any help

    解决方案

    Here is a model to achieve your stated requirement.

    Link to Time Series Data Model

    Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.

    • Normalised to 5NF; no duplicate columns; no Update Anomalies, no Nulls.

    • When the Status of a Product changes, simply insert a row into ProductStatus, with the current DateTime. No need to touch previous rows (which were true, and remain true). No dummy values which report tools (other than your app) have to interpret.

    • The DateTime is the actual DateTime that the Product was placed in that Status; the "From", if you will. The "To" is easily derived: it is the DateTime of the next (DateTime > "From") row for the Product; where it does not exist, the value is the current DateTime (use ISNULL).

    The first model is complete; (ProductId, DateTime) is enough to provide uniqueness, for the Primary Key. However, since you request speed for certain query conditions, we can enhance the model at the physical level, and provide:

    • An Index (we already have the PK Index, so we will enhance that first, before adding a second index) to support covered queries (those based on any arrangement of { ProductId | DateTime | Status } can be supplied by the Index, without having to go to the data rows). Which changes the Status::ProductStatus relation from Non-Identifying (broken line) to Identifying type (solid line).

    • The PK arrangement is chosen on the basis that most queries will be Time Series, based on Product⇢DateTime⇢Status.

    • The second index is supplied to enhance the speed of queries based on Status.

    • In the Alternate Arrangement, that is reversed; ie, we mostly want the current status of all Products.

    • In all renditions of ProductStatus, the DateTime column in the secondary Index (not the PK) is DESCending; the most recent is first up.

    I have provided the discussion you requested. Of course, you need to experiment with a data set of reasonable size, and make your own decisions. If there is anything here that you do not understand, please ask, and I will expand.

    Responses to Comments

    Report all Products with Current State of 2

    SELECT  ProductId,
            Description
        FROM  Product       p,
              ProductStatus ps
        WHERE p.ProductId = ps.ProductId  -- Join
        AND   StatusCode  = 2             -- Request
        AND   DateTime    = (             -- Current Status on the left ...
            SELECT MAX(DateTime)          -- Current Status row for outer Product
                FROM  ProductStatus ps_inner
                WHERE p.ProductId = ps_inner.ProductId
                )

    • ProductId is Indexed, leading col, both sides

    • DateTime in Indexed, 2nd col in Covered Query Option

    • StatusCode is Indexed, 3rd col in Covered Query Option

    • Since StatusCode in the Index is DESCending, only one fetch is required to satisfy the inner query

    • the rows are required at the same time, for the one query; they are close together (due to Clstered Index); almost always on the same page due to the short row size.

    This is ordinary SQL, a subquery, using the power of the SQL engine, Relational set processing. It is the one correct method, there is nothing faster, and any other method would be slower. Any report tool will produce this code with a few clicks, no typing.

    Two Dates in ProductStatus

    Columns such as DateTimeFrom and DateTimeTo are gross errors. Let's take it in order of importance.

    1. It is a gross Normalisation error. "DateTimeTo" is easily derived from the single DateTime of the next row; it is therefore redundant, a duplicate column.

      • The precision does not come into it: that is easily resolved by virtue of the DataType (DATE, DATETIME, SMALLDATETIME). Whether you display one less second, microsecond, or nanosecnd, is a business decision; it has nothing to do with the data that is stored.
    2. Implementing a DateTo column is a 100% duplicate (of DateTime of the next row). This takes twice the disk space. For a large table, that would be significant unnecessary waste.

    3. Given that it is a short row, you will need twice as many logical and physical I/Os to read the table, on every access.

    4. And twice as much cache space (or put another way, only half as many rows would fit into any given cache space).

    5. By introducing a duplicate column, you have introduced the possibility of error (the value can now be derived two ways: from the duplicate DateTimeTo column or the DateTimeFrom of the next row).

    6. This is also an Update Anomaly. When you update any DateTimeFrom is Updated, the DateTimeTo of the previous row has to be fetched (no big deal as it is close) and Updated (big deal as it is an additional verb that can be avoided).

    7. "Shorter" and "coding shortcuts" are irrelevant, SQL is a cumbersome data manipulation language, but SQL is all we have (Just Deal With It). Anyone who cannot code a subquery really should not be coding. Anyone who duplicates a column to ease minor coding "difficulty" really should not be modelling databases.

    Note well, that if the highest order rule (Normalisation) was maintained, the entire set of lower order problems are eliminated.

    Think in Terms of Sets

    • Anyone having "difficulty" or experiencing "pain" when writing simple SQL is crippled in performing their job function. Typically the developer is not thinking in terms of sets and the Relational Database is set-oriented model.

    • For the query above, we need the Current DateTime; since ProductStatus is a set of Product States in chronological order, we simply need the latest, or MAX(DateTime) of the set belonging to the Product.

    • Now let's look at something allegedly "difficult", in terms of sets. For a report of the duration that each Product has been in a particular State: the DateTimeFrom is an available column, and defines the horizontal cut-off, a sub set (we can exclude earlier rows); the DateTimeTo is the earliest of the sub set of Product States.

    SELECT               ProductId,
                         Description,
            [DateFrom] = DateTime,
            [DateTo]   = (
            SELECT MIN(DateTime)                        -- earliest in subset
                FROM  ProductStatus ps_inner
                WHERE p.ProductId = ps_inner.ProductId  -- our Product
                AND   ps_inner.DateTime > ps.DateTime   -- defines subset, cutoff
                )
        FROM  Product       p,
              ProductStatus ps
        WHERE p.ProductId = ps.ProductId 
        AND   StatusCode  = 2             -- Request

    • Thinking in terms of getting the next row is row-oriented, not set-oriented processing. Crippling, when working with a set-oriented database. Let the Optimiser do all that thinking for you. Check your SHOWPLAN, this optimises beautifully.

    • Inability to think in sets, thus being limited to writing only single-level queries, is not a reasonable justification for: implementing massive duplication and Update Anomalies in the database; wasting online resources and disk space; guaranteeing half the performance. Much cheaper to learn how to write simple SQL subqueries to obtain easily derived data.

    这篇关于涉及时间属性的设计数据库的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆