What is best practice for representing time intervals in a data warehouse?


Question

In particular, I am dealing with a Type 2 Slowly Changing Dimension and need to represent the time interval during which a particular record was active, i.e. for each record I have a StartDate and an EndDate. My question is whether to use a closed ([StartDate, EndDate]) or half-open ([StartDate, EndDate)) interval to represent this, i.e. whether or not to include the last date in the interval. To take a concrete example, say record 1 was active from day 1 to day 5, and from day 6 onwards record 2 became active. Do I make the EndDate for record 1 equal to 5 or 6?

Recently I have come around to the view that half-open intervals are best, based on, inter alia, Dijkstra's "Why numbering should start at zero" as well as the conventions for array slicing and the range() function in Python. Applying this in the data warehousing context, I see the advantages of a half-open interval convention as the following:


  • EndDate - StartDate gives the time the record was active.
  • Validation: the StartDate of the next record will equal the EndDate of the previous record, which is easy to validate.
  • Future proofing: if I later decide to change my granularity from daily to something shorter, the switchover date still stays precise. If I use a closed interval and store the EndDate with a timestamp of midnight, then I would have to adjust these records to accommodate the change.
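The first two points can be illustrated with a minimal Python sketch. The rows, dates, and helper names (`active_days`, `is_contiguous`) are hypothetical and only mirror the day 1 to day 5 example from the question; EndDate is treated as exclusive.

```python
from datetime import date

# Hypothetical SCD Type 2 rows for one business key: (start_date, end_date),
# where end_date is exclusive, i.e. the interval is [start_date, end_date).
rows = [
    (date(2024, 1, 1), date(2024, 1, 6)),   # record 1: active on days 1..5
    (date(2024, 1, 6), date(2024, 1, 20)),  # record 2: active from day 6
]

def active_days(start, end):
    """Duration is a plain subtraction, with no +1 adjustment."""
    return (end - start).days

def is_contiguous(rows):
    """Each row must start exactly where the previous row ends."""
    return all(prev[1] == nxt[0] for prev, nxt in zip(rows, rows[1:]))

durations = [active_days(start, end) for start, end in rows]
contiguous = is_contiguous(rows)
print(durations, contiguous)
```

With closed intervals the same checks would need a "+1 day" in both places, and that offset would change if the granularity later moved from days to something finer.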

Therefore my preference would be to use a half-open interval methodology. However, if there were some widely adopted industry convention of using the closed interval method, then I might be swayed to go with that instead, particularly if it is based on practical experience of implementing such systems rather than my abstract theorising.

Thanks in advance for any insights or comments.

Recommended answer

I have seen both closed and half-open versions in use. I prefer half-open for the reasons you have stated.

In my opinion the half-open version makes the intended behaviour clearer and is "safer". The predicate ( a <= x < b ) clearly shows that b is intended to be outside the interval. In contrast, if you use closed intervals and specify (x BETWEEN a AND b) in SQL, then if someone unwisely uses the end date of one row as the start of the next, you get the wrong answer: a lookup on the boundary date matches both rows.
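The boundary problem can be reproduced with a small sketch using Python's built-in sqlite3 module. The table name `dim`, its columns, and the dates are assumptions made for illustration; dates are stored as ISO 8601 strings so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim (id INTEGER, start_date TEXT, end_date TEXT)")
# Row 1 ends on the same date that row 2 starts (the shared boundary).
conn.executemany(
    "INSERT INTO dim VALUES (?, ?, ?)",
    [(1, "2024-01-01", "2024-01-06"), (2, "2024-01-06", "2024-01-20")],
)

def matching_ids(sql, day):
    """Return the ids of all rows the given query considers active on `day`."""
    return [row[0] for row in conn.execute(sql, {"d": day})]

# Half-open predicate: end_date is excluded.
half_open_sql = (
    "SELECT id FROM dim WHERE start_date <= :d AND :d < end_date ORDER BY id"
)
# Closed predicate: BETWEEN includes both endpoints.
closed_sql = (
    "SELECT id FROM dim WHERE :d BETWEEN start_date AND end_date ORDER BY id"
)

half_open_hits = matching_ids(half_open_sql, "2024-01-06")  # [2]
closed_hits = matching_ids(closed_sql, "2024-01-06")        # [1, 2]
print(half_open_hits, closed_hits)
```

On the shared boundary date the half-open predicate returns exactly one row, while BETWEEN returns both, which would duplicate fact rows in a join.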

Make the latest end date default to the largest date your DBMS supports rather than null.
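A brief sketch of why this helps, again using sqlite3 from the Python standard library. The sentinel `9999-12-31` is an assumption; the actual maximum depends on your DBMS and column type. The table names `dim_null` and `dim_max` are hypothetical.

```python
import sqlite3

MAX_DATE = "9999-12-31"  # assumed sentinel; the real maximum is DBMS-specific

conn = sqlite3.connect(":memory:")
for table, open_end in (("dim_null", None), ("dim_max", MAX_DATE)):
    conn.execute(f"CREATE TABLE {table} (id INTEGER, start_date TEXT, end_date TEXT)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?)",
        [(1, "2024-01-01", "2024-01-06"), (2, "2024-01-06", open_end)],
    )

# The same half-open predicate is applied to both tables.
predicate = "WHERE start_date <= :d AND :d < end_date"
day = {"d": "2024-03-01"}

# With a NULL end date the comparison evaluates to NULL, so the current row is missed.
null_hits = [r[0] for r in conn.execute(f"SELECT id FROM dim_null {predicate}", day)]
# With a maximum-date sentinel the plain predicate finds the current row.
max_hits = [r[0] for r in conn.execute(f"SELECT id FROM dim_max {predicate}", day)]
print(null_hits, max_hits)
```

With NULL end dates every query needs an extra `OR end_date IS NULL` (or a COALESCE), whereas the sentinel keeps the predicate uniform.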
