What is the advantage of using a date dimension table over directly storing a date?
Problem description
I need to store a fairly large history of data, and I have been researching the best ways to store such an archive. It seems that a data warehouse approach is what I need. It also seems highly recommended to use a date dimension table rather than a date itself. Can anyone please explain to me why a separate table would be better? I don't need to summarize any of the data, just access it quickly and efficiently for any given day in the past. I'm sure I'm missing something, but I just can't see how storing the dates in a separate table is any better than just storing a date in my archive.
I have found these enlightening posts, but nothing that quite answers my question.
- What should I have in mind when building OLAP solution from scratch?
- Date Table/Dimension Querying and Indexes
- What is the best way to store historical data in SQL Server 2005/2008?
- How to create history fact table?
Well, one advantage is that as a dimension you can store many other attributes of the date in the dimension table: is it a holiday, is it a weekday, what fiscal quarter is it in, what is the UTC offset for a specific time zone (or several), and so on. Some of those you could calculate at runtime, but in a lot of cases it's better (or only possible) to pre-calculate.
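To make the pre-calculation idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than SQL Server. The table and column names (`dim_date`, `archive`, `date_key`), the holiday list, and the fiscal year starting on 1 July are all assumptions made up for illustration; a real warehouse would load them from its own calendar.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical holiday list and fiscal calendar (fiscal year starting 1 July);
# both are assumptions for illustration only.
HOLIDAYS = {date(2024, 1, 1), date(2024, 12, 25)}
FISCAL_YEAR_START_MONTH = 7

def fiscal_quarter(d):
    """Quarter number (1-4) counted from the assumed fiscal year start month."""
    return ((d.month - FISCAL_YEAR_START_MONTH) % 12) // 3 + 1

def build_date_dim(conn, start, end):
    """Create dim_date and fill one row per calendar day in [start, end]."""
    conn.execute(
        """CREATE TABLE dim_date (
               date_key       INTEGER PRIMARY KEY,  -- e.g. 20240704
               full_date      TEXT NOT NULL,
               is_weekday     INTEGER NOT NULL,
               is_holiday     INTEGER NOT NULL,
               fiscal_quarter INTEGER NOT NULL
           )"""
    )
    rows = []
    d = start
    while d <= end:
        rows.append((
            d.year * 10000 + d.month * 100 + d.day,  # integer key YYYYMMDD
            d.isoformat(),
            int(d.weekday() < 5),                    # Monday=0 .. Friday=4
            int(d in HOLIDAYS),
            fiscal_quarter(d),
        ))
        d += timedelta(days=1)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
build_date_dim(conn, date(2024, 1, 1), date(2024, 12, 31))

# Hypothetical archive table that references the dimension by key.
conn.execute(
    "CREATE TABLE archive (id INTEGER PRIMARY KEY, date_key INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO archive (date_key, payload) VALUES (?, ?)",
    [(20240101, "a"), (20240102, "b"), (20240706, "c")],
)

# Filter on pre-calculated attributes instead of computing them per row.
working_day_rows = conn.execute(
    """SELECT a.payload
       FROM archive AS a
       JOIN dim_date AS d ON d.date_key = a.date_key
       WHERE d.is_weekday = 1 AND d.is_holiday = 0
       ORDER BY a.id"""
).fetchall()
```

The holiday flag is the kind of attribute that cannot be derived from the date alone, which is where the lookup table earns its keep; the weekday flag could just as well be computed at query time.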
Another is that if you just store the DATE in the table, you only have one option for indicating a missing date (NULL) or you need to start making up meaningless token dates like 1900-01-01 to mean one thing (missing because you don't know) and 1899-12-31 to mean another (missing because the task is still running, the person is still alive, etc). If you use a dimension, you can have multiple rows that represent specific reasons why the DATE is unknown/missing, without any "magic" values.
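As a rough illustration of the "multiple rows for missing dates" idea, the sketch below again uses Python's `sqlite3` module. The `task` table, the `description` column, and the choice of `-1` and `-2` as keys for the special rows are assumptions, not something prescribed above; the point is only that the reason lives in the dimension row rather than in a token date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date (
        date_key    INTEGER PRIMARY KEY,
        full_date   TEXT,               -- NULL for the special rows
        description TEXT NOT NULL
    );
    -- Ordinary calendar rows.
    INSERT INTO dim_date VALUES (20240115, '2024-01-15', 'Calendar date');
    INSERT INTO dim_date VALUES (20240220, '2024-02-20', 'Calendar date');
    -- Special rows: the keys -1 and -2 are an assumed convention.
    INSERT INTO dim_date VALUES (-1, NULL, 'Unknown');
    INSERT INTO dim_date VALUES (-2, NULL, 'Still running');

    -- Hypothetical fact table; end_date_key is never NULL.
    CREATE TABLE task (
        task_id      INTEGER PRIMARY KEY,
        end_date_key INTEGER NOT NULL REFERENCES dim_date (date_key)
    );
    INSERT INTO task VALUES (1, 20240115);
    INSERT INTO task VALUES (2, -2);
    INSERT INTO task VALUES (3, -1);
    INSERT INTO task VALUES (4, -2);
    """
)

# Count tasks per end-date status by joining to the dimension.
status_counts = dict(conn.execute(
    """SELECT d.description, COUNT(*)
       FROM task AS t
       JOIN dim_date AS d ON d.date_key = t.end_date_key
       GROUP BY d.description"""
).fetchall())
```

Because every task row joins to exactly one dimension row, queries can group or filter by the reason a date is missing without special-casing NULL or sentinel dates.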
Personally, I would prefer to just store a DATE, because it is smaller than an INT (!) and it keeps all kinds of date-related properties, the ability to perform date math etc. If the reason the date is missing is important, I could always add a column to the table to indicate that. But I am answering with someone else's data warehousing hat on.