Nulls in dimension tables for numeric attributes


Question





What is the best way to handle missing values in a dimension table?

In the case of a textual column, it is easy to write "NA: Missing", but what should be done for numeric columns, where it is important to retain the specific values? Note: I do not want a solution that uses banded values (e.g., textual columns with "0-50", "50-100", "NA: Missing").

For instance, a customer dimension may have a year-of-birth attribute. How should a missing year of birth be handled? Leave it null? Add an arbitrary number, such as 1900, as a placeholder?

Sometimes it may be difficult to find a placeholder number. For instance, if sales-to-date is non-negative but can be zero, I wouldn't want to use "0" as a placeholder for null. I could use a negative value such as "-1", but that would ruin queries that use sums.

Solution

In your fact table you never use a null value for a foreign key, but you can and should use null values for metrics where appropriate. A null gives accurate results when aggregated, because aggregate functions such as SUM, AVG and COUNT(column) skip nulls, whereas a default value is counted as if it were real data.
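A minimal sketch of this point, using Python's built-in sqlite3 module. The fact_sales table, its columns and the sample rows are hypothetical; the sketch only compares storing NULL for an unrecorded metric against storing a default of 0.

```python
import sqlite3

# Hypothetical fact table: the foreign key (date_key) is never NULL,
# but the discount metric may be missing for some sales.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales ("
    " date_key INTEGER NOT NULL,"  # foreign key: never NULL
    " amount REAL NOT NULL,"
    " discount REAL)"              # metric: NULL when not recorded
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(20240101, 100.0, 10.0),
     (20240101, 200.0, 30.0),
     (20240102, 150.0, None)],     # discount not recorded for this sale
)

def discount_stats(conn):
    """Return (SUM, AVG, COUNT(discount), COUNT(*)) over the discount metric."""
    return conn.execute(
        "SELECT SUM(discount), AVG(discount), COUNT(discount), COUNT(*) "
        "FROM fact_sales"
    ).fetchone()

with_null = discount_stats(conn)

# Same data, but with 0 stored as a default instead of NULL.
conn.execute("UPDATE fact_sales SET discount = 0 WHERE discount IS NULL")
with_default = discount_stats(conn)

print("NULL:   ", with_null)
print("default:", with_default)
```

With NULL, the average discount is taken over the two known values (20.0). With the 0 default, SUM happens to be unchanged, but AVG drops to about 13.3 and COUNT(discount) treats the unknown discount as a real zero.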

In dimension tables, too, attributes can and should be null where appropriate, for the same reason. Aggregating dimension attributes is less common, but it does happen, and the results should be correct when it does.
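Applied to the examples in the question, a sketch with Python's sqlite3 module might look like the following. The dim_customer table and its columns are illustrative assumptions; the placeholders 1900 and -1 are the ones the question considers.

```python
import sqlite3

def build_customer_dim(birth_placeholder, sales_placeholder):
    """Build a hypothetical dim_customer, storing the given values for the
    customer whose year of birth and sales-to-date are unknown."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_customer ("
        " customer_key INTEGER PRIMARY KEY,"
        " year_of_birth INTEGER,"  # nullable numeric attribute
        " sales_to_date REAL)"     # non-negative, may legitimately be 0
    )
    rows = [
        (1, 1980, 500.0),
        (2, 1990, 0.0),  # a real zero, distinct from "unknown"
        (3, birth_placeholder, sales_placeholder),  # attributes unknown
    ]
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    return conn

def summarize(conn):
    """Aggregate the two numeric attributes across all customers."""
    return conn.execute(
        "SELECT AVG(year_of_birth), SUM(sales_to_date) FROM dim_customer"
    ).fetchone()

print("NULL:        ", summarize(build_customer_dim(None, None)))
print("placeholders:", summarize(build_customer_dim(1900, -1.0)))
```

Storing NULL keeps both aggregates correct and still lets the genuine zero for customer 2 be told apart from a missing value. The 1900 placeholder pulls the average year of birth down by almost 30 years, and the -1 placeholder shifts the total sales by one.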

If a fact needs an "empty" value for a dimension (the foreign key itself can never be null), the dimension should have a dedicated row for that purpose. A date dimension, for instance, might have three or four special rows: "no value", "unknown", "past" and "future" are reasonable special-value rows, depending on your needs.
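A minimal sketch of the special-row approach, again with Python's sqlite3 module. The table names, labels and the negative surrogate keys (-1 to -4) are illustrative assumptions; the answer does not prescribe particular keys, only that the rows exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_date ("
    " date_key INTEGER PRIMARY KEY,"
    " full_date TEXT,"        # NULL for the special rows
    " label TEXT NOT NULL)"
)
# Assumed convention: negative keys for special rows, so they cannot
# collide with real date keys such as 20240101.
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(-1, None, "No value"),
     (-2, None, "Unknown"),
     (-3, None, "Past"),
     (-4, None, "Future"),
     (20240101, "2024-01-01", "2024-01-01")],
)
conn.execute(
    "CREATE TABLE fact_order ("
    " order_id INTEGER PRIMARY KEY,"
    " ship_date_key INTEGER NOT NULL REFERENCES dim_date(date_key),"
    " amount REAL)"
)
conn.executemany(
    "INSERT INTO fact_order VALUES (?, ?, ?)",
    [(1, 20240101, 100.0),
     (2, -1, 50.0),   # not shipped yet: points at the "No value" row
     (3, -2, 25.0)],  # ship date lost: points at the "Unknown" row
)

# The foreign key is never NULL, so a plain inner join keeps every fact row
# and the special rows appear as readable labels in the report.
report = conn.execute(
    "SELECT d.label, SUM(f.amount) "
    "FROM fact_order f JOIN dim_date d ON f.ship_date_key = d.date_key "
    "GROUP BY d.label ORDER BY d.label"
).fetchall()
print(report)
```

Had the two unshipped orders been stored with a NULL ship_date_key instead, the inner join would have silently dropped them and the report total would be 100.0 rather than 175.0.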

You will save yourself a lot of pain and suffering in the BI layer this way.

