链接维度的不同粒度级别的事实表 [英] Link fact tables at different granularity levels of a dimension

查看:54
本文介绍了链接维度的不同粒度级别的事实表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

数据仓库设计的新手.我有一个表示地理区域(例如郊区,城市,州)的非规范化维度表.这是一个变化缓慢的维度.

New to data warehouse design. I have a denormalised dimension table representing geographies (e.g. suburb, city, state). This is a slowly changing dimension.

还具有多个事实表,每个事实表处于不同的粒度级别.

Also have multiple fact tables, each at different grain levels.

是否可以对此建模,以便事实表使用代理键,同时保持非规范化维度表?

Is it possible to model this so the fact tables use surrogate keys, whilst maintaining a denormalised dimension table?

推荐答案

如果您实际上拥有相同的尺寸数据,但具有不同的粒度,则可以通过创建聚合"来处理此问题.方面.在您的示例中,复制dim_geo表定义(而不是数据),将dim命名为dim_geo_city之类的名称,然后将所有列以比城市更低的粒度(例如,郊区_id,郊区)删除.如果您在状态级别有事实,那么您将以相同的方式创建dim_geo_state,以此类推,以进行进一步的聚合.

If you have effectively the same dimensional data but at different grains then you handle this by creating "aggregate" dimensions. In your example, copy the dim_geo table definition (not the data), name the dim to something like dim_geo_city and drop all the columns at a lower granularity than city (e.g. suburb_id, suburb). If you have facts at the state level then you would create dim_geo_state in the same way - and so on for any further levels of aggregation.

事实人群将继续引用dim_geo,但事实住房应引用dim_geo_city.

Fact_population will continue to reference dim_geo but fact_housing should reference dim_geo_city.

最简单的填充聚合暗淡的方法是在基本暗淡(dim_geo)上运行SELECT DISTINCT,并且仅包括目标暗淡(dim_geo_city)中存在的列-然后,您获取结果数据并应用适当的SCD逻辑将其插入/更新到目标昏暗状态.

The easiest way to populate aggregate dims is to run a SELECT DISTINCT on the base dim (dim_geo) and only include the columns that exist in the target dim (dim_geo_city) - you then take the resulting data and apply the appropriate SCD logic to insert/update it into the target dim.

这篇关于链接维度的不同粒度级别的事实表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆