How can a multi-valued dimension be expressed in a star schema, given that it has a 1-to-many relationship [Dim 1 : many Fact]?

Question

I am new to data warehouse practices and, in the context of an academic exercise, I would like to create a star schema using a dataset in a chosen area of interest. So my classmate and I chose a dataset of car accidents in a country during one year.

The problem is that in many cases, if not most, more than one car is involved. So if I choose incidents of "accidents" as the Fact Table, with "Driver", "Car", "Casualties", "Location", "Conditions" etc. as Dimensions, how can these be transformed into a star schema when the dimensions "Car", "Driver" and "Casualties" are multivalued? For example, a single accident can involve 3 cars, 3 drivers and 7 casualties. Consider that the use of a star schema is mandatory.

Also, as far as I know, a Fact Table most often has numeric values as measurements. Can it also have categorical variables as measurements?

Recommended answer

A very important concept in dimensional modelling is that of the grain. Ralph Kimball (whose work you will run into again and again if you're learning about dimensional modelling) emphasizes that it's really important to model from the lowest possible grain up. This lets you slice and dice your data in as many ways as possible, summing up from the lowest to any higher granularities.

Quite often when you find one of these issues where everything seems to be many-to-many, the issue is actually that you've chosen the wrong grain for the fact table in question. With apologies to Nick.McDermaid (who has suggested this granularity change in comments), "participation of an individual in an accident" is a lower granularity than "accident," so lowering the granularity of the fact table to at least that level - and creating an Accident dimension - makes a lot of sense.
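To make the grain change concrete, here is a minimal sketch using Python's built-in sqlite3 module. All table and column names (dim_accident, fact_participation, is_driver, is_casualty and so on) are assumptions made up for illustration, not something taken from your dataset. The fact table has one row per person involved in an accident, and the sample rows reproduce the "3 cars, 3 drivers, 7 casualties" case from the question.

```python
import sqlite3

# Hypothetical star schema; every table and column name is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_accident (accident_key INTEGER PRIMARY KEY, accident_date TEXT, conditions TEXT);
CREATE TABLE dim_vehicle  (vehicle_key  INTEGER PRIMARY KEY, vehicle_type TEXT);
CREATE TABLE dim_person   (person_key   INTEGER PRIMARY KEY, age_band TEXT);
-- Grain: one row per person involved in one accident.
CREATE TABLE fact_participation (
    accident_key INTEGER NOT NULL REFERENCES dim_accident(accident_key),
    vehicle_key  INTEGER NOT NULL REFERENCES dim_vehicle(vehicle_key),
    person_key   INTEGER NOT NULL REFERENCES dim_person(person_key),
    is_driver    INTEGER NOT NULL,  -- 0/1 flag, additive
    is_casualty  INTEGER NOT NULL   -- 0/1 flag, additive
);
""")

conn.execute("INSERT INTO dim_accident VALUES (1, '2020-05-17', 'Wet road')")
conn.executemany("INSERT INTO dim_vehicle VALUES (?, ?)",
                 [(1, "Car"), (2, "Car"), (3, "Van")])
conn.executemany("INSERT INTO dim_person VALUES (?, ?)",
                 [(key, "adult") for key in range(1, 9)])

# (accident, vehicle, person, is_driver, is_casualty): 8 people in 3 vehicles.
rows = [
    (1, 1, 1, 1, 1), (1, 1, 2, 0, 1), (1, 1, 3, 0, 1),
    (1, 2, 4, 1, 1), (1, 2, 5, 0, 1), (1, 2, 6, 0, 1),
    (1, 3, 7, 1, 0), (1, 3, 8, 0, 1),
]
conn.executemany("INSERT INTO fact_participation VALUES (?, ?, ?, ?, ?)", rows)

# Roll the participant-level rows back up to one row per accident.
summary = conn.execute("""
    SELECT a.accident_key,
           COUNT(DISTINCT f.vehicle_key) AS vehicles,
           SUM(f.is_driver)              AS drivers,
           SUM(f.is_casualty)            AS casualties
    FROM fact_participation AS f
    JOIN dim_accident AS a ON a.accident_key = f.accident_key
    GROUP BY a.accident_key
""").fetchall()
print(summary)  # [(1, 3, 3, 7)]
```

Every dimension now joins to the fact table through a plain foreign key, so nothing is multivalued any more, and the accident-level figures fall out of an ordinary GROUP BY.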

It's possible that's not the lowest granularity, though; for instance, if your data set tracks injuries, each participant might have multiple injuries. In that case the fact table grain might be better off as "injuries sustained during an accident" - you would then need a row in your Injury dimension that indicates "no injury," so that you can still include those participants who were not injured. So the first thing you should do isn't to decide what your fact table is; it's to sift through the data and try to figure out what your lowest granularity is. Once you've done that, you should have a good handle on what your fact table will be modelled around, and which dimensions you need.
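As a rough sketch of the "no injury" row, again with Python's sqlite3 and with made-up names (dim_injury, fact_injury, and the choice of key 0 for "No injury" are all assumptions for illustration): uninjured participants get a single fact row pointing at the placeholder dimension member, so they are not lost when you count participants.

```python
import sqlite3

NO_INJURY_KEY = 0  # assumed surrogate key for the placeholder dimension row

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_injury (injury_key INTEGER PRIMARY KEY, injury_type TEXT NOT NULL);
-- Grain: one row per injury sustained by one person in one accident
-- (or one 'No injury' row for a person who was not hurt).
CREATE TABLE fact_injury (
    accident_key INTEGER NOT NULL,
    person_key   INTEGER NOT NULL,
    injury_key   INTEGER NOT NULL REFERENCES dim_injury(injury_key)
);
""")

conn.executemany("INSERT INTO dim_injury VALUES (?, ?)",
                 [(NO_INJURY_KEY, "No injury"), (1, "Fracture"), (2, "Concussion")])
conn.executemany("INSERT INTO fact_injury VALUES (?, ?, ?)", [
    (1, 101, 1), (1, 101, 2),   # person 101 sustained two injuries
    (1, 102, NO_INJURY_KEY),    # person 102 was not injured but still has a row
    (1, 103, 2),                # person 103 sustained one injury
])

# Count participants, injured participants and injuries for accident 1.
participants, injured, injuries = conn.execute("""
    SELECT COUNT(DISTINCT person_key),
           COUNT(DISTINCT CASE WHEN injury_key <> :none THEN person_key END),
           SUM(injury_key <> :none)
    FROM fact_injury
    WHERE accident_key = 1
""", {"none": NO_INJURY_KEY}).fetchone()
print(participants, injured, injuries)  # 3 2 3
```

Without the placeholder row, person 102 would simply be missing from the fact table and the participant count would come out as 2.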

Dimensional modelling can be a bit of a tough nut to crack because there are multiple ways you can do things - and the most correct way often doesn't seem very obvious, especially if you're moving from a background where you're used to more normalised data structures. I'd suggest first and foremost trying to model something using the most basic table types - i.e. try to avoid things like snowflaking, bridge tables, etc. - and seeing if you can come up with a solution that avoids those tricks. Very often this will lead to a better model (i.e. one which is simpler to navigate, has better query performance, and can be used to answer more questions).

Nick.McDermaid's advice to experiment and try different things is also solid, as it can help you to break yourself out of your initial assumptions. There are sometimes multiple potential designs - thinking them all through thoroughly can be necessary to decide which is best.
