Star vs Snowflake schema in data warehousing?

Question

Currently, I'm involved in a warehouse-based intelligent transaction analysis system for a bank, featuring customer churn behavior analysis, fraud detection, and CRM analysis. We use Oracle as the database, and it is purely a data warehousing project, with data mining algorithms used for the analysis.

We have records for about 1000 customers of a bank. For modeling, is it better to use a star schema, a snowflake schema, or a constellation schema? I know the basic difference between star and snowflake schemas: in a snowflake schema the dimension tables are normalized (a.k.a. snowflaking), which may make joins problematic in a large database.

So, which schema would be better for my case? Answers from experienced programmers involved in data warehousing are very welcome!

Thanks in advance!

Recommended answer

In brief, my assumption going into a project like this would be that a star schema is appropriate. I might revise that if a dimension grew too large to full-scan efficiently and queries against it could be meaningfully sped up by snowflaking. The exception is a dimension that joins to the fact table on a partitioning key: there I would avoid snowflaking, because it is difficult to apply partition pruning to a predicate placed on a snowflaked dimension.
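To make the structural difference concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than Oracle. All table and column names (`dim_customer_star`, `dim_branch`, `fact_transaction`, and so on) and the sample rows are hypothetical, chosen only for illustration. It models the same customer attributes once as a single denormalized star dimension and once as a snowflaked chain of tables, then runs the same aggregate against both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: one fact table, plus two alternative customer dimensions.
cur.executescript("""
CREATE TABLE fact_transaction (
    txn_id      INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount      REAL
);
-- Star: a single denormalized dimension, one join from the fact table.
CREATE TABLE dim_customer_star (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    branch_name TEXT,
    region_name TEXT
);
-- Snowflake: the same attributes normalized into a chain of tables.
CREATE TABLE dim_region (
    region_id   INTEGER PRIMARY KEY,
    region_name TEXT
);
CREATE TABLE dim_branch (
    branch_id   INTEGER PRIMARY KEY,
    branch_name TEXT,
    region_id   INTEGER REFERENCES dim_region(region_id)
);
CREATE TABLE dim_customer_snow (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    branch_id   INTEGER REFERENCES dim_branch(branch_id)
);
""")

# Made-up sample rows, identical in content across both designs.
cur.executemany("INSERT INTO dim_region VALUES (?, ?)",
                [(1, "North"), (2, "South")])
cur.executemany("INSERT INTO dim_branch VALUES (?, ?, ?)",
                [(10, "Central", 1), (20, "Harbor", 2)])
cur.executemany("INSERT INTO dim_customer_snow VALUES (?, ?, ?)",
                [(1, "Ann", 10), (2, "Bob", 20), (3, "Cy", 10)])
cur.executemany("INSERT INTO dim_customer_star VALUES (?, ?, ?, ?)",
                [(1, "Ann", "Central", "North"),
                 (2, "Bob", "Harbor", "South"),
                 (3, "Cy", "Central", "North")])
cur.executemany("INSERT INTO fact_transaction VALUES (?, ?, ?)",
                [(1, 1, 100.0), (2, 2, 50.0), (3, 3, 25.0), (4, 1, 10.0)])

# Star query: the region filter/grouping column sits one join away from the fact.
star_totals = cur.execute("""
    SELECT d.region_name, SUM(f.amount)
    FROM fact_transaction f
    JOIN dim_customer_star d ON d.customer_id = f.customer_id
    GROUP BY d.region_name
    ORDER BY d.region_name
""").fetchall()

# Snowflake query: the same column is three joins away from the fact.
snow_totals = cur.execute("""
    SELECT r.region_name, SUM(f.amount)
    FROM fact_transaction f
    JOIN dim_customer_snow c ON c.customer_id = f.customer_id
    JOIN dim_branch b        ON b.branch_id = c.branch_id
    JOIN dim_region r        ON r.region_id = b.region_id
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()

print(star_totals)
print(snow_totals)
conn.close()
```

Both queries return the same totals; what differs is how far the attribute being filtered or grouped on sits from the fact table. SQLite has no table partitioning, so this sketch does not demonstrate partition pruning itself. It only shows the extra join hops that make a predicate on a snowflaked dimension harder for an optimizer to map back onto the fact table's partitions.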
