Handling a Many-to-Many Dimension when all dimensional values have 100% importance


Problem Description

        I'll at least try to keep this succinct.

        Let's suppose we're tracking the balances of accounts over time. So our fact table will have columns such as...

        Account Balance Fact Table

        • (FK)AccountID
        • (FK)DateID
        • ...
        • Balance
        • ...

        Obviously you have an Account Dimension Table and a Date Dimension Table. So now we can easily filter on Accounts or Dates (or date ranges, etc.).
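For reference, the plain star schema described so far can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table and column names (`balance_fact`, `account_dim`, `date_dim`, `calendar_date`) and the sample rows are assumptions, not the real model.

```python
import sqlite3

# Hypothetical names and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_dim (account_id INTEGER PRIMARY KEY, account_name TEXT);
    CREATE TABLE date_dim (date_id INTEGER PRIMARY KEY, calendar_date TEXT);
    CREATE TABLE balance_fact (
        account_id INTEGER REFERENCES account_dim (account_id),
        date_id    INTEGER REFERENCES date_dim (date_id),
        balance    REAL,
        PRIMARY KEY (account_id, date_id)
    );
    INSERT INTO account_dim VALUES (1, 'Account 1'), (2, 'Account 2');
    INSERT INTO date_dim VALUES (20240101, '2024-01-01'), (20240102, '2024-01-02');
    INSERT INTO balance_fact VALUES
        (1, 20240101, 100.0), (1, 20240102, 110.0),
        (2, 20240101, 50.0),  (2, 20240102, 45.0);
""")

# Filter on one account and a date range by joining the fact to its dimensions.
rows = conn.execute("""
    SELECT a.account_name, d.calendar_date, f.balance
    FROM balance_fact AS f
    JOIN account_dim AS a ON a.account_id = f.account_id
    JOIN date_dim AS d ON d.date_id = f.date_id
    WHERE a.account_name = ? AND d.calendar_date BETWEEN ? AND ?
    ORDER BY d.calendar_date
""", ("Account 1", "2024-01-01", "2024-01-02")).fetchall()

for row in rows:
    print(row)
```

Each fact row joins to exactly one account and one date, which is why filtering here never changes the balance totals.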

        But here's the kicker... Accounts can belong to Groups -- any number of Groups at a given Date. Groups are simply logical abstractions, and they have no tangible meaning aside from reporting purposes. An Account being in 0, 1, or 17 groups doesn't affect its Balance in any way. For example, AccountID 1 may be in Groups 38, 76, 104, and 159. Account 2 may be in Group 1 (which has a Group Description of "Ungrouped"). Account 3 may be in seventeen groups (real example).

        As an added bonus, our users are completely non-technical. They don't know SQL, they have no experience with relational databases, and have historically done all of their work in a convoluted Excel solution. Right now we're building a dimensional model they can slice and filter with PowerPivot, though these Account Groups are threatening to turn an otherwise ruthlessly simple Star Schema into something complex enough that the users will balk and return to their current spaghetti solution.

        So let's look at our options...

        Boolean Method
        The Boolean method is not feasible. We have about 570,000 different accounts, but more importantly, 26,000 different groups. This would also be a devil for end-users to filter, since they're non-technical and are relying on very simple tools to get this done.

        Multiple Column Method
        In theory this could work; however, we do have some accounts that belong to 17 groups. Again, the groups are really just logical groups -- they have no meaning, but they are required by the business for reporting purposes. Having end-users filter out groups from 17 different columns isn't going to go over well in user acceptance, and would likely result in users refusing to use the solution (and I don't blame them).

        Bridge Table
        This could work, but we do have 26,000 different groups. I'm not finding this to be user-friendly.

        Since I'm not liking my options, I can only assume there's a better way other than snowflaking... unless snowflaking IS the only way. If someone could lend a hand and explain their rationale it'd be appreciated.


        UPDATE: For clarification, an example I think everyone here can relate to is to imagine that you can list keyword skills on a resume. They all relate to the same person, but you can have any number of skills. The skills don't affect any of the individual measures on a resume -- i.e. 'C++' isn't more valuable than 'C#' -- so you can't put all the resume/skill combinations in the fact table or you'd end up double counting (or well more than double ;) ).
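The double-counting problem can be shown with a small sketch using Python's built-in `sqlite3` module. The table names (`balance_fact`, `account_group`) and the balances are made up for illustration; only the group numbers come from the example earlier in the question.

```python
import sqlite3

# Hypothetical tables and balances, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE balance_fact (account_id INTEGER, date_id INTEGER, balance REAL);
    CREATE TABLE account_group (account_id INTEGER, group_id INTEGER);
    INSERT INTO balance_fact VALUES (1, 20240101, 100.0), (2, 20240101, 50.0);
    -- Account 1 sits in four groups, account 2 in one.
    INSERT INTO account_group VALUES (1, 38), (1, 76), (1, 104), (1, 159), (2, 1);
""")

# The real total, taken straight from the fact table.
true_total = conn.execute("SELECT SUM(balance) FROM balance_fact").fetchone()[0]

# Joining every account/group combination repeats each balance once per group.
inflated_total = conn.execute("""
    SELECT SUM(f.balance)
    FROM balance_fact AS f
    JOIN account_group AS g ON g.account_id = f.account_id
""").fetchone()[0]

print(true_total, inflated_total)
```

Account 1's balance is repeated four times in the joined result, so the second sum is three times the real total in this toy data.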

        I think the best I'm going to be able to do here is to create an outrigger table for groups. I'm not a fan of it, but I think it's the only real option I have.

        So now we have...

        Account Balance Fact Table

        • (FK)AccountID
        • (FK)DateID
        • ...
        • Balance
        • ...

        Account Dimension

        • (PK)AccountID
        • Account Name
        • ...
        • (FK)Account Group Key

        Account Group Outrigger

        • (PK)AccountGroupID
        • (PK)AccountID
        • Account Group Name
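One possible reading of this layout, sketched with Python's built-in `sqlite3` module. The snake_case names and sample rows are assumptions; the sketch also leaves out the `(FK)Account Group Key` column on the account dimension, on the assumption that with a composite key on the outrigger the relationship is carried by `AccountID`.

```python
import sqlite3

# Hypothetical DDL and data; one interpretation of the outrigger layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_dim (
        account_id   INTEGER PRIMARY KEY,
        account_name TEXT NOT NULL
    );
    CREATE TABLE account_group_outrigger (
        account_group_id   INTEGER NOT NULL,
        account_id         INTEGER NOT NULL REFERENCES account_dim (account_id),
        account_group_name TEXT NOT NULL,
        PRIMARY KEY (account_group_id, account_id)
    );
    INSERT INTO account_dim VALUES (1, 'Account 1'), (2, 'Account 2');
    INSERT INTO account_group_outrigger VALUES
        (38, 1, 'Group 38'), (76, 1, 'Group 76'),
        (104, 1, 'Group 104'), (159, 1, 'Group 159'),
        (1, 2, 'Ungrouped');
""")

# One row per (group, account) pair, so an account can appear any number of times.
groups_for_account_1 = [
    row[0]
    for row in conn.execute(
        "SELECT account_group_id FROM account_group_outrigger "
        "WHERE account_id = ? ORDER BY account_group_id",
        (1,),
    )
]
print(groups_for_account_1)
```

Note that the group name is repeated on every membership row in this shape, which is one reason a separate group dimension plus a bridge is often considered instead.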

        Solution

        I would say you've got to start from the interface. How would users like to do their filtering in an ideal world?

        I think I would end up going for a bridge or factless fact table or something like that. Perhaps a surrogate key on the fact table and a many-to-many link table from that to group membership.
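A minimal sketch of the bridge idea, using Python's built-in `sqlite3` module. Everything named here (`account_group_bridge`, `group_dim`, `balance_for_groups`, the sample data) is hypothetical, and the bridge is keyed on the account rather than on a fact-table surrogate key to keep the example short. The point is that the bridge is used to pick accounts, not joined row by row into the sum, so an account that belongs to several selected groups is still counted once.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE balance_fact (account_id INTEGER, date_id INTEGER, balance REAL);
    CREATE TABLE group_dim (group_id INTEGER PRIMARY KEY, group_name TEXT);
    CREATE TABLE account_group_bridge (
        account_id INTEGER NOT NULL,
        group_id   INTEGER NOT NULL REFERENCES group_dim (group_id),
        PRIMARY KEY (account_id, group_id)
    );
    INSERT INTO balance_fact VALUES
        (1, 20240101, 100.0), (2, 20240101, 50.0), (3, 20240101, 25.0);
    INSERT INTO group_dim VALUES (1, 'Ungrouped'), (38, 'Group 38'), (76, 'Group 76');
    INSERT INTO account_group_bridge VALUES (1, 38), (1, 76), (2, 1), (3, 76);
""")


def balance_for_groups(group_ids, date_id):
    """Total balance on one date for accounts in any of the given groups."""
    placeholders = ",".join("?" for _ in group_ids)
    sql = f"""
        SELECT COALESCE(SUM(f.balance), 0.0)
        FROM balance_fact AS f
        WHERE f.date_id = ?
          AND f.account_id IN (
              SELECT b.account_id
              FROM account_group_bridge AS b
              WHERE b.group_id IN ({placeholders})
          )
    """
    return conn.execute(sql, [date_id, *group_ids]).fetchone()[0]


# Account 1 is in both selected groups but contributes its balance only once.
total_38_76 = balance_for_groups([38, 76], 20240101)
total_ungrouped = balance_for_groups([1], 20240101)
print(total_38_76, total_ungrouped)
```

A per-group breakdown would still count an account under each of its groups, which is expected; only totals across groups need the semi-join treatment shown here.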

        It's definitely tough, and the interface and use cases have to be made workable, so I'd start from there. Perhaps something will shake out of how they do this reporting, like equivalence classes in the groups or some way they partition the account space. Maybe there is a hierarchy or organization to the groups that makes it more manageable and may inform a simpler design.
