How to model process and status history in a data warehouse?

Question

Let's say that we have D_PROCESS, D_WORKER and D_STATUS as dimensions, and the fact table F_EVENT, which links a process (what) with a worker (who's in charge) and the "current" status.

The process status changes over time. Should we store one row per process/status/worker in F_EVENT, or one row per process/worker in F_EVENT and, "somewhere else", one row per status change for a given process/worker?

I'm new to data warehousing, and it's hard to find best practices or tutorials on data modeling.

Recommended answer

Read The Data Warehouse Toolkit by Ralph Kimball for a good introduction to dimensional modeling.

It sounds like you are storing process change events in F_EVENT. If the process has a defined beginning and end, I would build a snapshot fact table that lets you track the process over time by simply updating the row each time the process moves from one step to the next.

Edit:

I'll try to make a general case using your dimensions as examples.

A "process" isn't usually modeled as a dimension, and you called D_PROCESS a "what", so I'm going to rename it to D_ACCOUNT.

The basic data model will be for a "tax processing system" in which WORKERS process ACCOUNTS, and each ACCOUNT/WORKER combination has several possible STATUSES describing where the process currently stands.

D_ACCOUNT
    ACCOUNT_NUMBER
    ACCOUNT_TYPE

D_WORKER
    WORKER_ID
    FIRST_NAME
    LAST_NAME
    BADGE_NUMBER
    SHIFT

D_STATUS
    STATUS_ID
    STATUS_NAME

Now, if I want to report on all "events" that have happened to an account, performed by a worker, I can build a transaction-level fact table, F_EVENT:

F_EVENT
    ACCOUNT_ID
    WORKER_ID
    STATUS_ID
    EVENT_TIME_ID
    Metrics taken at time of the measurement (Cost, Worker time spent, etc)

We call the unique combination of dimensions that identifies a row the granularity, or grain, of the fact table.

The grain of this table is Account, Worker, Status, and Time. It answers questions like "How much time did my workers on shift 3 spend processing accounts on Wednesday?" or "How many events occurred that changed the processing status to CLOSED?"
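
As a rough illustration of querying at this grain, the sketch below loads a few made-up rows into an in-memory SQLite database and answers both questions. The WORKER_MINUTES metric, the DAY_NAME column (standing in for a real date dimension reached through EVENT_TIME_ID), and all sample values are assumptions made for the example; they are not part of the model above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE D_WORKER (WORKER_ID INTEGER PRIMARY KEY, SHIFT INTEGER);
    CREATE TABLE D_STATUS (STATUS_ID INTEGER PRIMARY KEY, STATUS_NAME TEXT);
    -- DAY_NAME replaces a join to a date dimension via EVENT_TIME_ID (assumption).
    -- WORKER_MINUTES is a hypothetical metric column.
    CREATE TABLE F_EVENT (
        ACCOUNT_ID INTEGER, WORKER_ID INTEGER, STATUS_ID INTEGER,
        DAY_NAME TEXT, WORKER_MINUTES INTEGER
    );
    INSERT INTO D_WORKER VALUES (1, 3), (2, 1);
    INSERT INTO D_STATUS VALUES (1, 'NOT STARTED'), (2, 'IN PROCESS'), (3, 'CLOSED');
    INSERT INTO F_EVENT VALUES
        (100, 1, 2, 'Wednesday', 30),
        (100, 1, 3, 'Wednesday', 45),
        (101, 2, 2, 'Wednesday', 20),
        (101, 1, 3, 'Thursday', 10);
""")

# Minutes spent on Wednesday by workers on shift 3.
shift3_minutes = conn.execute("""
    SELECT SUM(f.WORKER_MINUTES)
    FROM F_EVENT f
    JOIN D_WORKER w ON w.WORKER_ID = f.WORKER_ID
    WHERE w.SHIFT = 3 AND f.DAY_NAME = 'Wednesday'
""").fetchone()[0]

# Number of events that moved an account to CLOSED.
closed_events = conn.execute("""
    SELECT COUNT(*)
    FROM F_EVENT f
    JOIN D_STATUS s ON s.STATUS_ID = f.STATUS_ID
    WHERE s.STATUS_NAME = 'CLOSED'
""").fetchone()[0]

print(shift3_minutes, closed_events)
```

Both answers come from filtering on dimension attributes and aggregating fact rows, which is the typical access pattern for a transaction-level fact table.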

I'm not sure how much this type of table would help.

Instead, say you are interested in tracking the process itself as it moves through various statuses. I'm going to assume that the status always moves forward in time, from "NOT STARTED" to "IN PROCESS" to "CLOSED".

I'll build what Kimball calls an "Accumulating Snapshot" fact table.

F_TAXPROCESSING
    ACCOUNT_ID
    WORKER_ID 
    CURRENT_STATUS_ID
    NOT_STARTED_DTTM
    NOT_STARTED_FLAG
    IN_PROCESS_DTTM
    IN_PROCESS_FLAG
    CLOSED_DTTM
    CLOSED_FLAG

This table's grain is Account, Worker. The table keeps track of the "process" by recording the date/time at which each status was reached and setting a flag once that status has been reached.
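
One way to picture the update step is the sketch below, which uses an in-memory SQLite table. The helper names create_snapshot_table and record_status, the numeric status IDs, and the text timestamps are all assumptions for illustration; a real warehouse load would normally do this in its ETL tool.

```python
import sqlite3

# Status name -> (assumed STATUS_ID, timestamp column, flag column).
# A fixed mapping means column names never come from caller input.
STATUS_COLUMNS = {
    "NOT STARTED": (1, "NOT_STARTED_DTTM", "NOT_STARTED_FLAG"),
    "IN PROCESS": (2, "IN_PROCESS_DTTM", "IN_PROCESS_FLAG"),
    "CLOSED": (3, "CLOSED_DTTM", "CLOSED_FLAG"),
}

def create_snapshot_table(conn):
    """Create F_TAXPROCESSING with one row per account/worker pair."""
    conn.execute("""
        CREATE TABLE F_TAXPROCESSING (
            ACCOUNT_ID INTEGER, WORKER_ID INTEGER, CURRENT_STATUS_ID INTEGER,
            NOT_STARTED_DTTM TEXT, NOT_STARTED_FLAG INTEGER DEFAULT 0,
            IN_PROCESS_DTTM TEXT, IN_PROCESS_FLAG INTEGER DEFAULT 0,
            CLOSED_DTTM TEXT, CLOSED_FLAG INTEGER DEFAULT 0,
            PRIMARY KEY (ACCOUNT_ID, WORKER_ID)
        )""")

def record_status(conn, account_id, worker_id, status, when):
    """Update the single snapshot row when the process reaches a status."""
    status_id, dttm_col, flag_col = STATUS_COLUMNS[status]
    # Create the row the first time this account/worker pair is seen.
    conn.execute(
        "INSERT OR IGNORE INTO F_TAXPROCESSING (ACCOUNT_ID, WORKER_ID) "
        "VALUES (?, ?)",
        (account_id, worker_id))
    # Stamp the milestone and move the current-status pointer.
    conn.execute(
        f"UPDATE F_TAXPROCESSING SET CURRENT_STATUS_ID = ?, "
        f"{dttm_col} = ?, {flag_col} = 1 "
        "WHERE ACCOUNT_ID = ? AND WORKER_ID = ?",
        (status_id, when, account_id, worker_id))

conn = sqlite3.connect(":memory:")
create_snapshot_table(conn)
record_status(conn, 100, 1, "NOT STARTED", "2024-01-01 09:00:00")
record_status(conn, 100, 1, "IN PROCESS", "2024-01-02 09:00:00")

rows = conn.execute("""
    SELECT CURRENT_STATUS_ID, NOT_STARTED_FLAG, IN_PROCESS_FLAG,
           CLOSED_FLAG, CLOSED_DTTM
    FROM F_TAXPROCESSING
""").fetchall()
print(rows)
```

The account/worker pair keeps a single row throughout; each status change fills in another pair of columns rather than adding a row.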

This allows you to track the process over time: you can see how many accounts have reached the "IN PROCESS" status, how long it took to get there, et cetera.
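
For instance, both of those questions can be answered with a single pass over the snapshot table. The sketch below is only an illustration: the sample rows are invented, only the columns needed for the query are created, and timestamps are assumed to be stored as ISO-style text so that SQLite's julianday function can subtract them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed-down F_TAXPROCESSING with made-up rows (assumption).
    CREATE TABLE F_TAXPROCESSING (
        ACCOUNT_ID INTEGER, WORKER_ID INTEGER,
        NOT_STARTED_DTTM TEXT, IN_PROCESS_DTTM TEXT, IN_PROCESS_FLAG INTEGER
    );
    INSERT INTO F_TAXPROCESSING VALUES
        (100, 1, '2024-01-01 09:00:00', '2024-01-02 09:00:00', 1),
        (101, 1, '2024-01-01 09:00:00', '2024-01-04 09:00:00', 1),
        (102, 2, '2024-01-03 09:00:00', NULL, 0);
""")

# Count accounts that reached IN PROCESS and average the days it took.
reached, avg_days = conn.execute("""
    SELECT COUNT(DISTINCT ACCOUNT_ID),
           AVG(julianday(IN_PROCESS_DTTM) - julianday(NOT_STARTED_DTTM))
    FROM F_TAXPROCESSING
    WHERE IN_PROCESS_FLAG = 1
""").fetchone()

print(reached, round(avg_days, 2))
```

Because every milestone lives on the same row, the elapsed time between two statuses is a simple column subtraction, with no self-join over an event table.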
