数据仓库用户数据 - 设计Q [英] Data warehouse for user data - design Q
问题描述
如何最好地存储用户数据与日期/时间维度? Usecase是我正在尝试每小时存储用户操作。如股数,喜欢,朋友等我有一个时间表和一个日期表。时间很简单 - 我每天的每个小时都有每行= user_id和colunms = 1到24。但问题是日期。如果我给每一天= 1 colunm,那么我将有一年365科隆。我无法归档数据方式,因为分析也需要过去的数据。其他策略是什么?
How to best store user data vs date/time dimension? Usecase is I am trying to store user actions per day, per hour. Such as number of Shares, likes, friends etc. I have a time table and a date table. For time it is easy - i have each row = user_id and colunms = 1 to 24 for each hour of the day. But problem is for dates. If i give each day = 1 colunm then i will have 365 colunms a year. I cannot archive the data way either because analytic needs past data too. What are the other strategies?
推荐答案
dimDate : 1 row per date
dimTime : 1 row per minute
最初你必须说出 grain 事实表,然后坚持。
At the beginning you have to state the "grain" of the fact table and then stick to it.
如果粮食是一天,那么 TimeKey
总是指向23:59的键。
If the grain is one day, then TimeKey
always points to the key of "23:59".
如果粮食是一个小时,那么 TimeKey
指向HH:59的条目。
If the grain is one hour, then TimeKey
points to entries of "HH:59".
如果粮食是一分钟,那么 TimeKey
指向相应的HH:MM
If the grain is one minute, then TimeKey
points to the respective "HH:MM"
如果粮食是15分钟,那么 TimeKey
指向相应的HH :14,HH:29,HH:44,HH:59
If the grain is 15 minutes, then TimeKey
points to the respective "HH:14", "HH:29", "HH:44", "HH:59"
等等...
-- How many new friends did specific user gain
-- in first three months of years 2008, 2009 and 2010
-- between hours 3 and 5 in the morning
-- by day of week
-- not counting holidays ?
select
DayOfWeek
, sum(NewFriends) as FriendCount
from factUserAction as f
join dbo.dimUser as u on u.UserKey = f.UserKey
join dbo.dimDate as d on d.DateKey = f.DateKey
join dbo.dimTime as t on t.TimeKey = f.TimeKey
where CalendarYear between 2008 and 2010
and MonthNumberInYear between 1 and 3
and t.Hour between 3 and 5
and d.IsHoliday = 'no'
and UserEmail = 'john_doe@gmail.com'
group by DayOfWeek
order by DayOfWeek ;
这篇关于数据仓库用户数据 - 设计Q的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!