Assigning a visit number to rows in a database?
Question
I tried searching around with no luck; maybe I'm just not using the right terminology. Anyway, I'm working with a PostgreSQL 8.1 database (Redshift) where a sample of the data would look like this:
User ID Timestamp Visit ID
1 05:20:20am 10
2 05:21:00am 11
1 05:22:14am 12
1 05:24:09am 13
Let's say this is a very basic sample of the data. I want to analyze how people do on other metrics when it's their 1st visit, 2nd visit, 3rd visit, etc. The easiest way I can think of to do this would be to create a separate table that looks like:
Visit ID Visit Number
10 1
11 1
12 2
13 3
Or to somehow append a column to my original table:
User ID Timestamp Visit ID Visit Number
1 05:20:20am 10 1
2 05:21:00am 11 1
1 05:22:14am 12 2
1 05:24:09am 13 3
I know I can use GROUP BY and COUNT to analyze performance for visitors who came at least x times versus those who came fewer times. However, I don't know how to analyze performance on the first, second, or third visit, especially when users could have visited anywhere from 1 to 50,000 times.
An example question I'd like to answer: for users who visit at least 10 times, do they perform better on their first five visits or their second five?
Thank you very much.
Recommended answer
One (potentially slow) option is to use a correlated subquery:
SELECT t.UserID,
t.Timestamp,
t.VisitID,
( SELECT COUNT(*) + 1
FROM T T2
WHERE T2.UserID = T.UserID
AND T2.TimeStamp < T.TimeStamp
) VisitNumber
FROM T;
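To try the correlated subquery on the sample rows from the question, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the real database. The table name T comes from the query above; the 24-hour time strings and the use of SQLite are assumptions for illustration, so this checks the logic only and says nothing about performance on Redshift.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (UserID INTEGER, Timestamp TEXT, VisitID INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        (1, "05:20:20", 10),
        (2, "05:21:00", 11),
        (1, "05:22:14", 12),
        (1, "05:24:09", 13),
    ],
)

# Visit number = 1 + number of earlier visits by the same user.
query = """
SELECT t.VisitID,
       ( SELECT COUNT(*) + 1
         FROM T AS t2
         WHERE t2.UserID = t.UserID
           AND t2.Timestamp < t.Timestamp
       ) AS VisitNumber
FROM T AS t
ORDER BY t.VisitID
"""
visit_numbers = conn.execute(query).fetchall()
for visit_id, visit_number in visit_numbers:
    print(visit_id, visit_number)
```

Note that because the subquery counts strictly earlier timestamps, two visits by the same user with identical timestamps would receive the same visit number.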
The easiest (although possibly not viable) solution would be to upgrade to PostgreSQL 8.4 or later and take advantage of ROW_NUMBER:
SELECT t.UserID,
t.Timestamp,
t.VisitID,
ROW_NUMBER() OVER(PARTITION BY t.UserID ORDER BY t.Timestamp) AS VisitNumber
FROM T;
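Once each row has a visit number, the example question from the post (first five visits versus second five, for users with at least 10 visits) becomes a filter plus a GROUP BY. The sketch below is one possible way to do it, again using Python's sqlite3 as a stand-in database. The Score column is a hypothetical per-visit metric, since the question does not name one, and the data is made up. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Score is a hypothetical per-visit metric; the question does not name one.
conn.execute(
    "CREATE TABLE T (UserID INTEGER, Timestamp INTEGER, VisitID INTEGER, Score REAL)"
)

rows = []
visit_id = 0
# User 1 makes 10 visits (scores 1..10); user 2 makes only 3 and is filtered out.
for user_id, n_visits in [(1, 10), (2, 3)]:
    for i in range(1, n_visits + 1):
        visit_id += 1
        rows.append((user_id, i, visit_id, float(i)))
conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?)", rows)

query = """
WITH numbered AS (
    SELECT UserID,
           Score,
           ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Timestamp) AS VisitNumber,
           COUNT(*)     OVER (PARTITION BY UserID)                    AS TotalVisits
    FROM T
)
SELECT CASE WHEN VisitNumber <= 5 THEN 'visits 1-5' ELSE 'visits 6-10' END AS Bucket,
       AVG(Score) AS AvgScore
FROM numbered
WHERE TotalVisits >= 10   -- only users with at least 10 visits
  AND VisitNumber <= 10   -- only their first 10 visits
GROUP BY Bucket
ORDER BY Bucket
"""
result = conn.execute(query).fetchall()
print(result)
```

The same bucketing works with the correlated-subquery version of VisitNumber if window functions are not available; only the inner SELECT changes.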
Finally, if the first query is too slow and upgrading is not an option, the following article goes into some detail about a further alternative. I have no idea how it will perform, though.
http://www.depesz.com/2007/08/17/rownum-anyone-cumulative-sum-in-one-query/